[GH-ISSUE #15453] Ollama Cloud Pro: 95% failure rate across all cloud models — service is unusable #56391

Open
opened 2026-04-29 10:45:34 -05:00 by GiteaMirror · 26 comments

Originally created by @KUANKEI21 on GitHub (Apr 9, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15453

Originally assigned to: @jmorganca on GitHub.

Environment

  • Plan: Ollama Pro ($20/month, subscribed 2026-04-09)
  • OS: macOS (Darwin 25.4.0)
  • Ollama version: Latest (via brew)
  • Connection: Stable internet, 0% packet loss to ollama.com
  • Models tested: glm-5.1:cloud, kimi-k2.5:cloud, qwen3.5:cloud, deepseek-v3.2:cloud

Problem

Ollama Cloud is effectively unusable. Both /api/chat and /api/generate endpoints return empty responses or time out for all cloud models. This is not model-specific — every single cloud model exhibits the same behavior.

Reproduction

Simple test — 5 sequential requests per model, 20-second timeout:

for model in glm-5.1:cloud kimi-k2.5:cloud qwen3.5:cloud deepseek-v3.2:cloud; do
  echo "=== $model ==="
  for i in 1 2 3 4 5; do
    curl -s --max-time 20 http://localhost:11434/api/chat \
      -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],\"stream\":false}" \
      | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'#{'"$i"'} OK {d[\"total_duration\"]/1e9:.1f}s')" 2>/dev/null \
      || echo "#$i FAIL (empty/timeout)"
  done
done

Results (2026-04-09, ~21:00 UTC+8)

| Model | Success rate | Notes |
|---|---|---|
| glm-5.1:cloud | 0/5 | All empty/timeout |
| kimi-k2.5:cloud | 1/5 | 1 success (2.6s), 4 failures |
| qwen3.5:cloud | 0/5 | All empty/timeout |
| deepseek-v3.2:cloud | 0/5 | All empty/timeout |
| Total | 1/20 (5%) | |

Earlier in the day, glm-5.1:cloud worked intermittently (2/3 success), so this appears to be a degrading situation.

Both endpoints affected

Tested /api/generate as well — the same 0/5 result for glm-5.1:cloud. This rules out a /api/chat-specific bug.
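For completeness, the same kind of loop can be pointed at the cloud endpoint directly, bypassing the local daemon entirely. A minimal sketch, assuming an API key in OLLAMA_API_KEY and the https://ollama.com/api/chat endpoint (where model names drop the :cloud suffix):

for model in glm-5.1 kimi-k2.5 qwen3.5 deepseek-v3.2; do
  echo "=== $model ==="
  for i in 1 2 3 4 5; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 60 \
      -H "Authorization: Bearer $OLLAMA_API_KEY" \
      -H "Content-Type: application/json" \
      https://ollama.com/api/chat \
      -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],\"stream\":false}")
    echo "#$i HTTP $code"   # 000 means no HTTP response at all (timeout/connection failure)
  done
done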

Expected behavior

As a paying Pro subscriber ($20/month), I expect a reasonable success rate (>95%) for cloud model inference. A 5% success rate is not a degraded service — it is a broken service.

What I've ruled out

  • Local Ollama service is running (localhost:11434 responds, ollama list shows all cloud models)
  • Network is stable (non-cloud local models work fine)
  • Not a single-model issue (all 4 cloud models fail)
  • Not an endpoint issue (/api/chat and /api/generate both fail)
  • Tested with minimal payloads ("hi") — not a token limit issue

This aligns with multiple existing reports:

  • #15419 — Frequent 503 errors on cloud models (2026-04-08, 7+ confirmations)
  • #14673 — 29.7% failure rate documented, support tickets ignored 2+ weeks
  • #15290 — EOF errors and socket closures on cloud models

Requests

  1. Acknowledge the outage — There is no status page, no incident communication, and no response on existing issues
  2. Provide a status page for Ollama Cloud service health
  3. Add Retry-After headers on 503/502 responses so clients can implement proper backoff (a client-side sketch follows this list)
  4. Consider pro-rating or extending subscriptions for periods of sustained outage — charging $20/month for a 5% success rate is not acceptable
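For illustration, the kind of client-side handling a Retry-After header would enable (a minimal sketch; hypothetical today, since the failed responses carry no headers at all, and the temp-file paths here are arbitrary):

# Retry loop that would honor Retry-After, falling back to exponential backoff.
attempt=0
until [ "$attempt" -ge 5 ]; do
  code=$(curl -s -D /tmp/headers.txt -o /tmp/body.json -w '%{http_code}' \
    --max-time 120 http://localhost:11434/api/chat \
    -d '{"model":"glm-5.1:cloud","messages":[{"role":"user","content":"hi"}],"stream":false}')
  if [ "$code" = "200" ]; then cat /tmp/body.json; break; fi
  wait=$(awk -F': ' 'tolower($1)=="retry-after" {print $2}' /tmp/headers.txt | tr -d '\r')
  wait=${wait:-$((2 ** attempt))}   # no header -> back off exponentially
  echo "HTTP $code, retrying in ${wait}s" >&2
  sleep "$wait"
  attempt=$((attempt + 1))
done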
GiteaMirror added the cloud label 2026-04-29 10:45:34 -05:00

@bartlomiejwolk commented on GitHub (Apr 9, 2026):

Same for me. I'm a new Ollama Pro user and I was starting to think this is normal. I'm glad it's not.


@dongluochen commented on GitHub (Apr 10, 2026):

@KUANKEI21 thanks for reporting the issues, and sorry about the experience. Can you re-run your requests and provide the times and request IDs (like below) so we can investigate?

Internal Server Error (ref: 4f8b6a2c-a0ec-474e-b37a-6542b4ea732e)

@jmorganca commented on GitHub (Apr 10, 2026):

Hi all I'm sorry for the issues with Ollama's cloud this morning. We've been working hard to increase capacity. It should be improving now and we'll continue to monitor it.


@KUANKEI21 commented on GitHub (Apr 10, 2026):

@dongluochen Thanks for the quick response and for looking into this.

We don't have a ref because these failures are not 500 Internal Server Error — they are 502 Bad Gateway with an empty response body, so no app-level error UUID is returned.

Per your own API error documentation, 502 is defined as:

502: Bad Gateway (e.g. when a cloud model cannot be reached)

This is a server-side cloud routing issue, not a client-side problem. The requests never reached the application layer that generates ref UUIDs — they failed at the gateway.


Representative failed samples

From our local ~/.ollama/logs/server.log (all times UTC+8, all requests were sequential — one at a time, no concurrency):

| # | Time (UTC+8) | Endpoint | Status | Latency |
|---|---|---|---|---|
| 1 | 2026-04-09 17:29:14 | /v1/chat/completions | 502 | 5m0s |
| 2 | 2026-04-09 21:35:21 | /api/chat | 502 | 20.0s |
| 3 | 2026-04-09 22:09:35 | /api/chat | 502 | 20.0s |
| 4 | 2026-04-09 22:10:44 | /api/generate | 502 | 2m0s |
| 5 | 2026-04-10 02:11:42 | /v1/chat/completions | 502 | 5.0s |

Full list (48 total 502s: 39 on 04-09, 9 on 04-10):
[GIN] 2026/04/09 - 17:29:14 | 502 |          5m0s | POST "/v1/chat/completions"
[GIN] 2026/04/09 - 17:34:16 | 502 |          5m0s | POST "/v1/chat/completions"
[GIN] 2026/04/09 - 21:10:48 | 502 |  30.00136325s | POST "/api/chat"
[GIN] 2026/04/09 - 21:35:21 | 502 | 20.004127292s | POST "/api/chat"
[GIN] 2026/04/09 - 21:35:41 | 502 | 20.002606875s | POST "/api/chat"
[GIN] 2026/04/09 - 21:35:48 | 502 |  7.249221167s | POST "/api/chat"
[GIN] 2026/04/09 - 21:45:08 | 502 |  20.00288025s | POST "/api/chat"
[GIN] 2026/04/09 - 21:46:14 | 502 | 20.002244458s | POST "/api/chat"
[GIN] 2026/04/09 - 21:47:23 | 502 | 20.001394208s | POST "/api/chat"
[GIN] 2026/04/09 - 21:48:05 | 502 | 20.002511625s | POST "/api/chat"
[GIN] 2026/04/09 - 22:08:35 | 502 | 20.002653625s | POST "/api/generate"
[GIN] 2026/04/09 - 22:08:55 | 502 | 20.002629833s | POST "/api/generate"
[GIN] 2026/04/09 - 22:09:15 | 502 |    20.002901s | POST "/api/generate"
[GIN] 2026/04/09 - 22:09:35 | 502 |  20.00160475s | POST "/api/chat"
[GIN] 2026/04/09 - 22:09:55 | 502 | 20.002261209s | POST "/api/chat"
[GIN] 2026/04/09 - 22:10:15 | 502 | 20.002600167s | POST "/api/chat"
[GIN] 2026/04/09 - 22:10:35 | 502 | 20.001014458s | POST "/api/chat"
[GIN] 2026/04/09 - 22:10:44 | 502 |          2m0s | POST "/api/generate" (×9)
[GIN] 2026/04/09 - 22:10:55 | 502 |   20.0013255s | POST "/api/chat"
[GIN] 2026/04/09 - 22:12:44 | 502 |          2m0s | POST "/api/generate" (×9)
[GIN] 2026/04/09 - 22:15:59 | 502 |          3m0s | POST "/api/generate"
[GIN] 2026/04/09 - 22:17:05 | 502 |          1m0s | POST "/api/generate"
[GIN] 2026/04/09 - 22:17:35 | 502 | 15.002056084s | POST "/api/generate"
[GIN] 2026/04/10 - 02:11:42 | 502 |  4.988163084s | POST "/v1/chat/completions"
[GIN] 2026/04/10 - 13:13:40 | 502 | 20.003288167s | POST "/api/chat"
[GIN] 2026/04/10 - 13:14:10 | 502 | 20.002086333s | POST "/api/chat"
[GIN] 2026/04/10 - 13:14:37 | 502 | 20.002178709s | POST "/api/chat"
[GIN] 2026/04/10 - 13:17:58 | 502 | 20.001516875s | POST "/api/chat"
[GIN] 2026/04/10 - 13:19:36 | 502 | 20.002368333s | POST "/api/chat"
[GIN] 2026/04/10 - 13:22:20 | 502 | 20.003383167s | POST "/api/chat"
[GIN] 2026/04/10 - 13:24:03 | 502 | 20.003107333s | POST "/api/chat"
[GIN] 2026/04/10 - 13:24:23 | 502 | 20.003951959s | POST "/api/chat"

What the 502 responses look like vs. successful ones

Successful requests return rich tracing headers:

Server: Google Frontend
X-Request-Id: 39a39f87-d82e-45dd-8408-0d95692b876e
Traceparent: 00-86d89d6bfb1481d41f5799ef8470d8b6-a91d7a5e4b265d16-00
X-Cloud-Trace-Context: 86d89d6bfb1481d41f5799ef8470d8b6/12186030712140750102

The 502 failures return empty body, no headers — there is nothing client-side to provide beyond the timestamps above.
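For any future failures, the status line and whatever headers are present can still be captured client-side, e.g. (a sketch; -D - dumps response headers to stdout, -o /dev/null discards the body):

curl -sS -D - -o /dev/null -w 'HTTP %{http_code} after %{time_total}s\n' \
  --max-time 60 http://localhost:11434/api/chat \
  -d '{"model":"glm-5.1:cloud","messages":[{"role":"user","content":"hi"}],"stream":false}'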


As of 2026-04-10 ~05:30 UTC, the issue is no longer reproducing after the capacity improvements @jmorganca mentioned. This is consistent with a transient capacity/routing incident on the cloud backend.

Could you investigate the 502s using these UTC+8 timestamps against your edge/gateway logs? Happy to provide our account email privately if that helps correlate.


@matholland618 commented on GitHub (Apr 11, 2026):

Same for me... I noticed this yesterday, maybe late Wednesday evening, when I changed my Hermes model to glm-5.1:cloud. I thought it was an issue with that model, since it would just freeze up during tasks or not respond at all, but then I went back to qwen3.5 and it's doing the same thing.


@orrinwitt commented on GitHub (Apr 12, 2026):

I just switched my nanobot using glm-5.1 from Ollama Cloud to OpenRouter and still got the error. Maybe it's the upstream providers?


@stanleyma610 commented on GitHub (Apr 13, 2026):

same for me, all ollama cloud models got 502


@coleman399 commented on GitHub (Apr 13, 2026):

same for me


@jackluo923 commented on GitHub (Apr 13, 2026):

same for me


@dongluochen commented on GitHub (Apr 13, 2026):

@KUANKEI21 thanks a lot for the detailed info, it's helpful! It looks like you use Ollama Cloud through local Ollama, which doesn't log request IDs.

Looking at your table, I guess you may be using a client with timeouts. What models do you use? 5s and 20s are relatively short in the LLM world; requests may take more than 20s to complete, especially for large models and large prompts. If you set a short timeout, some requests will fail even when the backend is responding.

| # | Time (UTC+8) | Endpoint | Status | Latency |
|---|---|---|---|---|
| 1 | 2026-04-09 17:29:14 | /v1/chat/completions | 502 | 5m0s |
| 2 | 2026-04-09 21:35:21 | /api/chat | 502 | 20.0s |
| 3 | 2026-04-09 22:09:35 | /api/chat | 502 | 20.0s |
| 4 | 2026-04-09 22:10:44 | /api/generate | 502 | 2m0s |
| 5 | 2026-04-10 02:11:42 | /v1/chat/completions | 502 | 5.0s |

If you continue to see failures, it'd be great if you could run some tests with curl and share the responses, e.g.:

curl -s -X POST "https://ollama.com/api/chat" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5",
    "messages": [{"role": "user", "content": "What is the weather in Tokyo and Paris?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "stream": false
  }'
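To separate client-side timeouts from genuine gateway errors, the same request can also be re-run with a generous --max-time and the status code printed explicitly (a sketch based on the command above):

curl -s -o response.json -w 'HTTP %{http_code} after %{time_total}s\n' \
  --max-time 600 -X POST "https://ollama.com/api/chat" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5","messages":[{"role":"user","content":"hi"}],"stream":false}'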

@GrigoriyNestsiarovich commented on GitHub (Apr 14, 2026):

Same for me
"Service Temporarily Unavailable (ref: e1a9b1fd-dd4b-4b03-83b6-e9daefce4b6b) (status code: 503)"


@dongluochen commented on GitHub (Apr 14, 2026):

@GrigoriyNestsiarovich thanks for providing the ref id. This request failed due to capacity constraints: at the top of each hour there is bursty traffic from cron jobs, and the request around 2:02 am PT failed because of that. Retrying a bit later might go through. Sorry about that. We are working to improve system performance.
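For example, a simple retry with backoff and a little jitter would ride out the top-of-hour burst (a rough sketch, assuming a direct API key in $API_KEY, not an official recommendation):

for attempt in 1 2 3 4; do
  out=$(curl -s --max-time 120 \
    -H "Authorization: Bearer $API_KEY" \
    https://ollama.com/api/chat \
    -d '{"model":"qwen3.5","messages":[{"role":"user","content":"hi"}],"stream":false}')
  # Treat any non-empty body without an "error" field as success.
  if [ -n "$out" ] && ! echo "$out" | grep -q '"error"'; then echo "$out"; break; fi
  sleep $(( (2 ** attempt) + RANDOM % 5 ))   # exponential backoff plus jitter
done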


@ghostmodel commented on GitHub (Apr 16, 2026):

I get a 403 Unauthorized; the first request goes through, but all attempts after that error out.


@dongluochen commented on GitHub (Apr 16, 2026):

@ghostmodel can you post the response you get? I need the "ref" to understand the case. Thanks.


@harmssam commented on GitHub (Apr 17, 2026):

Qwen3.5 is completely unusable at the moment.

API call failed after 3 retries: HTTP 500: Error code: 500 - {'error': 'Internal Server Error (ref: cf66a179-44a1-45c9-ab6e-53058a47feef)'}


@ghostmodel commented on GitHub (Apr 17, 2026):

Using Hermes from the command line, the first request succeeds and the second fails.

In the .env file
OLLAMA_API_KEY=(and my key)

model:
default: minimax-m2.7
provider: ollama-cloud
base_url: https://ollama.com/v1

● hi

Initializing agent...
Hey! How can I help you today?

● hi

⚠️ API call failed (attempt 1/3): APIConnectionError
🔌 Provider: ollama-cloud Model: minimax-m2.7
🌐 Endpoint: https://ollama.com/v1
📝 Error: Connection error.
Retrying in 2.977621885121035s (attempt 1/3)...


@orrinwitt commented on GitHub (Apr 18, 2026):

I realize this problem has been going on a lot longer, but a look at this uptime chart from OpenRouter does illustrate the bottleneck that upstream providers run into when a model is in very high demand. This shows uptime for GLM-5.1 on April 17th, 2026.

[Image: OpenRouter uptime chart for GLM-5.1, April 17, 2026]

I still want Ollama to up their cloud game, starting with getting a better handle on the abuse of the free tier, but some of this may otherwise be out of their control unless they're really running their own datacenters.


@ehnwebmaster commented on GitHub (Apr 18, 2026):

Same here, I'm using the free plan.

Ollama API Cloud

Internal Server Error (ref: feef02ad-98d4-4f0c-b3aa-e95604640135)


@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15453
Analyzed: 2026-04-18T18:21:28.393021

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.


@HardStyleMoose commented on GitHub (Apr 21, 2026):

Please, and I mean it from the bottom of my heart ♥: just remove the free tier or dramatically lower it. There are obviously models set up to just create free accounts with a VPN and run multiple sessions, and when rate limited they simply fall back to making a new account again. It's a pretty simple workflow to get self-trained models to do exactly that here. I think that is the reason, since I was thinking of this same method myself until I realized the abuse would be harmful.


@unicornboat commented on GitHub (Apr 22, 2026):

Same here:
{"error":"this model requires a subscription, upgrade for access: https://ollama.com/upgrade (ref: b45ce2fb-7e5e-4d4c-8ab4-5ec893930553)"}


@hasanur-rahman079 commented on GitHub (Apr 22, 2026):

I just upgraded to Pro and hit the same issue with my stack; it's now totally unusable. Is this still not solved?


@natera commented on GitHub (Apr 24, 2026):

Same problem here using Pro, none of the models work. Any update?


@michael-conrad commented on GitHub (Apr 25, 2026):

Is this related to infinite open-socket hangs?

I'm using OpenCode Desktop with Ollama Cloud on the $100/month plan.

I'm repeatedly getting hangs where I have to interrupt the agent and then tell it to resume/continue.

I tried setting the chunk timeout, but that just causes an SSE response failure with no retry mechanism, so it's not effective, especially for autonomous-type work.


@KayJay89 commented on GitHub (Apr 25, 2026):

Unfortunately I seem to be in the same boat (Pro plan):

🔀 preparing delegate_task…
[subagent-1] ⚠️ No response from provider for 180s (model: kimi-k2.6, context: ~45,862 tokens). Reconnecting...
[subagent-1] ⚠️ API call failed (attempt 1/3): APIConnectionError
[subagent-1] 🔌 Provider: ollama-cloud Model: kimi-k2.6
[subagent-1] 🌐 Endpoint: https://ollama.com/v1
[subagent-1] 📝 Error: Connection error.
[subagent-1] ⏱️ Elapsed: 241.89s Context: 18 msgs, ~45,863 tokens
[subagent-1] Retrying in 2.4s (attempt 1/3)...
[subagent-0] ⚠️ No response from provider for 180s (model: kimi-k2.6, context: ~47,474 tokens). Reconnecting...
[subagent-1] ⚠️ No response from provider for 180s (model: kimi-k2.6, context: ~45,862 tokens). Reconnecting...
[subagent-0] ⚠️ API call failed (attempt 1/3): APIConnectionError
[subagent-0] 🔌 Provider: ollama-cloud Model: kimi-k2.6
[subagent-0] 🌐 Endpoint: https://ollama.com/v1
[subagent-0] 📝 Error: Connection error.
[subagent-0] ⏱️ Elapsed: 241.79s Context: 15 msgs, ~47,475 tokens
[subagent-0] Retrying in 3.0s (attempt 1/3)...
[subagent-1] ⚠️ API call failed (attempt 2/3): APIConnectionError
[subagent-1] 🔌 Provider: ollama-cloud Model: kimi-k2.6
[subagent-1] 🌐 Endpoint: https://ollama.com/v1
[subagent-1] 📝 Error: Connection error.
[subagent-1] ⏱️ Elapsed: 486.36s Context: 18 msgs, ~45,863 tokens
[subagent-1] Retrying in 4.4s (attempt 2/3)...
✗ [1/3] Desk research: Find passive evidence tha (600.02s)
🔀 delegate 3 parallel tasks 600.6s [error]
[subagent-1] Interrupted during API call.
[subagent-0] Interrupted during API call.
✗ [3/3] Desk research: Verify who performs waste (600.02s)
✗ [2/3] Desk research: Find passive evidence tha (600.02s)
[subagent-2] Interrupt: cancelling 1 pending concurrent tool(s)


@el-analista commented on GitHub (Apr 27, 2026):

Same here, this is really bad.
