[GH-ISSUE #15419] Ollama Cloud: Frequent 503 errors making cloud models unreliable #71918

Open
opened 2026-05-05 02:56:47 -05:00 by GiteaMirror · 18 comments

Originally created by @oggixx on GitHub (Apr 8, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15419

Hey Ollama team,

I've been running autonomous agents using Ollama Cloud models and running into frequent 503 Service Unavailable errors. Thought I'd report this since it's making the cloud models pretty unreliable for production use.

What's happening

Requests to cloud models randomly fail with a 503 Service Unavailable response (a minimal repro request is sketched after the model list below).

This happens with:

  • glm-5:cloud
  • glm-5.1:cloud
  • minimax-m2.7:cloud
  • kimi-k2.5:cloud
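
For reference, a minimal version of a failing request looks roughly like this (a Python sketch: the host and port are the default Docker mapping, and any of the models above reproduces it):

```python
# Minimal repro sketch: a standard /api/chat call through the local
# Ollama instance, which proxies ":cloud" models to ollama.com.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "glm-5:cloud",  # any of the cloud models above
        "messages": [{"role": "user", "content": "ping"}],
        "stream": False,
    },
    timeout=120,
)
print(resp.status_code)  # intermittently 503 instead of 200
```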

How often

Multiple times per hour. Sometimes a single request works, the next one fails with 503, then it works again. It's intermittent but frequent enough to break agent workflows.

Impact

  • Agent requests timeout mid-conversation
  • Cron jobs fail unexpectedly
  • Users experience broken interactions in Telegram/Discord bots

What I've tried

  • Retry logic with exponential backoff (roughly the sketch below)
  • Falling back to different models
  • None of it really helps when the 503s are this frequent
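
The backoff wrapper is roughly this sketch (same endpoint and payload assumptions as the repro above; the retry count and delays are arbitrary):

```python
# Sketch of retry with exponential backoff plus jitter. The jitter keeps
# concurrent agents from retrying in lockstep and hammering the server
# at the same instant.
import random
import time

import requests

def chat_with_retry(payload, retries=5, base_delay=1.0):
    for attempt in range(retries):
        resp = requests.post(
            "http://localhost:11434/api/chat", json=payload, timeout=120
        )
        if resp.status_code != 503:
            return resp
        # 1s, 2s, 4s, ... plus up to 0.5s of jitter.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise RuntimeError("still getting 503 after all retries")
```

Even with this, a burst of 503s longer than the total backoff window still kills the request.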

Setup

  • Ollama running in Docker
  • Using the model suffix to hit ollama.com:443
  • Fairly standard setup otherwise

Would be helpful

  1. Some visibility into when/why 503s happen (rate limiting vs capacity issues?)
  2. A retry-after header so we can back off properly (see the client-side sketch after this list)
  3. Maybe a status page for Ollama Cloud uptime?
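
To illustrate request 2: if 503 responses carried a standard Retry-After header, clients could sleep for exactly the hinted duration instead of guessing. A sketch of the client side (today the header isn't sent, so the fallback path is what would actually run):

```python
# Sketch: honor Retry-After on 503 when present, otherwise fall back to
# a fixed guess. Only the delta-seconds form of the header is handled;
# the HTTP-date form is omitted for brevity.
import time

import requests

def retry_after_seconds(resp, fallback=5.0):
    value = resp.headers.get("Retry-After")
    if value is None:
        return fallback  # no server hint: guess
    try:
        return float(value)  # delta-seconds form, e.g. "3"
    except ValueError:
        return fallback

payload = {
    "model": "glm-5:cloud",
    "messages": [{"role": "user", "content": "ping"}],
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=120)
if resp.status_code == 503:
    time.sleep(retry_after_seconds(resp))  # then retry the request
```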

Thanks for the great work on Ollama! Happy to provide more details if needed.

GiteaMirror added the cloud label 2026-05-05 02:56:47 -05:00

@patelhiren commented on GitHub (Apr 8, 2026):

Experienced the same this morning. Frequent 503 errors but not consistent. A few requests fail with 503 then get one successful one and so on.


@MetaSep commented on GitHub (Apr 8, 2026):

Exact same issue: glm5.1 is unusable with opencode and openclaw, consistent 503 errors.


@srflaherty commented on GitHub (Apr 8, 2026):

Same to report here on Qwen3.5 397B Cloud with opencode. Also noticed the Ollama Desktop app is struggling to run the server on my system; after the last update it just hangs, showing the account trying to load but sitting in that state. 503s down to even terminal calls to Ollama on my setup.


@backamblock commented on GitHub (Apr 8, 2026):

+1


@boeserwolf commented on GitHub (Apr 8, 2026):

+1


@Hevoon commented on GitHub (Apr 9, 2026):

#15290 same issue.


@saltwater-tensor commented on GitHub (Apr 9, 2026):

+1
While using it with VS Code Copilot agent mode (glm5.1), it hits 503 and 502 intermittently.


@imaboku commented on GitHub (Apr 10, 2026):

Same issues this week with Kimi2.5 and GLM5.1 with Claude Code. It would appear restarting Ollama on my local machine resolves the issue for some time, but after continuing a long session it happens again. Have not run the ollama server in debug mode yet to see if I can get more details. The error that starts to show when issues arise: `level=WARN source=cloud_proxy.go:257 msg="cloud proxy response copy failed" path=/v1/messages upstream_path=/v1/messages status=200 request_context_canceled=false request_context_err=<nil> error="unexpected EOF"`


@Henkypenky1 commented on GitHub (Apr 10, 2026):

+1, set up fallback models glm 5.1 > glm 5 > minimax m2.5 and landing frequently on minimax.


@orrinwitt commented on GitHub (Apr 12, 2026):

Also running into this a lot lately on minimax-m2.7 and glm-5.1 using nanobot


@brunobarbarioli commented on GitHub (Apr 13, 2026):

Same using glm-5.1. It has become unusable at this point.


@backamblock commented on GitHub (Apr 13, 2026):

Puh... the other days it was slow, but today it's just constantly dropping connections. How are we supposed to work with this? Funny, this started right after I bought a $200 yearly sub for a colleague and me. -.-

Is there something we can do on our end as users to make it better, other than not using it to reduce server load?

My failing models at the moment are qwen3.5, minimax2.7, and glm5.1.


@itisaevalex commented on GitHub (Apr 13, 2026):

+1


@danielithomas commented on GitHub (Apr 16, 2026):

Watching. A service status page for cloud models would be useful.


@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15419
Analyzed: 2026-04-18T18:21:34.290458

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.


@Dawud18854 commented on GitHub (Apr 21, 2026):

+1


@Ner-Kun commented on GitHub (Apr 29, 2026):

+1


@dougdyson commented on GitHub (May 3, 2026):

+1

Reference: github-starred/ollama#71918