[GH-ISSUE #15663] Feature Request: Expose account quota/usage details via Ollama Cloud API (headers and/or response body) #56504

Open
opened 2026-04-29 10:56:08 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @TH33ORACL3 on GitHub (Apr 18, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15663

Summary

Ollama Cloud API responses do not include account-level quota or usage information. Today, the only way to know your remaining quota, monthly limit, or reset date is to log into the web dashboard at https://ollama.com or wait for an automated email warning. Please expose this data through the API so it can be consumed programmatically.

Suggested Labels

Maintainers — please consider applying:

  • feature request
  • ollama.com (this is a Cloud-side change)
  • feedback wanted

(External contributors cannot apply labels.)

Environment

  • Ollama version: 0.21.0
  • OS: macOS 26.4.1 (build 25E253)
  • Endpoint: https://ollama.com (Cloud)
  • Auth: Personal API key (ollama-cloud provider)

Current Behavior

Calling any Cloud endpoint (e.g., POST /api/chat, POST /api/generate, POST /api/embed) returns a response that includes per-request token counts but no account-level quota information:

{
  "model": "gpt-oss:120b",
  "created_at": "2026-04-18T...",
  "message": { "role": "assistant", "content": "..." },
  "done": true,
  "prompt_eval_count": 123,
  "eval_count": 456
}

Response headers also contain no quota metadata:

HTTP/2 200
content-type: application/json
date: ...

To find out how close I am to my monthly cap, I have to:

  1. Open a browser
  2. Log into https://ollama.com
  3. Navigate to the usage page

…or wait for the threshold email Ollama sends near the limit. Neither path is usable from a CLI tool, an editor integration, or an automated workflow.

Proposed Behavior

Option A — HTTP response headers (preferred, lowest friction)

Add headers similar to GitHub's, OpenAI's, and Anthropic's rate-limit conventions:

X-Ollama-Quota-Limit:      <int>           # tokens or requests per period
X-Ollama-Quota-Remaining:  <int>
X-Ollama-Quota-Used:       <int>
X-Ollama-Quota-Reset:      <ISO-8601>      # when the quota resets
X-Ollama-Quota-Period:     monthly|daily   # billing period granularity

Headers are cheap to add, don't change response schemas, and are trivially consumable by any HTTP client.
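If Option A shipped, a client could consume the headers in a few lines. A minimal sketch, assuming the hypothetical X-Ollama-Quota-* names proposed above (nothing like them exists in the API today):

```python
def parse_quota_headers(headers):
    """Parse the proposed X-Ollama-Quota-* headers (hypothetical names
    from this issue, not fields the API returns today)."""
    # HTTP header names are case-insensitive; normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    if "x-ollama-quota-limit" not in h:
        return None  # self-hosted server or pre-feature Cloud: headers absent
    return {
        "limit": int(h["x-ollama-quota-limit"]),
        "remaining": int(h["x-ollama-quota-remaining"]),
        "used": int(h["x-ollama-quota-used"]),
        "reset_at": h["x-ollama-quota-reset"],
        "period": h.get("x-ollama-quota-period", "monthly"),
    }

def should_warn(quota, threshold=0.8):
    """True once usage crosses the warning threshold (e.g. warn at 80%)."""
    return quota["used"] / quota["limit"] >= threshold
```

Returning `None` when the headers are absent is what makes the self-hosted case (see Acceptance Criteria) a non-event for client code.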

Option B — Extend the JSON usage/response body

{
  "model": "gpt-oss:120b",
  "done": true,
  "prompt_eval_count": 123,
  "eval_count": 456,
  "account": {
    "quota": {
      "period": "monthly",
      "limit": 1000000,
      "used": 123456,
      "remaining": 876544,
      "reset_at": "2026-05-01T00:00:00Z"
    }
  }
}
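With Option B, tooling would read the quota out of the response body instead. A sketch, assuming the hypothetical `account.quota` shape above and degrading gracefully when the field is absent (e.g., self-hosted Ollama):

```python
import json

def quota_from_body(raw):
    """Return the proposed account.quota object, or None when absent.
    The 'account' field is this issue's proposal, not something the
    API returns today."""
    body = json.loads(raw)
    return body.get("account", {}).get("quota")
```

Because `dict.get` with a default never raises, the same client code works unchanged against servers that do not emit the field.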

Option C — Dedicated endpoint

GET https://ollama.com/api/account/usage returning the same structure as Option B. Useful for proactive polling without making a model call.

Ideally, ship A + C: headers piggyback on existing calls, and a dedicated endpoint lets tooling check status without spending tokens.
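A CLI polling the hypothetical Option C endpoint could then decide what to do before spending any tokens. The endpoint and field names here are assumptions carried over from this proposal; the decision logic is the point:

```python
def plan_action(quota, warn_at=0.8):
    """Decide how a tool should react to a polled quota snapshot
    (e.g. from the hypothetical GET /api/account/usage endpoint)."""
    if quota["remaining"] <= 0:
        return "pause"   # hard stop: quota exhausted, a model call would 429
    if quota["used"] / quota["limit"] >= warn_at:
        return "warn"    # soft threshold: tell the user, maybe rotate keys
    return "proceed"
```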

Why This Matters

| Use case | Without API quota | With API quota |
|---|---|---|
| CLI tools (e.g., `pi`, custom wrappers) | Must scrape dashboard or surprise users with 429s | Can warn user at 80%, switch keys, or pause |
| Multi-key rotation (I have 10 keys configured) | Can't tell which key is closest to its limit | Can rotate to the most-available key automatically |
| Cost dashboards | Manual screenshots from web UI | Real-time monitoring |
| CI/CD jobs | Job fails mid-run when quota hits | Job can fail fast or fall back |
| IDE integrations | No usage signal | In-line usage indicator |
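The multi-key rotation row becomes trivial once per-key quota data is available. A sketch, assuming per-key quota dicts in the hypothetical Option B shape:

```python
def pick_key(quotas):
    """Given {api_key: quota_dict}, choose the key with the most remaining
    quota; return None when every key is exhausted. The quota dicts use
    the hypothetical shape proposed in Option B of this issue."""
    live = {k: q for k, q in quotas.items() if q["remaining"] > 0}
    if not live:
        return None
    return max(live, key=lambda k: live[k]["remaining"])
```

Today this selection is guesswork; with quota data it is one comparison.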

Comparison With Other Providers

| Provider | Exposes quota in API? | How |
|---|---|---|
| OpenAI | ✅ | `x-ratelimit-limit-*`, `x-ratelimit-remaining-*`, `x-ratelimit-reset-*` headers |
| Anthropic | ✅ | `anthropic-ratelimit-*` headers |
| Google Gemini | ✅ | Quota visible via Cloud Quotas API |
| GitHub API | ✅ | `x-ratelimit-*` headers (industry standard) |
| **Ollama Cloud** | ❌ | Dashboard + email only |

Ollama is currently the outlier here.
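Because each provider uses its own header prefix, generic tooling typically just filters response headers by prefix. A sketch covering the conventions in the table above, where the Ollama prefix is this issue's hypothetical proposal:

```python
RATE_HEADER_PREFIXES = {
    "openai": "x-ratelimit-",
    "anthropic": "anthropic-ratelimit-",
    "github": "x-ratelimit-",
    "ollama": "x-ollama-quota-",  # hypothetical: proposed in this issue
}

def ratelimit_headers(headers, provider):
    """Collect a provider's rate-limit/quota headers from a response,
    lowercasing names since HTTP headers are case-insensitive."""
    prefix = RATE_HEADER_PREFIXES[provider]
    return {k.lower(): v for k, v in headers.items()
            if k.lower().startswith(prefix)}
```

Adopting any consistent prefix would let Ollama slot into tooling that already handles the other providers this way.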

Acceptance Criteria

  • At least one Cloud endpoint returns quota information (headers or body) in production.
  • The fields cover: limit, used, remaining, reset timestamp, period.
  • The behavior is documented at https://docs.ollama.com (or the Cloud-specific docs).
  • Behavior is consistent across /api/chat, /api/generate, and /api/embed.
  • No regression for self-hosted Ollama (these fields should simply be absent when not running against the Cloud).

Workarounds I'm Using Today

  • Manually opening the dashboard.
  • Tracking prompt_eval_count + eval_count locally and comparing against a hand-copied limit.
  • Rotating between 10 API keys and waiting for Ollama's email to tell me which account is near its cap (this is the literal trigger for filing this issue — it's an absurd UX).

Related

  • Ollama Cloud docs do not currently document any usage/quota endpoint.
  • Discussions in the Ollama Discord #feature-requests channel have raised similar asks.
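The token-tracking workaround from the list above amounts to keeping a counter client-side, with a hand-copied limit as the only reference point:

```python
class LocalUsageTracker:
    """Workaround sketch: sum per-request token counts from responses and
    compare against a manually copied monthly limit. This drifts from the
    server's real accounting, which is exactly why this issue was filed."""

    def __init__(self, monthly_limit):
        self.monthly_limit = monthly_limit  # hand-copied from the dashboard
        self.used = 0

    def record(self, response):
        # prompt_eval_count and eval_count are real fields in Ollama
        # chat/generate responses; missing fields count as zero.
        self.used += response.get("prompt_eval_count", 0)
        self.used += response.get("eval_count", 0)

    @property
    def remaining(self):
        return max(self.monthly_limit - self.used, 0)
```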

Willing To Help

Happy to test a beta of any of the three options above against my account, and happy to provide feedback from the perspective of a CLI/agent tooling author.


Thanks for considering — this is a small change with a big DX payoff. 🙏


@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15663
Analyzed: 2026-04-18T18:13:46.423309

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.


@ben-vargas commented on GitHub (Apr 21, 2026):

Agreed, this is a real gap for Ollama Cloud users. When signing up I expected an endpoint would be available for monitoring quota, without having to log into the browser page and constantly click refresh.


Reference: github-starred/ollama#56504