[GH-ISSUE #15169] Missing token usage statistics in streaming responses for cloud models #56221

Open
opened 2026-04-29 10:26:46 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @panmcai on GitHub (Mar 31, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15169

What is the issue?

I'm using Ollama cloud models (e.g., minimax-m2.7:cloud) and cannot find a way to get token usage statistics when using streaming mode.

Steps to reproduce:

  1. Run a streaming chat completion request:

```bash
curl -i http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.7:cloud",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
```

  2. Check the response headers - no token statistics are present.
  3. Check the response body (each chunk) - no usage field.
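For reference, OpenAI's API exposes usage in streaming mode via the `stream_options` request field; whether Ollama's /v1 compatibility layer honors it for cloud models is exactly what is unclear here. The variant of the request in step 1 would be:

```bash
# Same request as step 1, plus OpenAI's standard stream_options flag.
# Whether Ollama's /v1 compatibility layer honors this for cloud models
# is an assumption to verify, not confirmed behavior.
curl -i http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.7:cloud",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'
```

On OpenAI's own API this produces one extra final chunk with an empty `choices` array and a populated `usage` object.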

Expected behavior:

  • Response headers should include X-Prompt-Tokens, X-Completion-Tokens, X-Total-Tokens
  • Or the last chunk should include usage statistics
  • Or at least some way to track token consumption for cloud models (a workaround sketch follows this list)
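A possible workaround, assuming cloud models fill in the same counters as local ones (unverified): Ollama's native `/api/chat` endpoint reports `prompt_eval_count` and `eval_count` in the final streamed object, the one with `"done": true`:

```bash
# Sketch: pull token counts from the final (done: true) line of the
# native streaming API. prompt_eval_count/eval_count are documented for
# local models; whether cloud models populate them is an assumption.
curl -s http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.7:cloud",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }' | tail -n 1 | grep -oE '"(prompt_eval_count|eval_count)":[0-9]+'
```

This relies on the stream being newline-delimited JSON, so the last line is the final summary object.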

Actual behavior:

  • No token usage information in response headers
  • No usage field in streaming chunks
  • No way to track token consumption for billing/usage monitoring

Environment:

  • Ollama version: 0.19.0
  • Model: minimax-m2.7:cloud (cloud model)
  • API endpoint: /v1/chat/completions

Additional context:

  • Non-streaming requests return usage in the JSON response body (example below)
  • This is critical for monitoring usage and implementing fair billing
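For comparison, the non-streaming form of the same request does return a standard OpenAI-style `usage` object in the body:

```bash
# Non-streaming: the response body includes a usage object, roughly
#   "usage": {"prompt_tokens": ..., "completion_tokens": ..., "total_tokens": ...}
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.7:cloud",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```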

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the cloud and bug labels 2026-04-29 10:26:46 -05:00
Author
Owner

@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15169
Analyzed: 2026-04-18T18:22:59.344311

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.

Reference: github-starred/ollama#56221