[GH-ISSUE #15169] Missing token usage statistics in streaming responses for cloud models #9708

Open
opened 2026-04-12 22:35:13 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @panmcai on GitHub (Mar 31, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15169

What is the issue?

I'm using Ollama cloud models (e.g., minimax-m2.7:cloud) and cannot find a way to get token usage statistics when using streaming mode.

Steps to reproduce:

  1. Run a streaming chat completion request:
curl -i http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.7:cloud",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
  2. Check response headers - no token statistics present
  3. Check response body (each chunk) - no usage field
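For comparison, the OpenAI streaming spec exposes usage by setting `"stream_options": {"include_usage": true}` on the request, which makes the server emit one extra chunk (before `[DONE]`) carrying a `usage` object. Whether Ollama's /v1 endpoint honors that flag for cloud models is part of what this issue needs answered. A minimal client-side sketch, operating on captured SSE lines rather than a live request:

```python
import json

def extract_usage(sse_lines):
    """Scan OpenAI-style SSE chunks for a `usage` object.

    Assumes the request was sent with
    `"stream_options": {"include_usage": true}`; per the OpenAI spec the
    final data chunk before `[DONE]` then carries the token counts.
    """
    usage = None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage

if __name__ == "__main__":
    # Illustrative SSE lines, not from a real run against minimax-m2.7:cloud.
    sample = [
        'data: {"choices":[{"delta":{"content":"Hello"}}]}',
        'data: {"choices":[],"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}}',
        'data: [DONE]',
    ]
    print(extract_usage(sample))
```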

Expected behavior:

  • Response headers should include X-Prompt-Tokens, X-Completion-Tokens, X-Total-Tokens
  • Or the last chunk should include usage statistics
  • Or at least a way to track token consumption for cloud models

Actual behavior:

  • No token usage information in response headers
  • No usage field in streaming chunks
  • No way to track token consumption for billing/usage monitoring
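A possible interim workaround: Ollama's native /api/chat endpoint streams newline-delimited JSON, and its final chunk (`"done": true`) is documented to carry `prompt_eval_count` and `eval_count` for local models. Whether cloud models populate these fields is untested here. A sketch of pulling them out of the NDJSON stream:

```python
import json

def final_counts(ndjson_text):
    """Return (prompt_eval_count, eval_count) from a native /api/chat stream.

    Ollama's native API documents these fields on the "done": true chunk
    for local models; whether cloud models fill them in is an open
    question in this issue.
    """
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        if obj.get("done"):
            return obj.get("prompt_eval_count"), obj.get("eval_count")
    return None, None
```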

Environment:

  • Ollama version: 0.19.0
  • Model: minimax-m2.7:cloud (cloud model)
  • API endpoint: /v1/chat/completions

Additional context:

  • Non-streaming requests return usage in JSON response body
  • This is critical for monitoring usage and implementing fair billing
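For reference, the `usage` object that a non-streaming /v1/chat/completions request returns follows the OpenAI schema, where `total_tokens` is the sum of prompt and completion tokens (numbers below are illustrative, not from a real run):

```python
import json

# Shape of the non-streaming `usage` object (OpenAI schema field names;
# the values are made up for illustration).
body = json.loads(
    '{"usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}}'
)
u = body["usage"]
assert u["total_tokens"] == u["prompt_tokens"] + u["completion_tokens"]
print(u["total_tokens"])  # 21
```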

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the cloud, bug labels 2026-04-12 22:35:13 -05:00

Reference: github-starred/ollama#9708