Files
KohakuHub/docs/api/stats.md
2025-10-24 18:45:55 +08:00

18 KiB

title, description, icon
title description icon
Statistics API Track repository downloads, likes, and trending models chart-bar

Statistics API

Track repository statistics including downloads, likes, and discover trending repositories.

Overview

KohakuHub provides comprehensive statistics tracking:

  • Download tracking: Session-based downloads (not individual file downloads)
  • Daily aggregation: Historical download data by day
  • Like/favorite counting: Track repository popularity
  • Trending algorithm: Discover popular repositories by recent activity
  • Lazy aggregation: Historical stats aggregated on-demand for performance

Endpoints

Get Repository Stats

Get basic statistics for a repository (downloads and likes).

Endpoint: GET /{repo_type}s/{namespace}/{name}/stats

Parameters:

Parameter Type Location Required Description
repo_type string path Yes Repository type: model, dataset, or space
namespace string path Yes Repository namespace
name string path Yes Repository name

Authentication: Optional (required for private repositories)

Response:

{
  "downloads": 1234567,
  "likes": 89
}

Field Descriptions:

Field Type Description
downloads integer Total download sessions (all time)
likes integer Number of users who liked this repository

Notes:

  • Downloads are counted by session, not individual files
  • A session includes all files downloaded within a short time window
  • Statistics are automatically aggregated when accessed
  • Today's stats are real-time; historical stats use lazy aggregation

Example:

import requests

response = requests.get(
    "http://localhost:28080/models/myorg/mymodel/stats"
)
stats = response.json()

print(f"Downloads: {stats['downloads']:,}")
print(f"Likes: {stats['likes']}")

Get Recent Statistics

Get detailed download statistics for recent days.

Endpoint: GET /{repo_type}s/{namespace}/{name}/stats/recent

Parameters:

Parameter Type Location Required Description
repo_type string path Yes Repository type: model, dataset, or space
namespace string path Yes Repository namespace
name string path Yes Repository name
days integer query No Number of days to retrieve (1-365, default: 30)

Authentication: Optional (required for private repositories)

Response:

{
  "stats": [
    {
      "date": "2025-01-15",
      "downloads": 123,
      "authenticated": 45,
      "anonymous": 78,
      "files": 456
    },
    {
      "date": "2025-01-16",
      "downloads": 145,
      "authenticated": 52,
      "anonymous": 93,
      "files": 512
    }
  ],
  "period": {
    "start": "2025-01-01",
    "end": "2025-01-30",
    "days": 30
  }
}

Field Descriptions:

Field Type Description
date string Date in YYYY-MM-DD format
downloads integer Download sessions for this day
authenticated integer Sessions from authenticated users
anonymous integer Sessions from anonymous users
files integer Total files downloaded

Use Cases:

  • Generate download charts
  • Analyze usage patterns
  • Track growth over time
  • Compare weekday vs. weekend usage

Example:

import requests
import matplotlib.pyplot as plt

# Get last 30 days
response = requests.get(
    "http://localhost:28080/models/myorg/mymodel/stats/recent",
    params={"days": 30}
)
data = response.json()

# Extract data for plotting
dates = [s["date"] for s in data["stats"]]
downloads = [s["downloads"] for s in data["stats"]]

# Plot
plt.figure(figsize=(12, 6))
plt.plot(dates, downloads, marker='o')
plt.xlabel("Date")
plt.ylabel("Downloads")
plt.title("Download Trend (Last 30 Days)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Discover trending repositories based on recent downloads.

Endpoint: GET /api/trending

Parameters:

Parameter Type Location Required Description
repo_type string query No Filter by type: model, dataset, or space (default: model)
days integer query No Calculate trend based on last N days (1-90, default: 7)
limit integer query No Maximum repositories to return (1-100, default: 20)

Authentication: Optional (affects private repository visibility)

Response:

{
  "trending": [
    {
      "id": "openai/gpt-4",
      "type": "model",
      "downloads": 5678900,
      "likes": 1234,
      "recent_downloads": 12345,
      "private": false
    },
    {
      "id": "myorg/popular-dataset",
      "type": "dataset",
      "downloads": 234567,
      "likes": 89,
      "recent_downloads": 8901,
      "private": false
    }
  ],
  "period": {
    "start": "2025-01-09",
    "end": "2025-01-16",
    "days": 7
  }
}

Field Descriptions:

Field Type Description
id string Repository full ID (namespace/name)
type string Repository type
downloads integer Total downloads (all time)
likes integer Total likes
recent_downloads integer Downloads in the specified period
private boolean Whether repository is private

Trending Algorithm:

Repositories are ranked by recent_downloads (downloads in the last N days).

Privacy:

  • Public repositories: Visible to everyone
  • Private repositories: Only visible to users with read permission
  • Anonymous users only see public trending repos

Example:

# Get top 10 trending models (last 7 days)
response = requests.get(
    "http://localhost:28080/api/trending",
    params={
        "repo_type": "model",
        "days": 7,
        "limit": 10
    }
)
trending = response.json()

print("Top Trending Models:")
for i, repo in enumerate(trending["trending"], 1):
    print(f"{i}. {repo['id']}")
    print(f"   Recent downloads: {repo['recent_downloads']:,}")
    print(f"   Total downloads: {repo['downloads']:,}")
    print(f"   Likes: {repo['likes']}")
    print()

# Get trending datasets (last 30 days)
response = requests.get(
    "http://localhost:28080/api/trending",
    params={
        "repo_type": "dataset",
        "days": 30,
        "limit": 20
    }
)

Usage Examples

Comprehensive Statistics Dashboard

import requests
from datetime import datetime, timedelta

BASE_URL = "http://localhost:28080"
TOKEN = "YOUR_TOKEN"

headers = {"Authorization": f"Bearer {TOKEN}"}

class StatsAnalyzer:
    def __init__(self, base_url: str, token: str = None):
        self.base_url = base_url
        self.headers = {"Authorization": f"Bearer {token}"} if token else {}

    def get_repo_stats(self, repo_type: str, namespace: str, name: str):
        """Get basic repository statistics."""
        response = requests.get(
            f"{self.base_url}/{repo_type}s/{namespace}/{name}/stats",
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()

    def get_recent_stats(self, repo_type: str, namespace: str, name: str, days: int = 30):
        """Get recent daily statistics."""
        response = requests.get(
            f"{self.base_url}/{repo_type}s/{namespace}/{name}/stats/recent",
            params={"days": days},
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()

    def get_trending(self, repo_type: str = "model", days: int = 7, limit: int = 20):
        """Get trending repositories."""
        response = requests.get(
            f"{self.base_url}/api/trending",
            params={
                "repo_type": repo_type,
                "days": days,
                "limit": limit
            },
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()

    def analyze_growth(self, repo_type: str, namespace: str, name: str, days: int = 30):
        """Analyze repository growth trends."""
        data = self.get_recent_stats(repo_type, namespace, name, days)
        stats = data["stats"]

        if not stats:
            return None

        # Calculate growth metrics
        first_day = stats[0]["downloads"]
        last_day = stats[-1]["downloads"]
        total = sum(s["downloads"] for s in stats)
        avg_daily = total / len(stats)

        # Calculate week-over-week growth
        mid_point = len(stats) // 2
        first_half = sum(s["downloads"] for s in stats[:mid_point])
        second_half = sum(s["downloads"] for s in stats[mid_point:])

        growth_rate = ((second_half - first_half) / first_half * 100) if first_half > 0 else 0

        return {
            "total_downloads": total,
            "avg_daily_downloads": avg_daily,
            "growth_rate_percent": growth_rate,
            "first_day": first_day,
            "last_day": last_day,
            "trend": "up" if last_day > first_day else "down" if last_day < first_day else "stable"
        }

    def compare_auth_vs_anon(self, repo_type: str, namespace: str, name: str, days: int = 30):
        """Compare authenticated vs anonymous downloads."""
        data = self.get_recent_stats(repo_type, namespace, name, days)
        stats = data["stats"]

        total_auth = sum(s["authenticated"] for s in stats)
        total_anon = sum(s["anonymous"] for s in stats)
        total = total_auth + total_anon

        return {
            "authenticated": total_auth,
            "anonymous": total_anon,
            "authenticated_percent": (total_auth / total * 100) if total > 0 else 0,
            "anonymous_percent": (total_anon / total * 100) if total > 0 else 0
        }

    def print_summary(self, repo_type: str, namespace: str, name: str):
        """Print comprehensive statistics summary."""
        repo_id = f"{namespace}/{name}"
        print(f"\n=== Statistics Summary: {repo_id} ===\n")

        # Basic stats
        basic = self.get_repo_stats(repo_type, namespace, name)
        print(f"Total Downloads: {basic['downloads']:,}")
        print(f"Likes: {basic['likes']:,}")

        # Growth analysis
        growth = self.analyze_growth(repo_type, namespace, name, 30)
        if growth:
            print(f"\n30-Day Trends:")
            print(f"  Average daily: {growth['avg_daily_downloads']:.1f}")
            print(f"  Growth rate: {growth['growth_rate_percent']:+.1f}%")
            print(f"  Trend: {growth['trend']}")

        # Auth vs Anon
        auth_data = self.compare_auth_vs_anon(repo_type, namespace, name, 30)
        print(f"\nUser Distribution (30 days):")
        print(f"  Authenticated: {auth_data['authenticated_percent']:.1f}%")
        print(f"  Anonymous: {auth_data['anonymous_percent']:.1f}%")

# Usage
analyzer = StatsAnalyzer(BASE_URL, TOKEN)

# Get comprehensive summary
analyzer.print_summary("model", "myorg", "mymodel")

# Analyze growth
growth = analyzer.analyze_growth("model", "myorg", "mymodel", 90)
print(f"90-day growth: {growth['growth_rate_percent']:+.1f}%")

# Get trending
trending = analyzer.get_trending("model", days=7, limit=10)
print(f"\nTop 10 trending models:")
for i, repo in enumerate(trending["trending"], 1):
    print(f"{i}. {repo['id']}: {repo['recent_downloads']:,} downloads")

Export Statistics to CSV

import csv
from datetime import datetime

def export_stats_to_csv(repo_type: str, namespace: str, name: str,
                       days: int = 30, filename: str = None):
    """Export repository statistics to CSV file."""

    response = requests.get(
        f"{BASE_URL}/{repo_type}s/{namespace}/{name}/stats/recent",
        params={"days": days},
        headers={"Authorization": f"Bearer {TOKEN}"}
    )
    data = response.json()

    if not filename:
        filename = f"{namespace}_{name}_stats_{datetime.now():%Y%m%d}.csv"

    with open(filename, 'w', newline='') as csvfile:
        fieldnames = ["date", "downloads", "authenticated", "anonymous", "files"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for stat in data["stats"]:
            writer.writerow(stat)

    print(f"Exported statistics to {filename}")
    return filename

# Usage
export_stats_to_csv("model", "myorg", "mymodel", days=90)
def monitor_trending(repo_type: str = "model", top_n: int = 10):
    """Monitor and report trending repositories."""

    response = requests.get(
        f"{BASE_URL}/api/trending",
        params={
            "repo_type": repo_type,
            "days": 7,
            "limit": top_n
        }
    )
    trending = response.json()

    print(f"\n{'='*60}")
    print(f"Top {top_n} Trending {repo_type.title()}s (Last 7 Days)")
    print(f"{'='*60}\n")

    for i, repo in enumerate(trending["trending"], 1):
        print(f"{i:2d}. {repo['id']}")
        print(f"    Recent: {repo['recent_downloads']:>8,} downloads")
        print(f"    Total:  {repo['downloads']:>8,} downloads")
        print(f"    Likes:  {repo['likes']:>8,}")
        print()

# Usage
monitor_trending("model", top_n=10)
monitor_trending("dataset", top_n=5)

Weekly Report Generator

def generate_weekly_report(namespace: str, token: str):
    """Generate weekly statistics report for all repositories in namespace."""

    # Get all repositories
    from kohakuhub.api.repo import list_repositories  # Assuming you have this

    analyzer = StatsAnalyzer(BASE_URL, token)

    print(f"\n{'='*70}")
    print(f"Weekly Statistics Report - {namespace}")
    print(f"{'='*70}\n")

    # This would list all repos - simplified example
    repos = [
        ("model", "mymodel"),
        ("dataset", "mydataset")
    ]

    total_downloads = 0
    total_likes = 0

    for repo_type, name in repos:
        try:
            basic = analyzer.get_repo_stats(repo_type, namespace, name)
            growth = analyzer.analyze_growth(repo_type, namespace, name, 7)

            total_downloads += basic["downloads"]
            total_likes += basic["likes"]

            print(f"{repo_type}/{namespace}/{name}")
            print(f"  Downloads: {basic['downloads']:,} (7-day: {growth['total_downloads']:,})")
            print(f"  Likes: {basic['likes']:,}")
            print(f"  Trend: {growth['trend']} ({growth['growth_rate_percent']:+.1f}%)")
            print()

        except Exception as e:
            print(f"  Error: {e}\n")

    print(f"{'='*70}")
    print(f"Total Downloads: {total_downloads:,}")
    print(f"Total Likes: {total_likes:,}")
    print(f"{'='*70}\n")

# Usage
generate_weekly_report("myorg", TOKEN)

JavaScript/TypeScript Example

class StatsAPI {
  constructor(baseURL, token = null) {
    this.baseURL = baseURL;
    this.headers = token ? { 'Authorization': `Bearer ${token}` } : {};
  }

  async getRepoStats(repoType, namespace, name) {
    const response = await fetch(
      `${this.baseURL}/${repoType}s/${namespace}/${name}/stats`,
      { headers: this.headers }
    );
    return await response.json();
  }

  async getRecentStats(repoType, namespace, name, days = 30) {
    const response = await fetch(
      `${this.baseURL}/${repoType}s/${namespace}/${name}/stats/recent?days=${days}`,
      { headers: this.headers }
    );
    return await response.json();
  }

  async getTrending(repoType = 'model', days = 7, limit = 20) {
    const response = await fetch(
      `${this.baseURL}/api/trending?repo_type=${repoType}&days=${days}&limit=${limit}`,
      { headers: this.headers }
    );
    return await response.json();
  }

  async analyzeGrowth(repoType, namespace, name, days = 30) {
    const data = await this.getRecentStats(repoType, namespace, name, days);
    const stats = data.stats;

    if (!stats.length) return null;

    const total = stats.reduce((sum, s) => sum + s.downloads, 0);
    const avgDaily = total / stats.length;

    const midPoint = Math.floor(stats.length / 2);
    const firstHalf = stats.slice(0, midPoint)
      .reduce((sum, s) => sum + s.downloads, 0);
    const secondHalf = stats.slice(midPoint)
      .reduce((sum, s) => sum + s.downloads, 0);

    const growthRate = firstHalf > 0
      ? ((secondHalf - firstHalf) / firstHalf * 100)
      : 0;

    return {
      totalDownloads: total,
      avgDailyDownloads: avgDaily,
      growthRatePercent: growthRate,
      trend: secondHalf > firstHalf ? 'up' : secondHalf < firstHalf ? 'down' : 'stable'
    };
  }
}

// Usage
const statsAPI = new StatsAPI('http://localhost:28080', 'YOUR_TOKEN');

// Get basic stats
const stats = await statsAPI.getRepoStats('model', 'myorg', 'mymodel');
console.log(`Downloads: ${stats.downloads.toLocaleString()}`);
console.log(`Likes: ${stats.likes}`);

// Get trending
const trending = await statsAPI.getTrending('model', 7, 10);
console.log('\nTop Trending Models:');
trending.trending.forEach((repo, i) => {
  console.log(`${i + 1}. ${repo.id}: ${repo.recent_downloads.toLocaleString()} downloads`);
});

// Analyze growth
const growth = await statsAPI.analyzeGrowth('model', 'myorg', 'mymodel', 30);
console.log(`\n30-day growth: ${growth.growthRatePercent.toFixed(1)}%`);

CLI Usage

See CLI Documentation for command-line interface:

# Get repository stats
kohub-cli stats get model myorg/mymodel

# Get recent statistics (last 30 days)
kohub-cli stats recent model myorg/mymodel --days 30

# Get trending models
kohub-cli trending --type model --days 7 --limit 10

# Export to CSV
kohub-cli stats export model myorg/mymodel --days 90 --output stats.csv

Download Session Tracking

How Sessions Work

  • Session window: Downloads within a short time window (e.g., 30 minutes) count as one session
  • User-based: Tracked by user ID (authenticated) or IP + User-Agent (anonymous)
  • Repository-level: One session per repository, even if multiple files downloaded
  • Daily aggregation: Sessions are aggregated daily for historical analysis

What Counts as a Download?

  • Successful file downloads (HTTP 200)
  • Git clone operations
  • LFS file downloads
  • Failed downloads (HTTP 4xx/5xx)
  • HEAD requests (metadata only)
  • Tree browsing (no files downloaded)

Next Steps