---
title: Statistics API
description: Track repository downloads, likes, and trending models
icon: chart-bar
---

# Statistics API

Track repository statistics, including downloads and likes, and discover trending repositories.

## Overview

KohakuHub provides comprehensive statistics tracking:

- **Download tracking**: Session-based downloads (not individual file downloads)
- **Daily aggregation**: Historical download data by day
- **Like/favorite counting**: Track repository popularity
- **Trending algorithm**: Discover popular repositories by recent activity
- **Lazy aggregation**: Historical stats aggregated on-demand for performance

---

## Endpoints

### Get Repository Stats

Get basic statistics for a repository (downloads and likes).

**Endpoint:** `GET /{repo_type}s/{namespace}/{name}/stats`

**Parameters:**

| Parameter | Type | Location | Required | Description |
|-----------|------|----------|----------|-------------|
| `repo_type` | string | path | Yes | Repository type: `model`, `dataset`, or `space` |
| `namespace` | string | path | Yes | Repository namespace |
| `name` | string | path | Yes | Repository name |

**Authentication:** Optional (required for private repositories)

**Response:**

```json
{
  "downloads": 1234567,
  "likes": 89
}
```

**Field Descriptions:**

| Field | Type | Description |
|-------|------|-------------|
| `downloads` | integer | Total download sessions (all time) |
| `likes` | integer | Number of users who liked this repository |

**Notes:**

- Downloads are counted by **session**, not individual files
- A session includes all files downloaded within a short time window
- Statistics are automatically aggregated when accessed
- Today's stats are real-time; historical stats use lazy aggregation

**Example:**

```python
import requests

response = requests.get(
    "http://localhost:28080/models/myorg/mymodel/stats"
)
stats = response.json()

print(f"Downloads: {stats['downloads']:,}")
print(f"Likes: {stats['likes']}")
```

---

### Get Recent Statistics

Get
detailed download statistics for recent days.

**Endpoint:** `GET /{repo_type}s/{namespace}/{name}/stats/recent`

**Parameters:**

| Parameter | Type | Location | Required | Description |
|-----------|------|----------|----------|-------------|
| `repo_type` | string | path | Yes | Repository type: `model`, `dataset`, or `space` |
| `namespace` | string | path | Yes | Repository namespace |
| `name` | string | path | Yes | Repository name |
| `days` | integer | query | No | Number of days to retrieve (1-365, default: 30) |

**Authentication:** Optional (required for private repositories)

**Response:**

```json
{
  "stats": [
    {
      "date": "2025-01-15",
      "downloads": 123,
      "authenticated": 45,
      "anonymous": 78,
      "files": 456
    },
    {
      "date": "2025-01-16",
      "downloads": 145,
      "authenticated": 52,
      "anonymous": 93,
      "files": 512
    }
  ],
  "period": {
    "start": "2025-01-01",
    "end": "2025-01-30",
    "days": 30
  }
}
```

**Field Descriptions:**

| Field | Type | Description |
|-------|------|-------------|
| `date` | string | Date in YYYY-MM-DD format |
| `downloads` | integer | Download sessions for this day |
| `authenticated` | integer | Sessions from authenticated users |
| `anonymous` | integer | Sessions from anonymous users |
| `files` | integer | Total files downloaded |

**Use Cases:**

- Generate download charts
- Analyze usage patterns
- Track growth over time
- Compare weekday vs.
weekend usage

**Example:**

```python
import requests
import matplotlib.pyplot as plt

# Get last 30 days
response = requests.get(
    "http://localhost:28080/models/myorg/mymodel/stats/recent",
    params={"days": 30}
)
data = response.json()

# Extract data for plotting
dates = [s["date"] for s in data["stats"]]
downloads = [s["downloads"] for s in data["stats"]]

# Plot
plt.figure(figsize=(12, 6))
plt.plot(dates, downloads, marker='o')
plt.xlabel("Date")
plt.ylabel("Downloads")
plt.title("Download Trend (Last 30 Days)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```

---

### Get Trending Repositories

Discover trending repositories based on recent downloads.

**Endpoint:** `GET /api/trending`

**Parameters:**

| Parameter | Type | Location | Required | Description |
|-----------|------|----------|----------|-------------|
| `repo_type` | string | query | No | Filter by type: `model`, `dataset`, or `space` (default: `model`) |
| `days` | integer | query | No | Calculate trend based on last N days (1-90, default: 7) |
| `limit` | integer | query | No | Maximum repositories to return (1-100, default: 20) |

**Authentication:** Optional (affects private repository visibility)

**Response:**

```json
{
  "trending": [
    {
      "id": "openai/gpt-4",
      "type": "model",
      "downloads": 5678900,
      "likes": 1234,
      "recent_downloads": 12345,
      "private": false
    },
    {
      "id": "myorg/popular-dataset",
      "type": "dataset",
      "downloads": 234567,
      "likes": 89,
      "recent_downloads": 8901,
      "private": false
    }
  ],
  "period": {
    "start": "2025-01-09",
    "end": "2025-01-16",
    "days": 7
  }
}
```

**Field Descriptions:**

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Repository full ID (`namespace/name`) |
| `type` | string | Repository type |
| `downloads` | integer | Total downloads (all time) |
| `likes` | integer | Total likes |
| `recent_downloads` | integer | Downloads in the specified period |
| `private` | boolean | Whether repository is private |

**Trending Algorithm:** Repositories are
ranked by `recent_downloads` (downloads in the last N days).

**Privacy:**

- Public repositories: Visible to everyone
- Private repositories: Only visible to users with read permission
- Anonymous users only see public trending repos

**Example:**

```python
import requests

# Get top 10 trending models (last 7 days)
response = requests.get(
    "http://localhost:28080/api/trending",
    params={
        "repo_type": "model",
        "days": 7,
        "limit": 10
    }
)
trending = response.json()

print("Top Trending Models:")
for i, repo in enumerate(trending["trending"], 1):
    print(f"{i}. {repo['id']}")
    print(f"   Recent downloads: {repo['recent_downloads']:,}")
    print(f"   Total downloads: {repo['downloads']:,}")
    print(f"   Likes: {repo['likes']}")
    print()

# Get trending datasets (last 30 days)
response = requests.get(
    "http://localhost:28080/api/trending",
    params={
        "repo_type": "dataset",
        "days": 30,
        "limit": 20
    }
)
```

---

## Usage Examples

### Comprehensive Statistics Dashboard

```python
from typing import Optional

import requests

BASE_URL = "http://localhost:28080"
TOKEN = "YOUR_TOKEN"


class StatsAnalyzer:
    def __init__(self, base_url: str, token: Optional[str] = None):
        self.base_url = base_url
        # Send the Authorization header only when a token is provided
        self.headers = {"Authorization": f"Bearer {token}"} if token else {}

    def get_repo_stats(self, repo_type: str, namespace: str, name: str):
        """Get basic repository statistics."""
        response = requests.get(
            f"{self.base_url}/{repo_type}s/{namespace}/{name}/stats",
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()

    def get_recent_stats(self, repo_type: str, namespace: str, name: str, days: int = 30):
        """Get recent daily statistics."""
        response = requests.get(
            f"{self.base_url}/{repo_type}s/{namespace}/{name}/stats/recent",
            params={"days": days},
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()

    def get_trending(self, repo_type: str = "model", days: int = 7, limit: int = 20):
        """Get trending repositories."""
        response = requests.get(
            f"{self.base_url}/api/trending",
            params={
                "repo_type": repo_type,
                "days": days,
                "limit": limit
            },
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()

    def analyze_growth(self, repo_type: str, namespace: str, name: str, days: int = 30):
        """Analyze repository growth trends. Returns None if there is no data."""
        data = self.get_recent_stats(repo_type, namespace, name, days)
        stats = data["stats"]

        if not stats:
            return None

        # Calculate growth metrics
        first_day = stats[0]["downloads"]
        last_day = stats[-1]["downloads"]
        total = sum(s["downloads"] for s in stats)
        avg_daily = total / len(stats)

        # Compare the second half of the period against the first half
        mid_point = len(stats) // 2
        first_half = sum(s["downloads"] for s in stats[:mid_point])
        second_half = sum(s["downloads"] for s in stats[mid_point:])
        growth_rate = ((second_half - first_half) / first_half * 100) if first_half > 0 else 0

        return {
            "total_downloads": total,
            "avg_daily_downloads": avg_daily,
            "growth_rate_percent": growth_rate,
            "first_day": first_day,
            "last_day": last_day,
            "trend": "up" if last_day > first_day else "down" if last_day < first_day else "stable"
        }

    def compare_auth_vs_anon(self, repo_type: str, namespace: str, name: str, days: int = 30):
        """Compare authenticated vs anonymous downloads."""
        data = self.get_recent_stats(repo_type, namespace, name, days)
        stats = data["stats"]

        total_auth = sum(s["authenticated"] for s in stats)
        total_anon = sum(s["anonymous"] for s in stats)
        total = total_auth + total_anon

        return {
            "authenticated": total_auth,
            "anonymous": total_anon,
            "authenticated_percent": (total_auth / total * 100) if total > 0 else 0,
            "anonymous_percent": (total_anon / total * 100) if total > 0 else 0
        }

    def print_summary(self, repo_type: str, namespace: str, name: str):
        """Print comprehensive statistics summary."""
        repo_id = f"{namespace}/{name}"
        print(f"\n=== Statistics Summary: {repo_id} ===\n")

        # Basic stats
        basic = self.get_repo_stats(repo_type, namespace, name)
        print(f"Total Downloads: {basic['downloads']:,}")
        print(f"Likes: {basic['likes']:,}")

        # Growth analysis
        growth = self.analyze_growth(repo_type, namespace, name, 30)
        if growth:
            print("\n30-Day Trends:")
            print(f"  Average daily: {growth['avg_daily_downloads']:.1f}")
            print(f"  Growth rate: {growth['growth_rate_percent']:+.1f}%")
            print(f"  Trend: {growth['trend']}")

        # Auth vs Anon
        auth_data = self.compare_auth_vs_anon(repo_type, namespace, name, 30)
        print("\nUser Distribution (30 days):")
        print(f"  Authenticated: {auth_data['authenticated_percent']:.1f}%")
        print(f"  Anonymous: {auth_data['anonymous_percent']:.1f}%")


# Usage
analyzer = StatsAnalyzer(BASE_URL, TOKEN)

# Get comprehensive summary
analyzer.print_summary("model", "myorg", "mymodel")

# Analyze growth (analyze_growth returns None when no stats are available)
growth = analyzer.analyze_growth("model", "myorg", "mymodel", 90)
if growth:
    print(f"90-day growth: {growth['growth_rate_percent']:+.1f}%")

# Get trending
trending = analyzer.get_trending("model", days=7, limit=10)
print("\nTop 10 trending models:")
for i, repo in enumerate(trending["trending"], 1):
    print(f"{i}. {repo['id']}: {repo['recent_downloads']:,} downloads")
```

### Export Statistics to CSV

```python
import csv
from datetime import datetime
from typing import Optional

import requests

# BASE_URL and TOKEN are defined in the dashboard example above


def export_stats_to_csv(repo_type: str, namespace: str, name: str,
                        days: int = 30, filename: Optional[str] = None):
    """Export repository statistics to CSV file."""
    response = requests.get(
        f"{BASE_URL}/{repo_type}s/{namespace}/{name}/stats/recent",
        params={"days": days},
        headers={"Authorization": f"Bearer {TOKEN}"}
    )
    response.raise_for_status()
    data = response.json()

    if not filename:
        filename = f"{namespace}_{name}_stats_{datetime.now():%Y%m%d}.csv"

    with open(filename, 'w', newline='') as csvfile:
        fieldnames = ["date", "downloads", "authenticated", "anonymous", "files"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for stat in data["stats"]:
            writer.writerow(stat)

    print(f"Exported statistics to {filename}")
    return filename


# Usage
export_stats_to_csv("model", "myorg", "mymodel", days=90)
```

### Monitor Trending Repositories

```python
import requests

# BASE_URL is defined in the dashboard example above


def monitor_trending(repo_type: str = "model", top_n: int = 10):
    """Monitor and report trending repositories."""
    response = requests.get(
        f"{BASE_URL}/api/trending",
        params={
            "repo_type": repo_type,
            "days": 7,
            "limit": top_n
        }
    )
    response.raise_for_status()
    trending = response.json()

    print(f"\n{'='*60}")
    print(f"Top {top_n} Trending {repo_type.title()}s (Last 7 Days)")
    print(f"{'='*60}\n")

    for i, repo in enumerate(trending["trending"], 1):
        print(f"{i:2d}. {repo['id']}")
        print(f"    Recent: {repo['recent_downloads']:>8,} downloads")
        print(f"    Total:  {repo['downloads']:>8,} downloads")
        print(f"    Likes:  {repo['likes']:>8,}")
        print()


# Usage
monitor_trending("model", top_n=10)
monitor_trending("dataset", top_n=5)
```

### Weekly Report Generator

```python
# Reuses BASE_URL, TOKEN, and StatsAnalyzer from the dashboard example above


def generate_weekly_report(namespace: str, token: str):
    """Generate weekly statistics report for all repositories in namespace."""
    analyzer = StatsAnalyzer(BASE_URL, token)

    print(f"\n{'='*70}")
    print(f"Weekly Statistics Report - {namespace}")
    print(f"{'='*70}\n")

    # Simplified example: in practice, list every repository in the namespace
    # (for instance with a helper such as list_repositories, assuming you have one)
    repos = [
        ("model", "mymodel"),
        ("dataset", "mydataset")
    ]

    total_downloads = 0
    total_likes = 0

    for repo_type, name in repos:
        try:
            basic = analyzer.get_repo_stats(repo_type, namespace, name)
            growth = analyzer.analyze_growth(repo_type, namespace, name, 7)

            total_downloads += basic["downloads"]
            total_likes += basic["likes"]

            print(f"{repo_type}/{namespace}/{name}")
            if growth:
                print(f"  Downloads: {basic['downloads']:,} (7-day: {growth['total_downloads']:,})")
            else:
                # analyze_growth returns None when there is no recent data
                print(f"  Downloads: {basic['downloads']:,}")
            print(f"  Likes: {basic['likes']:,}")
            if growth:
                print(f"  Trend: {growth['trend']} ({growth['growth_rate_percent']:+.1f}%)")
            print()

        except Exception as e:
            print(f"{repo_type}/{namespace}/{name}")
            print(f"  Error: {e}\n")

    print(f"{'='*70}")
    print(f"Total Downloads: {total_downloads:,}")
    print(f"Total Likes: {total_likes:,}")
    print(f"{'='*70}\n")


# Usage
generate_weekly_report("myorg", TOKEN)
```

---

## JavaScript/TypeScript Example

```javascript
class StatsAPI {
  constructor(baseURL, token = null) {
    this.baseURL = baseURL;
    this.headers = token ? { 'Authorization': `Bearer ${token}` } : {};
  }

  // Shared helper: perform a GET request and fail loudly on HTTP errors
  async request(path) {
    const response = await fetch(`${this.baseURL}${path}`, { headers: this.headers });
    if (!response.ok) {
      throw new Error(`Request failed: ${response.status} ${response.statusText}`);
    }
    return await response.json();
  }

  async getRepoStats(repoType, namespace, name) {
    return this.request(`/${repoType}s/${namespace}/${name}/stats`);
  }

  async getRecentStats(repoType, namespace, name, days = 30) {
    return this.request(`/${repoType}s/${namespace}/${name}/stats/recent?days=${days}`);
  }

  async getTrending(repoType = 'model', days = 7, limit = 20) {
    return this.request(`/api/trending?repo_type=${repoType}&days=${days}&limit=${limit}`);
  }

  async analyzeGrowth(repoType, namespace, name, days = 30) {
    const data = await this.getRecentStats(repoType, namespace, name, days);
    const stats = data.stats;

    if (!stats.length) return null;

    const total = stats.reduce((sum, s) => sum + s.downloads, 0);
    const avgDaily = total / stats.length;

    // Compare the second half of the period against the first half
    const midPoint = Math.floor(stats.length / 2);
    const firstHalf = stats.slice(0, midPoint)
      .reduce((sum, s) => sum + s.downloads, 0);
    const secondHalf = stats.slice(midPoint)
      .reduce((sum, s) => sum + s.downloads, 0);

    const growthRate = firstHalf > 0
      ? ((secondHalf - firstHalf) / firstHalf * 100)
      : 0;

    return {
      totalDownloads: total,
      avgDailyDownloads: avgDaily,
      growthRatePercent: growthRate,
      trend: secondHalf > firstHalf ? 'up' : secondHalf < firstHalf ? 'down' : 'stable'
    };
  }
}

// Usage (top-level await requires an ES module or an async function)
const statsAPI = new StatsAPI('http://localhost:28080', 'YOUR_TOKEN');

// Get basic stats
const stats = await statsAPI.getRepoStats('model', 'myorg', 'mymodel');
console.log(`Downloads: ${stats.downloads.toLocaleString()}`);
console.log(`Likes: ${stats.likes}`);

// Get trending
const trending = await statsAPI.getTrending('model', 7, 10);
console.log('\nTop Trending Models:');
trending.trending.forEach((repo, i) => {
  console.log(`${i + 1}. ${repo.id}: ${repo.recent_downloads.toLocaleString()} downloads`);
});

// Analyze growth (analyzeGrowth returns null when no stats are available)
const growth = await statsAPI.analyzeGrowth('model', 'myorg', 'mymodel', 30);
if (growth) {
  console.log(`\n30-day growth: ${growth.growthRatePercent.toFixed(1)}%`);
}
```

---

## CLI Usage

See [CLI Documentation](../CLI.md#statistics) for the command-line interface:

```bash
# Get repository stats
kohub-cli stats get model myorg/mymodel

# Get recent statistics (last 30 days)
kohub-cli stats recent model myorg/mymodel --days 30

# Get trending models
kohub-cli trending --type model --days 7 --limit 10

# Export to CSV
kohub-cli stats export model myorg/mymodel --days 90 --output stats.csv
```

---

## Download Session Tracking

### How Sessions Work

- **Session window**: Downloads within a short time window (e.g., 30 minutes) count as one session
- **User-based**: Tracked by user ID (authenticated) or IP + User-Agent (anonymous)
- **Repository-level**: One session per repository, even if multiple files are downloaded
- **Daily aggregation**: Sessions are aggregated daily for historical analysis

### What Counts as a Download?

- ✅ Successful file downloads (HTTP 200)
- ✅ Git clone operations
- ✅ LFS file downloads
- ❌ Failed downloads (HTTP 4xx/5xx)
- ❌ HEAD requests (metadata only)
- ❌ Tree browsing (no files downloaded)

---

## Next Steps

- [Quota Management API](./quota.md) - Monitor storage usage
- [Repository API](../API.md#repositories) - Repository management
- [Likes API](../API.md#likes) - Like/unlike repositories
- [File Tree API](./tree.md) - Browse repository contents
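
To make the session rules from "How Sessions Work" concrete, the sketch below counts download sessions in memory. It is an illustration only, not KohakuHub's actual implementation: `SessionCounter`, `session_key`, and `SESSION_WINDOW` are hypothetical names, the 30-minute window is taken from the "e.g., 30 minutes" example, and measuring the window from the first download of a session is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

# Assumption: 30 minutes, matching the example value in "How Sessions Work"
SESSION_WINDOW = timedelta(minutes=30)


def session_key(repo_id: str, user_id: Optional[str], ip: str, user_agent: str) -> Tuple[str, str]:
    """Identify a downloader per repository: user ID if authenticated, else IP + User-Agent."""
    identity = f"user:{user_id}" if user_id is not None else f"anon:{ip}|{user_agent}"
    return (repo_id, identity)


@dataclass
class SessionCounter:
    """Hypothetical in-memory session counter, for illustration only."""
    session_start: Dict[Tuple[str, str], datetime] = field(default_factory=dict)
    sessions: Dict[str, int] = field(default_factory=dict)

    def record(self, repo_id: str, user_id: Optional[str], ip: str,
               user_agent: str, when: datetime) -> bool:
        """Record one file download; return True if it opened a new session."""
        key = session_key(repo_id, user_id, ip, user_agent)
        start = self.session_start.get(key)
        # Assumption: the window is measured from the first download of the session
        if start is not None and when - start < SESSION_WINDOW:
            return False  # same session, so the repository count is unchanged
        self.session_start[key] = when
        self.sessions[repo_id] = self.sessions.get(repo_id, 0) + 1
        return True


counter = SessionCounter()
t0 = datetime(2025, 1, 15, 12, 0)
counter.record("myorg/mymodel", "alice", "10.0.0.1", "curl", t0)
counter.record("myorg/mymodel", "alice", "10.0.0.1", "curl", t0 + timedelta(minutes=10))
counter.record("myorg/mymodel", None, "10.0.0.2", "curl", t0 + timedelta(minutes=5))
print(counter.sessions)  # {'myorg/mymodel': 2}
```

Here the two downloads by `alice` fall inside one window and count once, while the anonymous client is keyed separately and opens its own session.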