Statistics API
Track repository statistics including downloads, likes, and discover trending repositories.
Overview
KohakuHub provides comprehensive statistics tracking:
- Download tracking: Session-based downloads (not individual file downloads)
- Daily aggregation: Historical download data by day
- Like/favorite counting: Track repository popularity
- Trending algorithm: Discover popular repositories by recent activity
- Lazy aggregation: Historical stats aggregated on-demand for performance
Endpoints
Get Repository Stats
Get basic statistics for a repository (downloads and likes).
Endpoint: GET /{repo_type}s/{namespace}/{name}/stats
Parameters:
| Parameter | Type | Location | Required | Description |
|---|---|---|---|---|
| repo_type | string | path | Yes | Repository type: model, dataset, or space |
| namespace | string | path | Yes | Repository namespace |
| name | string | path | Yes | Repository name |
Authentication: Optional (required for private repositories)
Response:
{
"downloads": 1234567,
"likes": 89
}
Field Descriptions:
| Field | Type | Description |
|---|---|---|
| downloads | integer | Total download sessions (all time) |
| likes | integer | Number of users who liked this repository |
Notes:
- Downloads are counted by session, not individual files
- A session includes all files downloaded within a short time window
- Statistics are automatically aggregated when accessed
- Today's stats are real-time; historical stats use lazy aggregation
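The last two notes can be pictured with a small sketch. This is not KohakuHub's implementation: the class name `LazyDownloadStats`, the in-memory record list, and the cache layout are assumptions made purely for illustration. Completed days are aggregated once, on first access, and then served from a cache, while today's count is recomputed on every call.

```python
from datetime import date


class LazyDownloadStats:
    """Hypothetical in-memory model of lazy daily aggregation."""

    def __init__(self):
        self._raw = []          # raw (day, session_id) records
        self._cache = {}        # day -> aggregated session count
        self.aggregations = 0   # how many historical days were aggregated

    def record(self, day: date, session_id: str) -> None:
        self._raw.append((day, session_id))

    def _count(self, day: date) -> int:
        # Count distinct sessions recorded for the given day
        return len({sid for d, sid in self._raw if d == day})

    def downloads_on(self, day: date, today: date) -> int:
        if day >= today:
            # Today's figure is computed in real time, never cached
            return self._count(day)
        if day not in self._cache:
            # Historical day: aggregate on first access only
            self._cache[day] = self._count(day)
            self.aggregations += 1
        return self._cache[day]
```

The trade-off in this sketch is that a historical day costs one scan the first time it is requested and a dictionary lookup afterwards.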
Example:
import requests

response = requests.get(
    "http://localhost:28080/models/myorg/mymodel/stats"
)
response.raise_for_status()  # Fail early on HTTP errors
stats = response.json()
print(f"Downloads: {stats['downloads']:,}")
print(f"Likes: {stats['likes']}")
Get Recent Statistics
Get detailed download statistics for recent days.
Endpoint: GET /{repo_type}s/{namespace}/{name}/stats/recent
Parameters:
| Parameter | Type | Location | Required | Description |
|---|---|---|---|---|
| repo_type | string | path | Yes | Repository type: model, dataset, or space |
| namespace | string | path | Yes | Repository namespace |
| name | string | path | Yes | Repository name |
| days | integer | query | No | Number of days to retrieve (1-365, default: 30) |
Authentication: Optional (required for private repositories)
Response:
{
"stats": [
{
"date": "2025-01-15",
"downloads": 123,
"authenticated": 45,
"anonymous": 78,
"files": 456
},
{
"date": "2025-01-16",
"downloads": 145,
"authenticated": 52,
"anonymous": 93,
"files": 512
}
],
"period": {
"start": "2025-01-01",
"end": "2025-01-30",
"days": 30
}
}
Field Descriptions:
| Field | Type | Description |
|---|---|---|
| date | string | Date in YYYY-MM-DD format |
| downloads | integer | Download sessions for this day |
| authenticated | integer | Sessions from authenticated users |
| anonymous | integer | Sessions from anonymous users |
| files | integer | Total files downloaded |
Use Cases:
- Generate download charts
- Analyze usage patterns
- Track growth over time
- Compare weekday vs. weekend usage
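As an illustration of the last use case, the following sketch averages daily downloads separately for weekdays and weekends. The function name `weekday_weekend_average` is hypothetical, and the sample entries are made-up values in the same shape as the `stats` array above; in practice you would pass in `data["stats"]` from the response.

```python
from datetime import date
from statistics import mean


def weekday_weekend_average(stats: list[dict]) -> dict:
    """Average daily download sessions for weekdays and for weekends."""
    weekday, weekend = [], []
    for entry in stats:
        day = date.fromisoformat(entry["date"])
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        bucket = weekend if day.weekday() >= 5 else weekday
        bucket.append(entry["downloads"])
    return {
        "weekday_avg": mean(weekday) if weekday else 0.0,
        "weekend_avg": mean(weekend) if weekend else 0.0,
    }


# Made-up sample in the shape of the "stats" array
sample = [
    {"date": "2025-01-17", "downloads": 100},  # Friday
    {"date": "2025-01-18", "downloads": 40},   # Saturday
    {"date": "2025-01-19", "downloads": 60},   # Sunday
    {"date": "2025-01-20", "downloads": 120},  # Monday
]
print(weekday_weekend_average(sample))
```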
Example:
import requests
import matplotlib.pyplot as plt

# Get last 30 days
response = requests.get(
    "http://localhost:28080/models/myorg/mymodel/stats/recent",
    params={"days": 30}
)
response.raise_for_status()  # Fail early on HTTP errors
data = response.json()

# Extract data for plotting
dates = [s["date"] for s in data["stats"]]
downloads = [s["downloads"] for s in data["stats"]]

# Plot
plt.figure(figsize=(12, 6))
plt.plot(dates, downloads, marker='o')
plt.xlabel("Date")
plt.ylabel("Downloads")
plt.title("Download Trend (Last 30 Days)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Get Trending Repositories
Discover trending repositories based on recent downloads.
Endpoint: GET /api/trending
Parameters:
| Parameter | Type | Location | Required | Description |
|---|---|---|---|---|
| repo_type | string | query | No | Filter by type: model, dataset, or space (default: model) |
| days | integer | query | No | Calculate trend based on last N days (1-90, default: 7) |
| limit | integer | query | No | Maximum repositories to return (1-100, default: 20) |
Authentication: Optional (affects private repository visibility)
Response:
{
"trending": [
{
"id": "openai/gpt-4",
"type": "model",
"downloads": 5678900,
"likes": 1234,
"recent_downloads": 12345,
"private": false
},
{
"id": "myorg/popular-dataset",
"type": "dataset",
"downloads": 234567,
"likes": 89,
"recent_downloads": 8901,
"private": false
}
],
"period": {
"start": "2025-01-09",
"end": "2025-01-16",
"days": 7
}
}
Field Descriptions:
| Field | Type | Description |
|---|---|---|
| id | string | Repository full ID (namespace/name) |
| type | string | Repository type |
| downloads | integer | Total downloads (all time) |
| likes | integer | Total likes |
| recent_downloads | integer | Downloads in the specified period |
| private | boolean | Whether repository is private |
Trending Algorithm:
Repositories are ranked by recent_downloads (downloads in the last N days).
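Because the ranking depends only on recent_downloads, the ordering can be reproduced on the client. The sketch below is a minimal illustration rather than the server's code: the function name `rank_trending` is hypothetical, and this document does not specify how ties are broken (Python's stable sort keeps the input order for equal counts).

```python
def rank_trending(repos: list[dict], limit: int = 20) -> list[dict]:
    """Sort by recent_downloads, highest first, and keep the top `limit`."""
    ranked = sorted(repos, key=lambda r: r["recent_downloads"], reverse=True)
    return ranked[:limit]


# Entries shaped like items of the "trending" array (values are made up)
candidates = [
    {"id": "myorg/popular-dataset", "recent_downloads": 8901},
    {"id": "openai/gpt-4", "recent_downloads": 12345},
    {"id": "myorg/mymodel", "recent_downloads": 42},
]
for position, repo in enumerate(rank_trending(candidates, limit=2), 1):
    print(position, repo["id"], repo["recent_downloads"])
```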
Privacy:
- Public repositories: Visible to everyone
- Private repositories: Only visible to users with read permission
- Anonymous users only see public trending repos
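The visibility rules above amount to a filter over the candidate list. The following sketch assumes the caller's readable private repositories are available as a set of IDs; both the function name `visible_trending` and that set are hypothetical, since how KohakuHub resolves read permission is not described here.

```python
def visible_trending(repos: list[dict],
                     readable_private_ids: frozenset = frozenset()) -> list[dict]:
    """Keep public repositories plus private ones the caller can read.

    An anonymous caller passes no IDs and therefore sees only public entries.
    """
    return [
        repo for repo in repos
        if not repo["private"] or repo["id"] in readable_private_ids
    ]
```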
Example:
import requests

# Get top 10 trending models (last 7 days)
response = requests.get(
    "http://localhost:28080/api/trending",
    params={
        "repo_type": "model",
        "days": 7,
        "limit": 10
    }
)
response.raise_for_status()  # Fail early on HTTP errors
trending = response.json()
print("Top Trending Models:")
for i, repo in enumerate(trending["trending"], 1):
    print(f"{i}. {repo['id']}")
    print(f"   Recent downloads: {repo['recent_downloads']:,}")
    print(f"   Total downloads: {repo['downloads']:,}")
    print(f"   Likes: {repo['likes']}")
    print()

# Get trending datasets (last 30 days)
response = requests.get(
    "http://localhost:28080/api/trending",
    params={
        "repo_type": "dataset",
        "days": 30,
        "limit": 20
    }
)
Usage Examples
Comprehensive Statistics Dashboard
import requests
from typing import Optional

BASE_URL = "http://localhost:28080"
TOKEN = "YOUR_TOKEN"


class StatsAnalyzer:
    def __init__(self, base_url: str, token: Optional[str] = None):
        self.base_url = base_url
        # Send the Authorization header only when a token is provided
        self.headers = {"Authorization": f"Bearer {token}"} if token else {}

    def get_repo_stats(self, repo_type: str, namespace: str, name: str):
        """Get basic repository statistics."""
        response = requests.get(
            f"{self.base_url}/{repo_type}s/{namespace}/{name}/stats",
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()

    def get_recent_stats(self, repo_type: str, namespace: str, name: str, days: int = 30):
        """Get recent daily statistics."""
        response = requests.get(
            f"{self.base_url}/{repo_type}s/{namespace}/{name}/stats/recent",
            params={"days": days},
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()

    def get_trending(self, repo_type: str = "model", days: int = 7, limit: int = 20):
        """Get trending repositories."""
        response = requests.get(
            f"{self.base_url}/api/trending",
            params={
                "repo_type": repo_type,
                "days": days,
                "limit": limit
            },
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()

    def analyze_growth(self, repo_type: str, namespace: str, name: str, days: int = 30):
        """Analyze repository growth trends. Returns None if there is no data."""
        data = self.get_recent_stats(repo_type, namespace, name, days)
        stats = data["stats"]
        if not stats:
            return None
        # Calculate growth metrics
        first_day = stats[0]["downloads"]
        last_day = stats[-1]["downloads"]
        total = sum(s["downloads"] for s in stats)
        avg_daily = total / len(stats)
        # Compare the second half of the period against the first half
        mid_point = len(stats) // 2
        first_half = sum(s["downloads"] for s in stats[:mid_point])
        second_half = sum(s["downloads"] for s in stats[mid_point:])
        growth_rate = ((second_half - first_half) / first_half * 100) if first_half > 0 else 0
        return {
            "total_downloads": total,
            "avg_daily_downloads": avg_daily,
            "growth_rate_percent": growth_rate,
            "first_day": first_day,
            "last_day": last_day,
            "trend": "up" if last_day > first_day else "down" if last_day < first_day else "stable"
        }

    def compare_auth_vs_anon(self, repo_type: str, namespace: str, name: str, days: int = 30):
        """Compare authenticated vs anonymous downloads."""
        data = self.get_recent_stats(repo_type, namespace, name, days)
        stats = data["stats"]
        total_auth = sum(s["authenticated"] for s in stats)
        total_anon = sum(s["anonymous"] for s in stats)
        total = total_auth + total_anon
        return {
            "authenticated": total_auth,
            "anonymous": total_anon,
            "authenticated_percent": (total_auth / total * 100) if total > 0 else 0,
            "anonymous_percent": (total_anon / total * 100) if total > 0 else 0
        }

    def print_summary(self, repo_type: str, namespace: str, name: str):
        """Print comprehensive statistics summary."""
        repo_id = f"{namespace}/{name}"
        print(f"\n=== Statistics Summary: {repo_id} ===\n")
        # Basic stats
        basic = self.get_repo_stats(repo_type, namespace, name)
        print(f"Total Downloads: {basic['downloads']:,}")
        print(f"Likes: {basic['likes']:,}")
        # Growth analysis
        growth = self.analyze_growth(repo_type, namespace, name, 30)
        if growth:
            print("\n30-Day Trends:")
            print(f"  Average daily: {growth['avg_daily_downloads']:.1f}")
            print(f"  Growth rate: {growth['growth_rate_percent']:+.1f}%")
            print(f"  Trend: {growth['trend']}")
        # Auth vs Anon
        auth_data = self.compare_auth_vs_anon(repo_type, namespace, name, 30)
        print("\nUser Distribution (30 days):")
        print(f"  Authenticated: {auth_data['authenticated_percent']:.1f}%")
        print(f"  Anonymous: {auth_data['anonymous_percent']:.1f}%")


# Usage
analyzer = StatsAnalyzer(BASE_URL, TOKEN)

# Get comprehensive summary
analyzer.print_summary("model", "myorg", "mymodel")

# Analyze growth (analyze_growth returns None when there is no data)
growth = analyzer.analyze_growth("model", "myorg", "mymodel", 90)
if growth:
    print(f"90-day growth: {growth['growth_rate_percent']:+.1f}%")

# Get trending
trending = analyzer.get_trending("model", days=7, limit=10)
print("\nTop 10 trending models:")
for i, repo in enumerate(trending["trending"], 1):
    print(f"{i}. {repo['id']}: {repo['recent_downloads']:,} downloads")
Export Statistics to CSV
import csv
from datetime import datetime

import requests

# Uses BASE_URL and TOKEN as defined in the dashboard example


def export_stats_to_csv(repo_type: str, namespace: str, name: str,
                        days: int = 30, filename: str = None):
    """Export repository statistics to CSV file."""
    response = requests.get(
        f"{BASE_URL}/{repo_type}s/{namespace}/{name}/stats/recent",
        params={"days": days},
        headers={"Authorization": f"Bearer {TOKEN}"}
    )
    response.raise_for_status()
    data = response.json()
    if not filename:
        filename = f"{namespace}_{name}_stats_{datetime.now():%Y%m%d}.csv"
    with open(filename, 'w', newline='') as csvfile:
        fieldnames = ["date", "downloads", "authenticated", "anonymous", "files"]
        # extrasaction="ignore" skips any response fields not listed above
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for stat in data["stats"]:
            writer.writerow(stat)
    print(f"Exported statistics to {filename}")
    return filename


# Usage
export_stats_to_csv("model", "myorg", "mymodel", days=90)
Monitor Trending Repositories
def monitor_trending(repo_type: str = "model", top_n: int = 10):
    """Monitor and report trending repositories."""
    response = requests.get(
        f"{BASE_URL}/api/trending",
        params={
            "repo_type": repo_type,
            "days": 7,
            "limit": top_n
        }
    )
    response.raise_for_status()
    trending = response.json()
    print(f"\n{'='*60}")
    print(f"Top {top_n} Trending {repo_type.title()}s (Last 7 Days)")
    print(f"{'='*60}\n")
    for i, repo in enumerate(trending["trending"], 1):
        print(f"{i:2d}. {repo['id']}")
        print(f"    Recent: {repo['recent_downloads']:>8,} downloads")
        print(f"    Total:  {repo['downloads']:>8,} downloads")
        print(f"    Likes:  {repo['likes']:>8,}")
        print()


# Usage
monitor_trending("model", top_n=10)
monitor_trending("dataset", top_n=5)
Weekly Report Generator
def generate_weekly_report(namespace: str, token: str):
    """Generate weekly statistics report for all repositories in namespace."""
    analyzer = StatsAnalyzer(BASE_URL, token)
    print(f"\n{'='*70}")
    print(f"Weekly Statistics Report - {namespace}")
    print(f"{'='*70}\n")
    # Simplified example: a real report would list every repository in the
    # namespace instead of hard-coding them as done here
    repos = [
        ("model", "mymodel"),
        ("dataset", "mydataset")
    ]
    total_downloads = 0
    total_likes = 0
    for repo_type, name in repos:
        try:
            basic = analyzer.get_repo_stats(repo_type, namespace, name)
            growth = analyzer.analyze_growth(repo_type, namespace, name, 7)
            total_downloads += basic["downloads"]
            total_likes += basic["likes"]
            # analyze_growth returns None when the period has no data
            seven_day = growth["total_downloads"] if growth else 0
            print(f"{repo_type}/{namespace}/{name}")
            print(f"  Downloads: {basic['downloads']:,} (7-day: {seven_day:,})")
            print(f"  Likes: {basic['likes']:,}")
            if growth:
                print(f"  Trend: {growth['trend']} ({growth['growth_rate_percent']:+.1f}%)")
            print()
        except Exception as e:
            print(f"  Error: {e}\n")
    print(f"{'='*70}")
    print(f"Total Downloads: {total_downloads:,}")
    print(f"Total Likes: {total_likes:,}")
    print(f"{'='*70}\n")


# Usage
generate_weekly_report("myorg", TOKEN)
JavaScript/TypeScript Example
class StatsAPI {
  constructor(baseURL, token = null) {
    this.baseURL = baseURL;
    // Send the Authorization header only when a token is provided
    this.headers = token ? { 'Authorization': `Bearer ${token}` } : {};
  }

  async getRepoStats(repoType, namespace, name) {
    const response = await fetch(
      `${this.baseURL}/${repoType}s/${namespace}/${name}/stats`,
      { headers: this.headers }
    );
    return await response.json();
  }

  async getRecentStats(repoType, namespace, name, days = 30) {
    const response = await fetch(
      `${this.baseURL}/${repoType}s/${namespace}/${name}/stats/recent?days=${days}`,
      { headers: this.headers }
    );
    return await response.json();
  }

  async getTrending(repoType = 'model', days = 7, limit = 20) {
    const response = await fetch(
      `${this.baseURL}/api/trending?repo_type=${repoType}&days=${days}&limit=${limit}`,
      { headers: this.headers }
    );
    return await response.json();
  }

  async analyzeGrowth(repoType, namespace, name, days = 30) {
    const data = await this.getRecentStats(repoType, namespace, name, days);
    const stats = data.stats;
    if (!stats.length) return null;
    const total = stats.reduce((sum, s) => sum + s.downloads, 0);
    const avgDaily = total / stats.length;
    // Compare the second half of the period against the first half
    const midPoint = Math.floor(stats.length / 2);
    const firstHalf = stats.slice(0, midPoint)
      .reduce((sum, s) => sum + s.downloads, 0);
    const secondHalf = stats.slice(midPoint)
      .reduce((sum, s) => sum + s.downloads, 0);
    const growthRate = firstHalf > 0
      ? ((secondHalf - firstHalf) / firstHalf * 100)
      : 0;
    return {
      totalDownloads: total,
      avgDailyDownloads: avgDaily,
      growthRatePercent: growthRate,
      trend: secondHalf > firstHalf ? 'up' : secondHalf < firstHalf ? 'down' : 'stable'
    };
  }
}

// Usage (top-level await requires an ES module or an enclosing async function)
const statsAPI = new StatsAPI('http://localhost:28080', 'YOUR_TOKEN');

// Get basic stats
const stats = await statsAPI.getRepoStats('model', 'myorg', 'mymodel');
console.log(`Downloads: ${stats.downloads.toLocaleString()}`);
console.log(`Likes: ${stats.likes}`);

// Get trending
const trending = await statsAPI.getTrending('model', 7, 10);
console.log('\nTop Trending Models:');
trending.trending.forEach((repo, i) => {
  console.log(`${i + 1}. ${repo.id}: ${repo.recent_downloads.toLocaleString()} downloads`);
});

// Analyze growth (analyzeGrowth returns null when there is no data)
const growth = await statsAPI.analyzeGrowth('model', 'myorg', 'mymodel', 30);
if (growth) {
  console.log(`\n30-day growth: ${growth.growthRatePercent.toFixed(1)}%`);
}
CLI Usage
See the CLI Documentation for the full command-line interface. Common commands:
# Get repository stats
kohub-cli stats get model myorg/mymodel
# Get recent statistics (last 30 days)
kohub-cli stats recent model myorg/mymodel --days 30
# Get trending models
kohub-cli trending --type model --days 7 --limit 10
# Export to CSV
kohub-cli stats export model myorg/mymodel --days 90 --output stats.csv
Download Session Tracking
How Sessions Work
- Session window: Downloads within a short time window (e.g., 30 minutes) count as one session
- User-based: Tracked by user ID (authenticated) or IP + User-Agent (anonymous)
- Repository-level: One session per repository, even if multiple files downloaded
- Daily aggregation: Sessions are aggregated daily for historical analysis
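The rules above can be sketched as a small tracker. This is an illustration only, not KohakuHub's code: the class `SessionTracker`, the 30-minute constant (taken from the "e.g." above), and the choice to measure the window from the first download of a session are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

SESSION_WINDOW = timedelta(minutes=30)  # assumed value, per the "e.g." above


class SessionTracker:
    """Hypothetical session counter keyed by repository and client."""

    def __init__(self, window: timedelta = SESSION_WINDOW):
        self.window = window
        self._session_start = {}  # (repo_id, client_key) -> session start time
        self.sessions = 0

    @staticmethod
    def client_key(user_id: Optional[int], ip: str, user_agent: str) -> str:
        # Authenticated clients are keyed by user ID, anonymous ones by IP + User-Agent
        if user_id is not None:
            return f"user:{user_id}"
        return f"anon:{ip}|{user_agent}"

    def record_download(self, repo_id: str, when: datetime,
                        user_id: Optional[int] = None,
                        ip: str = "", user_agent: str = "") -> bool:
        """Return True if this download opens a new session."""
        key = (repo_id, self.client_key(user_id, ip, user_agent))
        started = self._session_start.get(key)
        if started is not None and when - started < self.window:
            return False  # same session: further files are not counted again
        self._session_start[key] = when
        self.sessions += 1
        return True
```

With this model, a client fetching twenty files from one repository within a few minutes adds one session, while the same client downloading from a second repository adds another.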
What Counts as a Download?
- ✅ Successful file downloads (HTTP 200)
- ✅ Git clone operations
- ✅ LFS file downloads
- ❌ Failed downloads (HTTP 4xx/5xx)
- ❌ HEAD requests (metadata only)
- ❌ Tree browsing (no files downloaded)
Next Steps
- Quota Management API - Monitor storage usage
- Repository API - Repository management
- Likes API - Like/unlike repositories
- File Tree API - Browse repository contents