26 KiB
Kohaku Hub API Documentation
Last Updated: January 2025
This document explains how Kohaku Hub's API works, the data flow, and all available endpoints.
System Architecture
graph TB
subgraph Client["Client Layer"]
CLT["Client<br/>(huggingface_hub, git, browser)"]
end
subgraph Entry["Entry Point"]
NGX["Nginx (Port 28080)<br/>- Serves static files<br/>- Reverse proxy"]
end
subgraph App["Application Layer"]
API["FastAPI (Port 48888)<br/>- Auth & Permissions<br/>- HF-compatible API<br/>- Git Smart HTTP"]
end
subgraph Storage["Storage Backend"]
LFS["LakeFS<br/>- Git-like versioning<br/>- Branch management<br/>- Commit history"]
DB["PostgreSQL/SQLite<br/>- User data<br/>- Metadata<br/>- Deduplication<br/>- Synchronous with db.atomic()"]
S3["MinIO/S3<br/>- Object storage<br/>- LFS files<br/>- Presigned URLs"]
end
CLT -->|HTTP/Git/LFS| NGX
NGX -->|Static files| CLT
NGX -->|/api, /org, resolve| API
API -->|REST API async| LFS
API -->|Sync queries with db.atomic| DB
API -->|Async| S3
LFS -->|Stores objects| S3
Core Concepts
File Size Thresholds
graph TD
Start[File Upload] --> Check{File size > 10MB?}
Check -->|No| Regular[Regular Mode]
Check -->|Yes| LFS[LFS Mode]
Regular --> Base64[Base64 in commit payload]
LFS --> Presigned[S3 presigned URL]
Base64 --> FastAPI[FastAPI processes]
Presigned --> Direct[Direct S3 upload]
FastAPI --> LakeFS1[LakeFS stores object]
Direct --> Link[FastAPI links S3 object]
Link --> LakeFS2[LakeFS commit with physical address]
Note: The LFS threshold is configurable via KOHAKU_HUB_LFS_THRESHOLD_BYTES (default: 10MB = 10,000,000 bytes). Can also be set per-repository.
Storage Layout
S3 Bucket Structure:
s3://hub-storage/
│
├── hf-model-org-repo/ ← LakeFS managed repository
│ └── main/ ← Branch
│ ├── config.json
│ └── model.safetensors
│
└── lfs/ ← LFS objects (content-addressable)
└── ab/ ← First 2 chars of SHA256
└── cd/ ← Next 2 chars
└── abcd1234... ← Full SHA256 hash
Upload Workflow
Overview
sequenceDiagram
participant Client
participant API as FastAPI
participant LakeFS
participant S3
Note over Client,S3: Phase 1: Preupload Check
Client->>API: POST /preupload (file hashes & sizes)
API->>API: Check DB for existing SHA256
API-->>Client: Upload mode (regular/lfs) & dedup info
alt Small Files (<10MB)
Note over Client,S3: Phase 2a: Regular Upload
Client->>API: POST /commit (base64 content)
API->>LakeFS: Upload object
LakeFS->>S3: Store object
else Large Files (>=10MB)
Note over Client,S3: Phase 2b: LFS Upload
Client->>API: POST /info/lfs/objects/batch
API->>S3: Generate presigned URL
API-->>Client: Presigned URL
Client->>S3: PUT file (direct upload)
Client->>API: POST /commit (lfsFile entry)
API->>LakeFS: Link physical address
end
Note over Client,S3: Phase 3: Commit
API->>LakeFS: Commit with message
LakeFS-->>API: Commit ID
API-->>Client: Commit URL & OID
Step 1: Preupload Check
Purpose: Determine upload mode and check for duplicates
Endpoint: POST /api/{repo_type}s/{repo_id}/preupload/{revision}
Request:
{
"files": [
{
"path": "config.json",
"size": 1024,
"sha256": "abc123..."
},
{
"path": "model.bin",
"size": 52428800,
"sha256": "def456..."
}
]
}
Response:
{
"files": [
{
"path": "config.json",
"uploadMode": "regular",
"shouldIgnore": false
},
{
"path": "model.bin",
"uploadMode": "lfs",
"shouldIgnore": true // Already exists!
}
]
}
Step 2: Commit
Purpose: Atomically commit all changes to the repository
Endpoint: POST /api/{repo_type}s/{repo_id}/commit/{revision}
Format: NDJSON (Newline-Delimited JSON)
Example Payload:
{"key":"header","value":{"summary":"Add model files","description":"Initial upload"}}
{"key":"file","value":{"path":"config.json","content":"eyJtb2RlbCI6...","encoding":"base64"}}
{"key":"lfsFile","value":{"path":"model.bin","algo":"sha256","oid":"abc123...","size":52428800}}
{"key":"deletedFile","value":{"path":"old_config.json"}}
Operation Types:
| Key | Description | Usage |
|---|---|---|
header |
Commit metadata | Required, must be first line |
file |
Small file (inline base64) | For files ≤ 10MB |
lfsFile |
Large file (LFS reference) | For files > 10MB, already uploaded to S3 |
deletedFile |
Delete a single file | Remove file from repo |
deletedFolder |
Delete folder recursively | Remove all files in folder |
copyFile |
Copy file within repo | Duplicate file (deduplication-aware) |
Download Workflow
sequenceDiagram
participant Client
participant API as FastAPI
participant LakeFS
participant S3
Note over Client,S3: Optional: HEAD request for metadata
Client->>API: HEAD /resolve/{revision}/{filename}
API->>LakeFS: Stat object
LakeFS-->>API: Object metadata (SHA256, size)
API-->>Client: Headers (ETag, Content-Length, X-Repo-Commit)
Note over Client,S3: Download: GET request
Client->>API: GET /resolve/{revision}/{filename}
API->>LakeFS: Get object metadata
API->>S3: Generate presigned URL
API-->>Client: 302 Redirect (presigned URL)
Client->>S3: Direct download
S3-->>Client: File content
Note over Client: No proxy - direct S3 download
Database Schema
erDiagram
USER ||--o{ REPOSITORY : owns
USER ||--o{ SESSION : has
USER ||--o{ TOKEN : has
USER ||--o{ SSHKEY : has
USER }o--o{ USER : member_of
USER ||--o{ REPOSITORY_LIKE : likes
USER ||--o{ DOWNLOAD_SESSION : downloads
REPOSITORY ||--o{ FILE : contains
REPOSITORY ||--o{ COMMIT : has
REPOSITORY ||--o{ STAGING_UPLOAD : has
REPOSITORY ||--o{ REPOSITORY_LIKE : liked_by
REPOSITORY ||--o{ DOWNLOAD_SESSION : tracked
REPOSITORY ||--o{ DAILY_REPO_STATS : has_stats
COMMIT ||--o{ LFS_OBJECT_HISTORY : references
USER {
int id PK
string username UK
string normalized_name UK
boolean is_org
string email UK
string password_hash
boolean email_verified
boolean is_active
bigint private_quota_bytes
bigint public_quota_bytes
bigint private_used_bytes
bigint public_used_bytes
string full_name
text bio
blob avatar
datetime avatar_updated_at
datetime created_at
}
REPOSITORY {
int id PK
string repo_type
string namespace
string name
string full_id
boolean private
int owner_id FK
bigint quota_bytes
bigint used_bytes
int lfs_threshold_bytes
int lfs_keep_versions
text lfs_suffix_rules
int downloads
int likes_count
datetime created_at
}
FILE {
int id PK
int repository_id FK
string path_in_repo
int size
string sha256
boolean lfs
boolean is_deleted
int owner_id FK
datetime created_at
datetime updated_at
}
COMMIT {
int id PK
string commit_id
int repository_id FK
string repo_type
string branch
int author_id FK
int owner_id FK
string username
text message
text description
datetime created_at
}
TOKEN {
int id PK
int user_id FK
string token_hash UK
string name
datetime last_used
datetime created_at
}
SESSION {
int id PK
string session_id UK
int user_id FK
string secret
datetime expires_at
datetime created_at
}
SSHKEY {
int id PK
int user_id FK
string key_type
text public_key
string fingerprint UK
string title
datetime last_used
datetime created_at
}
STAGING_UPLOAD {
int id PK
int repository_id FK
string repo_type
string revision
string path_in_repo
string sha256
int size
string upload_id
string storage_key
boolean lfs
int uploader_id FK
datetime created_at
}
LFS_OBJECT_HISTORY {
int id PK
int repository_id FK
string path_in_repo
string sha256
int size
string commit_id
int file_id FK
datetime created_at
}
REPOSITORY_LIKE {
int id PK
int repository_id FK
int user_id FK
datetime created_at
}
DOWNLOAD_SESSION {
int id PK
int repository_id FK
int user_id FK
string session_id
int time_bucket
int file_count
string first_file
datetime first_download_at
datetime last_download_at
}
DAILY_REPO_STATS {
int id PK
int repository_id FK
date date
int download_sessions
int authenticated_downloads
int anonymous_downloads
int total_files
datetime created_at
}
API Endpoint Summary
Repository Operations
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/repos/create |
POST | ✓ | Create new repository |
/api/repos/delete |
DELETE | ✓ | Delete repository |
/api/repos/move |
POST | ✓ | Move/rename repository |
/api/{type}s |
GET | ○ | List repositories (respects privacy) |
/api/{type}s/{id} |
GET | ○ | Get repo info |
/api/{type}s/{id}/tree/{rev}/{path} |
GET | ○ | List files |
/api/{type}s/{id}/revision/{rev} |
GET | ○ | Get revision info |
/api/{type}s/{id}/paths-info/{rev} |
POST | ○ | Get info for specific paths |
/api/users/{username}/repos |
GET | ○ | List all repos for a user/org (grouped by type) |
File Operations
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/{type}s/{id}/preupload/{rev} |
POST | ✓ | Check before upload |
/api/{type}s/{id}/commit/{rev} |
POST | ✓ | Atomic commit |
/{id}/resolve/{rev}/{file} |
GET | ○ | Download file |
/{id}/resolve/{rev}/{file} |
HEAD | ○ | Get file metadata |
/{type}s/{id}/resolve/{rev}/{file} |
GET | ○ | Download file (with type) |
/{type}s/{id}/resolve/{rev}/{file} |
HEAD | ○ | Get file metadata (with type) |
LFS Operations
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/{id}.git/info/lfs/objects/batch |
POST | ✓ | LFS batch API |
/api/{id}.git/info/lfs/verify |
POST | ✓ | Verify upload |
Commit History
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/{type}s/{namespace}/{name}/commits/{branch} |
GET | ○ | List commits on a branch with pagination |
Branch and Tag Management
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/{type}s/{namespace}/{name}/branch |
POST | ✓ | Create a new branch |
/{type}s/{namespace}/{name}/branch/{branch} |
DELETE | ✓ | Delete a branch |
/{type}s/{namespace}/{name}/tag |
POST | ✓ | Create a new tag |
/{type}s/{namespace}/{name}/tag/{tag} |
DELETE | ✓ | Delete a tag |
Settings Management
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/users/{username}/settings |
PUT | ✓ | Update user settings |
/api/organizations/{org_name}/settings |
PUT | ✓ | Update organization settings |
/{type}s/{namespace}/{name}/settings |
PUT | ✓ | Update repository settings (private, gated, LFS settings) |
/api/{type}s/{namespace}/{name}/lfs/settings |
GET | ○ | Get repository LFS settings |
Social Features
Likes:
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/{type}s/{namespace}/{name}/like |
POST | ✓ | Like a repository |
/api/{type}s/{namespace}/{name}/like |
DELETE | ✓ | Unlike a repository |
/api/{type}s/{namespace}/{name}/like |
GET | ○ | Check if current user liked repository |
/api/{type}s/{namespace}/{name}/likers |
GET | ○ | List users who liked repository |
/api/users/{username}/likes |
GET | ○ | List repositories user has liked |
Statistics & Trending:
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/{type}s/{namespace}/{name}/stats |
GET | ○ | Get repository statistics (downloads, likes) |
/api/{type}s/{namespace}/{name}/stats/recent |
GET | ○ | Get recent download statistics (time series) |
/api/trending |
GET | ○ | Get trending repositories |
Avatars:
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/users/{username}/avatar |
POST | ✓ | Upload user avatar |
/api/users/{username}/avatar |
GET | ○ | Get user avatar image |
/api/users/{username}/avatar |
DELETE | ✓ | Delete user avatar |
/api/organizations/{org_name}/avatar |
POST | ✓ | Upload organization avatar |
/api/organizations/{org_name}/avatar |
GET | ○ | Get organization avatar image |
/api/organizations/{org_name}/avatar |
DELETE | ✓ | Delete organization avatar |
Quota Management
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/quota/{namespace} |
GET | ✓ | Get namespace quota information |
/api/quota/{namespace} |
PUT | ✓ | Set namespace quota |
/api/quota/{namespace}/recalculate |
POST | ✓ | Recalculate namespace storage usage |
/api/quota/{namespace}/public |
GET | ○ | Get public quota info (permission-based) |
/api/quota/{namespace}/repos |
GET | ✓ | List namespace repositories with storage breakdown |
/api/quota/repo/{type}/{namespace}/{name} |
GET | ○ | Get repository quota information |
/api/quota/repo/{type}/{namespace}/{name} |
PUT | ✓ | Set repository quota |
/api/quota/repo/{type}/{namespace}/{name}/recalculate |
POST | ✓ | Recalculate repository storage |
Invitations
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/invitations/org/{org_name}/create |
POST | ✓ | Create organization invitation |
/api/invitations/{token} |
GET | ✗ | Get invitation details |
/api/invitations/{token}/accept |
POST | ✓ | Accept invitation |
/api/invitations/{token} |
DELETE | ✓ | Delete/cancel invitation |
/api/invitations/org/{org_name}/list |
GET | ✓ | List organization invitations |
SSH Keys
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/user/keys |
GET | ✓ | List user's SSH keys |
/api/user/keys |
POST | ✓ | Add new SSH key |
/api/user/keys/{key_id} |
GET | ✓ | Get SSH key details |
/api/user/keys/{key_id} |
DELETE | ✓ | Delete SSH key |
Validation
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/validate/check-name |
POST | ✗ | Check if username/org/repo name is available |
/api/validate-yaml |
POST | ✗ | Validate YAML content |
Authentication Operations
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/auth/register |
POST | ✗ | Register new user |
/api/auth/login |
POST | ✗ | Login and create session |
/api/auth/logout |
POST | ✓ | Logout and destroy session |
/api/auth/verify-email |
GET | ✗ | Verify email with token |
/api/auth/me |
GET | ✓ | Get current user info |
/api/auth/tokens |
GET | ✓ | List user's API tokens |
/api/auth/tokens/create |
POST | ✓ | Create new API token |
/api/auth/tokens/{token_id} |
DELETE | ✓ | Revoke API token |
Organization Operations
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/org/create |
POST | ✓ | Create new organization |
/org/{org_name} |
GET | ✗ | Get organization details |
/org/{org_name}/members |
GET | ○ | List organization members |
/org/{org_name}/members |
POST | ✓ | Add member to organization |
/org/{org_name}/members/{username} |
DELETE | ✓ | Remove member from organization |
/org/{org_name}/members/{username} |
PUT | ✓ | Update member role |
/org/users/{username}/orgs |
GET | ✗ | List user's organizations |
Git Operations
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/{namespace}/{name}.git/info/refs |
GET | ○ | Git service advertisement |
/{namespace}/{name}.git/HEAD |
GET | ○ | Get HEAD reference |
/{namespace}/{name}.git/git-upload-pack |
POST | ○ | Clone/fetch/pull |
/{namespace}/{name}.git/git-receive-pack |
POST | ✓ | Push (in development) |
Utility Operations
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/api/whoami-v2 |
GET | ✓ | Get detailed current user info |
/api/version |
GET | ✗ | Get API version information |
/health |
GET | ✗ | Health check |
/ |
GET | ✗ | API information |
Auth Legend:
- ✓ = Required
- ○ = Optional (public repos)
- ✗ = Not required
New Features Documentation
Repository Likes
Like a repository:
POST /api/models/org/model/like
Authorization: Bearer YOUR_TOKEN
Response:
{
"success": true,
"message": "Repository liked successfully",
"likes_count": 42
}
Check if liked:
GET /api/models/org/model/like
Response:
{
"liked": true
}
List likers:
GET /api/models/org/model/likers?limit=50
Response:
{
"likers": [
{
"username": "alice",
"full_name": "Alice Developer"
}
],
"total": 42
}
Statistics and Trending
Get repository stats:
GET /api/models/org/model/stats
Response:
{
"downloads": 1234,
"likes": 42
}
Get recent statistics (time series):
GET /api/models/org/model/stats/recent?days=30
Response:
{
"stats": [
{
"date": "2025-01-15",
"downloads": 45,
"authenticated": 30,
"anonymous": 15,
"files": 120
}
],
"period": {
"start": "2024-12-16",
"end": "2025-01-15",
"days": 30
}
}
Get trending repositories:
GET /api/trending?repo_type=model&days=7&limit=20
Response:
{
"trending": [
{
"id": "org/hot-model",
"type": "model",
"downloads": 5000,
"likes": 200,
"recent_downloads": 1500,
"private": false
}
],
"period": {
"start": "2025-01-08",
"end": "2025-01-15",
"days": 7
}
}
Avatar Management
Upload avatar:
POST /api/users/alice/avatar
Authorization: Bearer YOUR_TOKEN
Content-Type: multipart/form-data
file: [image binary data]
Features:
- Accepts JPEG, PNG, WebP, GIF
- Maximum input size: 10MB
- Automatically resizes to fit 1024x1024
- Center crops to square
- Converts to JPEG format
- Output quality: 95%
Response:
{
"success": true,
"message": "Avatar uploaded successfully",
"size_bytes": 245678
}
Get avatar:
GET /api/users/alice/avatar
Returns JPEG image with cache headers.
Quota Management
Get quota information:
GET /api/quota/alice
Authorization: Bearer YOUR_TOKEN
Response:
{
"namespace": "alice",
"is_organization": false,
"quota_bytes": 10737418240,
"used_bytes": 1234567890,
"available_bytes": 9502850350,
"percentage_used": 11.5
}
Set quota:
PUT /api/quota/alice
Authorization: Bearer YOUR_TOKEN
Content-Type: application/json
{
"quota_bytes": 10737418240
}
Repository-specific quota:
GET /api/quota/repo/model/org/my-model
Response:
{
"repo_id": "org/my-model",
"repo_type": "model",
"namespace": "org",
"quota_bytes": 1073741824,
"used_bytes": 524288000,
"available_bytes": 549453824,
"percentage_used": 48.8,
"effective_quota_bytes": 1073741824,
"namespace_quota_bytes": 10737418240,
"namespace_used_bytes": 5368709120,
"namespace_available_bytes": 5368709120,
"is_inheriting": false
}
Storage breakdown for namespace:
GET /api/quota/org/repos
Authorization: Bearer YOUR_TOKEN
Response:
{
"namespace": "org",
"is_organization": true,
"total_repos": 15,
"repositories": [
{
"repo_id": "org/large-model",
"repo_type": "model",
"name": "large-model",
"private": false,
"quota_bytes": null,
"used_bytes": 5368709120,
"percentage_used": 50.0,
"is_inheriting": true,
"created_at": "2025-01-01T00:00:00Z"
}
]
}
Invitations
Create organization invitation:
POST /api/invitations/org/my-org/create
Authorization: Bearer YOUR_TOKEN
Content-Type: application/json
{
"email": "newuser@example.com",
"role": "member",
"max_usage": null,
"expires_days": 7
}
Response:
{
"success": true,
"token": "abc123...",
"invitation_link": "http://hub.example.com/invite/abc123...",
"expires_at": "2025-01-22T12:00:00Z",
"max_usage": null,
"is_reusable": false
}
Reusable invitation (10 uses):
{
"role": "member",
"max_usage": 10,
"expires_days": 30
}
Accept invitation:
POST /api/invitations/{token}/accept
Authorization: Bearer YOUR_TOKEN
SSH Keys
Add SSH key:
POST /api/user/keys
Authorization: Bearer YOUR_TOKEN
Content-Type: application/json
{
"title": "My Laptop",
"key": "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIB... user@host"
}
Response:
{
"id": 42,
"title": "My Laptop",
"key_type": "ssh-ed25519",
"fingerprint": "SHA256:abc123...",
"created_at": "2025-01-15T12:00:00.000000Z",
"last_used": null
}
Supported key types:
ssh-rsassh-dssecdsa-sha2-nistp256ecdsa-sha2-nistp384ecdsa-sha2-nistp521ssh-ed25519
Name Validation
Check if name is available:
POST /api/validate/check-name
Content-Type: application/json
{
"name": "my-new-repo",
"namespace": "org",
"type": "model"
}
Response (available):
{
"available": true,
"normalized_name": "my_new_repo",
"conflict_with": null,
"message": "Repository name is available"
}
Response (conflict):
{
"available": false,
"normalized_name": "my_new_repo",
"conflict_with": "org/My-New-Repo",
"message": "Repository name conflicts with existing repository: My-New-Repo (case-insensitive)"
}
LFS Settings
Get repository LFS settings:
GET /api/models/org/model/lfs/settings
Response:
{
"lfs_threshold_bytes": 5000000,
"lfs_threshold_bytes_effective": 5000000,
"lfs_threshold_bytes_source": "repository",
"lfs_keep_versions": 10,
"lfs_keep_versions_effective": 10,
"lfs_keep_versions_source": "repository",
"lfs_suffix_rules": [".safetensors", ".bin"],
"lfs_suffix_rules_effective": [".safetensors", ".bin"],
"server_defaults": {
"lfs_threshold_bytes": 10000000,
"lfs_keep_versions": 5
}
}
Update repository settings with LFS:
PUT /models/org/model/settings
Authorization: Bearer YOUR_TOKEN
Content-Type: application/json
{
"lfs_threshold_bytes": 5000000,
"lfs_keep_versions": 10,
"lfs_suffix_rules": [".safetensors", ".bin", ".gguf"]
}
Content Deduplication
Kohaku Hub implements content-addressable storage for LFS files:
Same file uploaded to different repos:
Repo A: myorg/model-v1
└─ model.bin (sha256: abc123...)
Repo B: myorg/model-v2
└─ model.bin (sha256: abc123...)
S3 Storage:
└─ lfs/ab/c1/abc123... ← SINGLE COPY
▲ ▲
│ │
Repo A Repo B
(linked) (linked)
Benefits:
- Save storage space
- Faster uploads (skip if exists)
- Efficient for model variants
Error Handling
Kohaku Hub uses HuggingFace-compatible error headers:
HTTP Response Headers:
X-Error-Code: RepoNotFound
X-Error-Message: Repository 'org/repo' not found
Error Codes:
| Code | HTTP Status | Description |
|---|---|---|
RepoNotFound |
404 | Repository doesn't exist |
RepoExists |
400 | Repository already exists |
RevisionNotFound |
404 | Branch/commit not found |
EntryNotFound |
404 | File not found |
GatedRepo |
403 | Need permission |
BadRequest |
400 | Invalid request |
ServerError |
500 | Internal error |
These error codes are parsed by huggingface_hub client to raise appropriate Python exceptions.
Performance Considerations
Download Tracking
KohakuHub implements smart download tracking:
Session Deduplication:
- Downloads are grouped into 15-minute sessions
- Multiple files downloaded in the same session count as 1 download
- Uses session ID + time bucket for deduplication
Benefits:
- Accurate download counts (git clone = 1 download, not N file downloads)
- Trending calculations based on unique sessions
- Efficient storage (one record per session)
Recommended S3 Providers
| Provider | Best For | Pricing Model | Notes |
|---|---|---|---|
| Cloudflare R2 | High download | Free egress, $0.015/GB storage | Best for public datasets |
| Wasabi | Archive/backup | $6/TB/month, free egress* | *if download < storage |
| MinIO | Self-hosted | Free (your hardware/bandwidth) | Full control, privacy |
| AWS S3 | Enterprise | Pay per GB + egress | Most features, expensive egress |
| Backblaze B2 | Budget | $6/TB storage, $0.01/GB egress | Good for mixed workloads |
Recommendation for KohakuHub:
- Development: MinIO (included in docker-compose)
- Public Hub: Cloudflare R2 (free egress saves costs)
- Private/Enterprise: Self-hosted MinIO or AWS S3 with VPC endpoints