Admin Portal Guide
Complete guide to KohakuHub's administration interface
Last Updated: January 2025
Access: http://your-hub.com/admin
Table of Contents
- Overview
- Authentication
- Dashboard
- User Management
- Repository Management
- Commit History Viewer
- S3 Storage Browser
- Quota Management
- Invitation Management
- API Reference
- Security Best Practices
Overview
The Admin Portal provides a centralized interface for managing your KohakuHub instance. It offers:
- User Management - Create, view, and delete users
- Repository Browser - View all repositories with statistics
- Commit History - Track commits across all repositories
- Storage Browser - Browse S3 buckets and objects
- Quota Management - Set and monitor storage quotas
- Invitation Management - Create and manage registration invitations
- Statistics Dashboard - Real-time insights into usage
- Bulk Operations - Recalculate storage for all repositories
Access URL:
http://your-hub.com/admin
Authentication
Admin Token
The admin portal requires a secret token configured in your environment:
Configuration:
# docker-compose.yml
environment:
  KOHAKU_HUB_ADMIN_ENABLED: "true"
  KOHAKU_HUB_ADMIN_SECRET_TOKEN: "your-secret-token-here" # CHANGE THIS!
Security:
- ⚠️ NEVER use the default token "change-me-in-production" in production
- ✅ Generate a strong random token: openssl rand -hex 32
- ✅ Store securely (environment variable, secrets manager)
- ✅ Rotate regularly
- ✅ Use HTTPS in production
Login
- Navigate to /admin
- Enter your admin secret token
- Token is stored in browser session (not localStorage for security)
- Auto-logout on browser close
Example:
# Generate secure token
openssl rand -hex 32
# Output: a1b2c3d4e5f6...
# Add to docker-compose.yml
KOHAKU_HUB_ADMIN_SECRET_TOKEN: "a1b2c3d4e5f6..."
# Restart
docker-compose up -d
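Where openssl is not installed, Python's standard-library `secrets` module produces an equivalent value. A minimal sketch: `secrets.token_hex(32)` returns 32 random bytes encoded as 64 hexadecimal characters, the same shape as `openssl rand -hex 32`. The helper name `generate_admin_token` is only illustrative.

```python
import secrets


def generate_admin_token(num_bytes: int = 32) -> str:
    """Return a cryptographically random token as lowercase hex (2 chars per byte)."""
    # Illustrative helper; secrets.token_hex draws from the OS's secure random source
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Copy the printed value into KOHAKU_HUB_ADMIN_SECRET_TOKEN
    print(generate_admin_token())
```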
Dashboard
Overview Statistics
The dashboard shows real-time statistics from your database:
User Stats:
- Total users
- Active users
- Email verified users
- Inactive users
Organization Stats:
- Total organizations
Repository Stats:
- Total repositories
- Private vs public repositories
- Breakdown by type (models, datasets, spaces)
Commit Stats:
- Total commits
- Top contributors (by commit count)
Storage Stats:
- Total storage used (private + public)
- Private vs public storage
- LFS object count and size
Quick Actions:
- Navigate to user management
- Browse repositories
- View commits
- Inspect S3 storage
- Manage quotas
- Manage invitations
User Management
List Users
Features:
- View all users with pagination
- Sort by ID, username, storage usage
- Filter and search
- Storage quota visualization
Columns:
- ID, Username, Email
- Private storage (used/quota)
- Public storage (used/quota)
- Total storage
- Email verification status
- Active status
- Created date
Create User
Fields:
- Username (required, unique)
- Email (required, unique)
- Password (required)
- Email verified (checkbox)
- Is active (checkbox)
- Private quota (bytes, optional = unlimited)
- Public quota (bytes, optional = unlimited)
Example:
Username: alice
Email: alice@example.com
Password: ********
Email Verified: ✓
Is Active: ✓
Private Quota: 10737418240 (10 GB)
Public Quota: 53687091200 (50 GB)
API Endpoint:
curl -X POST http://localhost:48888/admin/api/users \
-H "X-Admin-Token: your-secret-token" \
-H "Content-Type: application/json" \
-d '{
"username": "alice",
"email": "alice@example.com",
"password": "secure_password",
"email_verified": true,
"is_active": true,
"private_quota_bytes": 10737418240,
"public_quota_bytes": 53687091200
}'
View User Details
Click "View" to see:
- User ID, username, email
- Verification and active status
- Storage quotas (private, public)
- Storage used (private, public)
- Created date
Actions:
- Manage Quota (navigate to quota page)
Delete User
Normal Delete:
- Deletes user account
- Deletes all sessions and tokens
- Deletes organization memberships
- Does not delete repositories; fails if the user still owns any (delete them first or use Force Delete)
Force Delete:
- Deletes everything above
- Also deletes all owned repositories
- ⚠️ Cannot be undone!
Workflow:
- Click "Delete" → Confirmation dialog
- If user owns repos → Shows repo list
- Choose: Cancel or Force Delete
- Confirm force delete → All data deleted
API Endpoint:
# Normal delete (fails if user owns repos)
curl -X DELETE http://localhost:48888/admin/api/users/alice \
-H "X-Admin-Token: your-secret-token"
# Force delete (deletes user and all their repos)
curl -X DELETE "http://localhost:48888/admin/api/users/alice?force=true" \
-H "X-Admin-Token: your-secret-token"
Toggle Email Verification
Use case: Manually verify users when email verification is disabled or has failed.
Action: Click "Verify" or "Unverify" button → Instant update
API Endpoint:
curl -X PATCH "http://localhost:48888/admin/api/users/alice/email-verification?verified=true" \
-H "X-Admin-Token: your-secret-token"
Repository Management
List Repositories
Filters:
- Repository type (model/dataset/space)
- Namespace (user or organization)
Columns:
- ID
- Type (color-coded badge)
- Full repository ID (namespace/name)
- Privacy status (Private/Public badge)
- Owner username
- Storage quota and usage
- Created date
Actions:
- View Details → Opens detailed dialog
Repository Details
Information:
- ID, Type, Full ID
- Namespace, Name
- Owner username
- Privacy status
- Created date
- File count (from database, active files only)
- Commit count (from database)
- Total size (sum of all active files)
- Quota information (quota, used, percentage, inheriting status)
Actions:
- View in Main App → Opens repository in main UI
API Endpoint:
curl http://localhost:48888/admin/api/repositories/model/org/my-model \
-H "X-Admin-Token: your-secret-token"
Commit History Viewer
Overview
View all commits across all repositories in your instance.
Filters:
- Repository ID (e.g., "org/repo-name")
- Author username
Columns:
- Commit ID (first 8 chars)
- Repository (type badge + full ID)
- Branch
- Author
- Message (truncated, hover for full)
- Created date
Sorting:
- Sort by ID, created date, username, repository
Pagination:
- Page size: 10, 20, 50, 100
- Navigate through pages
API Endpoint:
# List all commits
curl "http://localhost:48888/admin/api/commits?limit=100" \
-H "X-Admin-Token: your-secret-token"
# Filter by repository
curl "http://localhost:48888/admin/api/commits?repo_full_id=org/model&limit=50" \
-H "X-Admin-Token: your-secret-token"
# Filter by author
curl "http://localhost:48888/admin/api/commits?username=alice&limit=50" \
-H "X-Admin-Token: your-secret-token"
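When scripting these queries, building the query string with a URL encoder avoids quoting mistakes in repository IDs and usernames. A sketch using only the documented parameters (`repo_full_id`, `username`, `limit`); the helper `build_commits_url` is hypothetical, and it assumes the server decodes query parameters in the standard way (so `org%2Fmodel` arrives as `org/model`).

```python
from typing import Optional
from urllib.parse import urlencode


def build_commits_url(base_url: str, repo_full_id: Optional[str] = None,
                      username: Optional[str] = None, limit: int = 50) -> str:
    """Build an /admin/api/commits URL with percent-encoded filter values."""
    params = {"limit": limit}
    # Only include filters that were actually given
    if repo_full_id:
        params["repo_full_id"] = repo_full_id
    if username:
        params["username"] = username
    return f"{base_url.rstrip('/')}/admin/api/commits?{urlencode(params)}"
```

Pass the resulting URL to curl or `requests.get` together with the `X-Admin-Token` header.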
Use Cases
- Track user activity
- Find specific commits
- Monitor repository changes
- Debug commit issues
- Audit trail
S3 Storage Browser
Bucket List
Overview:
- View all S3 buckets
- Total size and object count
- Visual progress bars
- Creation dates
Metrics:
- Bucket name
- Total size (formatted: KB, MB, GB, TB)
- Object count
- Creation date
- Progress bar (relative to 100GB)
Actions:
- Click bucket → Browse contents
API Endpoint:
curl http://localhost:48888/admin/api/storage/buckets \
-H "X-Admin-Token: your-secret-token"
Response:
{
"buckets": [
{
"name": "hub-storage",
"creation_date": "2025-01-01T00:00:00Z",
"total_size": 107374182400,
"object_count": 5000
}
]
}
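The bucket list displays `total_size` in human-readable units with a bar scaled to 100GB. The portal's exact formatting and rounding are not documented; this sketch assumes 1024-based units (consistent with this guide's quota examples, where 10 GB = 10737418240 bytes) and caps the bar at 100%. Both function names are hypothetical.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units (KB, MB, GB, TB), one decimal place."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    size = float(num_bytes)
    for unit in ("KB", "MB", "GB", "TB"):
        size /= 1024
        # Stop at the first unit that keeps the number below 1024 (TB is the ceiling)
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
    raise AssertionError("unreachable")


def bar_percentage(num_bytes: int, scale_bytes: int = 100 * 1024**3) -> float:
    """Share of the progress bar to fill, relative to a 100 GB reference, capped at 100."""
    return min(100.0, num_bytes / scale_bytes * 100)
```

For the sample response, `format_size(107374182400)` gives `"100.0 GB"` and the bar is full.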
Object Browser
Features:
- List objects in selected bucket
- Filter by prefix (e.g., "lfs/", "models/")
- Pagination (up to 1000 objects)
Columns:
- Key (full S3 path)
- Size
- Storage class (STANDARD, etc.)
- Last modified date
Prefix Filtering:
Enter prefix: lfs/
→ Shows only objects starting with "lfs/"
Enter prefix: hf-model-org-repo/
→ Shows objects for specific repository
API Endpoint:
# List objects in bucket
curl "http://localhost:48888/admin/api/storage/objects/hub-storage?prefix=lfs/&limit=100" \
-H "X-Admin-Token: your-secret-token"
Quota Management
View Quota
Per-user or per-organization:
- Private quota (limit)
- Private used
- Public quota (limit)
- Public used
- Total usage
- Usage percentages
API Endpoint:
# Get user quota
curl "http://localhost:48888/admin/api/quota/alice?is_org=false" \
-H "X-Admin-Token: your-secret-token"
# Get organization quota
curl "http://localhost:48888/admin/api/quota/my-org?is_org=true" \
-H "X-Admin-Token: your-secret-token"
Response:
{
"namespace": "alice",
"is_organization": false,
"private_quota_bytes": 10737418240,
"public_quota_bytes": 53687091200,
"private_used_bytes": 1234567890,
"public_used_bytes": 5678901234,
"private_available_bytes": 9502850350,
"public_available_bytes": 48008189966,
"private_percentage_used": 11.5,
"public_percentage_used": 10.6,
"total_used_bytes": 6913469124
}
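The derived fields in this response follow from the four raw ones. The sketch below computes them, assuming that a null quota means unlimited (no finite "available" value) and that percentages are rounded to one decimal place; it illustrates the arithmetic and is not the server's code. For the example above it yields 9502850350 private and 48008189966 public bytes available.

```python
from typing import Optional


def _available(quota: Optional[int], used: int) -> Optional[int]:
    # A null quota means unlimited, so there is no finite "available" figure
    return None if quota is None else quota - used


def _percent(quota: Optional[int], used: int) -> Optional[float]:
    # Percent of the quota in use, rounded to one decimal (assumed rounding);
    # a null or zero quota yields None instead of dividing by zero
    return None if not quota else round(used / quota * 100, 1)


def quota_summary(private_quota: Optional[int], public_quota: Optional[int],
                  private_used: int, public_used: int) -> dict:
    """Derive the computed fields of a quota response from its raw fields."""
    return {
        "private_available_bytes": _available(private_quota, private_used),
        "public_available_bytes": _available(public_quota, public_used),
        "private_percentage_used": _percent(private_quota, private_used),
        "public_percentage_used": _percent(public_quota, public_used),
        "total_used_bytes": private_used + public_used,
    }
```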
Set Quota
Fields:
- Private quota bytes (null = unlimited)
- Public quota bytes (null = unlimited)
Examples:
10 GB = 10737418240 bytes
50 GB = 53687091200 bytes
Unlimited = (empty/null)
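The byte values above use binary units (1 GB = 1024³ bytes, strictly a GiB). A small sketch that converts GB figures and builds the JSON body for the `PUT /admin/api/quota/{namespace}` call; the helper names are hypothetical, and `None` is kept so it serializes as `null` (unlimited).

```python
from typing import Optional

BYTES_PER_GB = 1024 ** 3  # binary units, matching 10 GB = 10737418240 bytes above


def gb_to_bytes(gb: float) -> int:
    """Convert a size in (binary) GB to an integer byte count."""
    return int(round(gb * BYTES_PER_GB))


def quota_payload(private_gb: Optional[float] = None,
                  public_gb: Optional[float] = None) -> dict:
    """Request body for setting a quota; None stays null, meaning unlimited."""
    return {
        "private_quota_bytes": None if private_gb is None else gb_to_bytes(private_gb),
        "public_quota_bytes": None if public_gb is None else gb_to_bytes(public_gb),
    }
```

Sending it with `requests.put(url, json=quota_payload(10, 50), headers={"X-Admin-Token": ...})` would reproduce the curl example.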
API Endpoint:
curl -X PUT http://localhost:48888/admin/api/quota/alice \
-H "X-Admin-Token: your-secret-token" \
-H "Content-Type: application/json" \
-d '{
"private_quota_bytes": 10737418240,
"public_quota_bytes": 53687091200
}'
Recalculate Storage
Purpose: Re-scan all files and update storage usage.
When to use:
- Database out of sync
- After manual S3 operations
- Quota shows incorrect values
Process:
- Scans all files for namespace
- Sums file sizes (private and public separately)
- Updates User/Organization table
API Endpoint:
curl -X POST "http://localhost:48888/admin/api/quota/alice/recalculate?is_org=false" \
-H "X-Admin-Token: your-secret-token"
Bulk Storage Recalculation
NEW: Recalculate storage for all repositories at once.
API Endpoint:
# Recalculate all repositories
curl -X POST http://localhost:48888/admin/api/repositories/recalculate-all \
-H "X-Admin-Token: your-secret-token"
# Filter by type
curl -X POST "http://localhost:48888/admin/api/repositories/recalculate-all?repo_type=model" \
-H "X-Admin-Token: your-secret-token"
# Filter by namespace
curl -X POST "http://localhost:48888/admin/api/repositories/recalculate-all?namespace=org" \
-H "X-Admin-Token: your-secret-token"
Response:
{
"total": 250,
"success_count": 248,
"failure_count": 2,
"failures": [
{
"repo_id": "org/problem-repo",
"error": "Repository not found in LakeFS"
}
],
"message": "Recalculated storage for 248/250 repositories"
}
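Because errors do not stop the bulk run, the useful part of the response is usually the `failures` list. A sketch that turns the documented response fields into report lines and collects the failed repository IDs for follow-up; both helpers are hypothetical.

```python
from typing import List


def summarize_recalculation(result: dict) -> List[str]:
    """Turn a recalculate-all response into human-readable report lines."""
    lines = [f"{result['success_count']}/{result['total']} repositories recalculated"]
    if result.get("failure_count", 0):
        lines.append(f"{result['failure_count']} failed:")
    # Each failure entry carries the repo_id and the error returned by the server
    for failure in result.get("failures", []):
        lines.append(f"  {failure['repo_id']}: {failure['error']}")
    return lines


def failed_repo_ids(result: dict) -> List[str]:
    """Repository IDs worth re-checking after the run."""
    return [failure["repo_id"] for failure in result.get("failures", [])]
```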
Invitation Management
Create Registration Invitation
Purpose: Generate invitations for user registration (useful for invite-only mode).
Features:
- Optional organization membership after registration
- Reusable invitations with usage limits
- Configurable expiration
API Endpoint:
curl -X POST http://localhost:48888/admin/api/invitations/register \
-H "X-Admin-Token: your-secret-token" \
-H "Content-Type: application/json" \
-d '{
"org_id": null,
"role": "member",
"max_usage": 10,
"expires_days": 30
}'
Response:
{
"success": true,
"token": "abc123xyz...",
"invitation_link": "http://your-hub.com/register?invitation=abc123xyz...",
"expires_at": "2025-02-14T12:00:00Z",
"max_usage": 10,
"is_reusable": true,
"action": "register_account"
}
Invitation Types:
- One-time: max_usage: null (single-use invitation)
- Limited: max_usage: 10 (can be used 10 times)
- Unlimited: max_usage: -1 (unlimited uses)
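The three invitation types differ only in `max_usage`. A small sketch of how a script might label an invitation from that field; `describe_max_usage` is a hypothetical helper, and rejecting other values (such as 0) is this sketch's own choice, since the guide does not say how the server treats them.

```python
from typing import Optional


def describe_max_usage(max_usage: Optional[int]) -> str:
    """Map an invitation's max_usage value to one of the invitation types."""
    if max_usage is None:
        return "one-time"
    if max_usage == -1:
        return "unlimited"
    if max_usage > 0:
        return f"limited ({max_usage} uses)"
    # Values such as 0 or -5 are not described in the guide; this sketch rejects them
    raise ValueError(f"unexpected max_usage: {max_usage}")
```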
Auto-join Organization:
{
"org_id": 5,
"role": "member",
"max_usage": 50,
"expires_days": 90
}
Users who register with this invitation will automatically join the organization as members.
List All Invitations
API Endpoint:
# List all invitations
curl http://localhost:48888/admin/api/invitations \
-H "X-Admin-Token: your-secret-token"
# Filter by action type
curl "http://localhost:48888/admin/api/invitations?action=register_account" \
-H "X-Admin-Token: your-secret-token"
Response:
{
"invitations": [
{
"id": 1,
"token": "abc123...",
"action": "register_account",
"org_id": null,
"org_name": null,
"role": null,
"email": null,
"created_by": 1,
"creator_username": "System",
"created_at": "2025-01-15T12:00:00Z",
"expires_at": "2025-02-15T12:00:00Z",
"max_usage": 10,
"usage_count": 5,
"is_reusable": true,
"is_available": true,
"error_message": null,
"used_at": null,
"used_by": null
}
],
"limit": 100,
"offset": 0
}
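To monitor invitation usage (for example when sharing one link with a team), a script can compute how many uses each invitation has left from `max_usage` and `usage_count`. A sketch under one stated assumption: a one-time invitation (`max_usage: null`) counts as redeemed once `used_at` is set. Helper names are hypothetical.

```python
from typing import List, Optional


def remaining_uses(invitation: dict) -> Optional[int]:
    """Uses left for one invitation record; None means unlimited."""
    max_usage = invitation.get("max_usage")
    if max_usage == -1:
        return None  # unlimited
    if max_usage is None:
        # One-time invitation; assumption: used_at is set once it has been redeemed
        return 0 if invitation.get("used_at") else 1
    return max(0, max_usage - invitation.get("usage_count", 0))


def usable_invitations(invitations: List[dict]) -> List[dict]:
    """Keep invitations the server still reports as available."""
    return [inv for inv in invitations if inv.get("is_available")]
```

For the sample record above (`max_usage` 10, `usage_count` 5), `remaining_uses` returns 5.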
Delete Invitation
API Endpoint:
curl -X DELETE http://localhost:48888/admin/api/invitations/{token} \
-H "X-Admin-Token: your-secret-token"
API Reference
Authentication
All admin API endpoints require the X-Admin-Token header:
curl -H "X-Admin-Token: your-secret-token" \
http://localhost:48888/admin/api/stats
Endpoints Overview
User Management:
GET /admin/api/users # List users
GET /admin/api/users/{username} # Get user info
POST /admin/api/users # Create user
DELETE /admin/api/users/{username} # Delete user
PATCH /admin/api/users/{username}/email-verification # Set verification
Repository Management:
GET /admin/api/repositories # List repositories
GET /admin/api/repositories/{type}/{namespace}/{name} # Get details
POST /admin/api/repositories/recalculate-all # Bulk storage recalc
Commit History:
GET /admin/api/commits # List commits
Storage:
GET /admin/api/storage/buckets # List buckets
GET /admin/api/storage/objects/{bucket} # List objects
Statistics:
GET /admin/api/stats # Basic stats
GET /admin/api/stats/detailed # Detailed stats
GET /admin/api/stats/timeseries?days=30 # Time-series data
GET /admin/api/stats/top-repos?by=commits # Top repositories
Quota:
GET /admin/api/quota/{namespace} # Get quota
PUT /admin/api/quota/{namespace} # Set quota
POST /admin/api/quota/{namespace}/recalculate # Recalculate
Invitations:
POST /admin/api/invitations/register # Create registration invitation
GET /admin/api/invitations # List all invitations
DELETE /admin/api/invitations/{token} # Delete invitation
Response Formats
User Info:
{
"id": 1,
"username": "alice",
"email": "alice@example.com",
"email_verified": true,
"is_active": true,
"private_quota_bytes": 10737418240,
"public_quota_bytes": 53687091200,
"private_used_bytes": 1234567,
"public_used_bytes": 9876543,
"created_at": "2025-01-01T00:00:00.000000Z"
}
Repository Info:
{
"id": 42,
"repo_type": "model",
"namespace": "org",
"name": "my-model",
"full_id": "org/my-model",
"private": false,
"owner_id": 1,
"owner_username": "alice",
"created_at": "2025-01-01T00:00:00.000000Z",
"file_count": 15,
"commit_count": 8,
"total_size": 12345678,
"quota_bytes": null,
"used_bytes": 12345678,
"percentage_used": 0.12,
"is_inheriting": true
}
Detailed Stats:
{
"users": {
"total": 100,
"active": 95,
"verified": 80,
"inactive": 5
},
"organizations": {
"total": 10
},
"repositories": {
"total": 250,
"private": 100,
"public": 150,
"by_type": {
"model": 180,
"dataset": 60,
"space": 10
}
},
"commits": {
"total": 1500,
"top_contributors": [
{"username": "alice", "commit_count": 150},
{"username": "bob", "commit_count": 120}
]
},
"lfs": {
"total_objects": 500,
"total_size": 107374182400
},
"storage": {
"private_used": 10737418240,
"public_used": 53687091200,
"total_used": 64424509440
}
}
Security Best Practices
Token Management
DO:
- ✅ Generate cryptographically random tokens
- ✅ Use environment variables (never hardcode)
- ✅ Rotate tokens regularly (monthly)
- ✅ Use HTTPS in production
- ✅ Restrict admin portal access (firewall, VPN)
DON'T:
- ❌ Use default token in production
- ❌ Commit tokens to git
- ❌ Share tokens via insecure channels
- ❌ Use same token across environments
- ❌ Store tokens in browser localStorage
Token Rotation
# 1. Generate new token
NEW_TOKEN=$(openssl rand -hex 32)
echo "$NEW_TOKEN"
# 2. Paste the printed value into docker-compose.yml
KOHAKU_HUB_ADMIN_SECRET_TOKEN: "<new-token>"
# 3. Restart services
docker-compose up -d
# 4. Log in to the admin portal again with the new token (sessions holding the old token stop working)
Network Security
Production Deployment:
# Restrict admin portal to specific IPs
location /admin {
allow 192.168.1.0/24; # Internal network
allow 10.0.0.0/8; # VPN
deny all;
# ... rest of config
}
Alternative: Basic Auth Layer
location /admin/api/ {
auth_basic "Admin Area";
auth_basic_user_file /etc/nginx/.htpasswd;
# Then require X-Admin-Token header
proxy_pass http://hub-api:48888;
}
Audit Logging
Admin operations are logged with [ADMIN] prefix:
[WARNING] [ADMIN] [07:05:55] Admin deleted user: testuser (deleted 5 repositories)
[INFO] [ADMIN] [07:06:12] Admin set quota for user alice: private=10737418240, public=53687091200
[WARNING] [ADMIN] [07:06:45] Admin created registration invitation (max_usage=10, expires=30d)
Monitor logs:
docker logs khub-hub-api 2>&1 | grep "\[ADMIN\]"
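For anything beyond grep, the sample lines above can be parsed into structured records. A sketch whose regular expression matches exactly the format shown (`[LEVEL] [ADMIN] [HH:MM:SS] message`); real log output may differ (for example, colour codes or extra fields), so treat the pattern as an assumption to adjust.

```python
import re
from typing import Dict, Iterable, List

# Pattern for the sample lines above; adjust if your log format differs
ADMIN_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] \[ADMIN\] \[(?P<time>\d{2}:\d{2}:\d{2})\] (?P<message>.*)$"
)


def parse_admin_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Extract level, time and message from [ADMIN] log lines, skipping everything else."""
    entries = []
    for line in lines:
        match = ADMIN_LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def count_by_level(entries: List[Dict[str, str]]) -> Dict[str, int]:
    """Count parsed admin events per log level."""
    counts: Dict[str, int] = {}
    for entry in entries:
        counts[entry["level"]] = counts.get(entry["level"], 0) + 1
    return counts
```

Feeding it `sys.stdin` lets you pipe the `docker logs ... 2>&1` output straight into a short script.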
Use Cases
Scenario 1: New User Onboarding
1. Dashboard → Quick Actions → "Manage Users"
2. Click "Create User"
3. Fill form:
- Username: newuser
- Email: newuser@company.com
- Password: (generate secure password)
- Email Verified: ✓
- Quotas: 10GB private, 50GB public
4. Click "Create User"
5. Share credentials with user
Scenario 2: Invite-Only Registration Mode
1. Dashboard → "Manage Invitations"
2. Click "Create Registration Invitation"
3. Configure:
- Max Usage: 50 (for team)
- Expires: 90 days
- Auto-join Organization: my-company (as member)
4. Copy invitation link
5. Share link with team members
6. Monitor usage count
Scenario 3: Storage Cleanup
1. Dashboard → "Browse Storage"
2. Click on "hub-storage" bucket
3. Filter by prefix: "lfs/"
4. Review large objects
5. Identify unused LFS objects
6. (Manually delete via CLI/API if needed)
Scenario 4: User Investigation
1. Dashboard → "View Commits"
2. Filter by username: "suspicious-user"
3. Review commit activity
4. Click repository links to inspect content
5. If needed: Go to Users → Delete user (with force)
Scenario 5: Quota Enforcement
1. Dashboard → "Manage Quotas"
2. Select namespace (user or org)
3. View current usage
4. Set new limits if exceeded
5. Click "Recalculate" to verify
6. Monitor dashboard for compliance
Scenario 6: System Maintenance
1. Dashboard → "Bulk Operations"
2. Click "Recalculate All Repository Storage"
3. Optional: Filter by type or namespace
4. Confirm operation
5. Wait for completion (progress logged)
6. Review success/failure report
Troubleshooting
Can't Login
Problem: Invalid admin token
Solution: Check KOHAKU_HUB_ADMIN_SECRET_TOKEN in docker-compose.yml matches your input
Problem: "Admin API is disabled"
Solution: Set KOHAKU_HUB_ADMIN_ENABLED=true in environment
Statistics Not Updating
Problem: Stale data
Solution: Click "Refresh Stats" button on dashboard
Storage Size Incorrect
Problem: Database out of sync with S3
Solution: Use "Recalculate" button in Quota Management or bulk recalculation endpoint
Can't Delete User
Problem: User owns repositories
Solution: Either delete repos first, or use "Force Delete" option with force=true parameter
Advanced Features
Time-Series Statistics
API:
curl -H "X-Admin-Token: your-token" \
"http://localhost:48888/admin/api/stats/timeseries?days=30"
Returns:
{
"repositories_by_day": {
"2025-01-01": {"model": 5, "dataset": 2, "space": 0},
"2025-01-02": {"model": 3, "dataset": 1, "space": 1}
},
"commits_by_day": {
"2025-01-01": 15,
"2025-01-02": 20
},
"users_by_day": {
"2025-01-01": 2,
"2025-01-02": 1
}
}
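For charting, it is often convenient to flatten the three series into one row per day. A sketch over the response shape shown above; `daily_totals` is a hypothetical helper, and it sums new repositories across model, dataset and space.

```python
from typing import Dict


def daily_totals(timeseries: Dict[str, dict]) -> Dict[str, Dict[str, int]]:
    """Flatten a /admin/api/stats/timeseries response into one row of totals per day."""
    repos_by_day = timeseries.get("repositories_by_day", {})
    commits_by_day = timeseries.get("commits_by_day", {})
    users_by_day = timeseries.get("users_by_day", {})
    # A day may appear in only some series, so take the union of all dates
    days = sorted(set(repos_by_day) | set(commits_by_day) | set(users_by_day))
    return {
        day: {
            # Sum new repositories across model/dataset/space
            "repositories": sum(repos_by_day.get(day, {}).values()),
            "commits": commits_by_day.get(day, 0),
            "users": users_by_day.get(day, 0),
        }
        for day in days
    }
```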
Use case: Build custom dashboards with charts
Top Repositories
By Commits:
curl -H "X-Admin-Token: your-token" \
"http://localhost:48888/admin/api/stats/top-repos?by=commits&limit=10"
By Size:
curl -H "X-Admin-Token: your-token" \
"http://localhost:48888/admin/api/stats/top-repos?by=size&limit=10"
Response:
{
"top_repositories": [
{
"repo_full_id": "org/active-model",
"repo_type": "model",
"commit_count": 150,
"private": false
}
],
"sorted_by": "commits"
}
Integration with CI/CD
Automated User Creation
import requests
admin_token = "your-admin-token"
base_url = "http://hub.example.com"
# Create user via API
response = requests.post(
f"{base_url}/admin/api/users",
headers={"X-Admin-Token": admin_token},
json={
"username": "ci-bot",
"email": "ci@company.com",
"password": "generated-password",
"email_verified": True,
"private_quota_bytes": 107374182400, # 100 GB
"public_quota_bytes": None, # Unlimited
}
)
user = response.json()
print(f"Created user: {user['username']} (ID: {user['id']})")
Bulk Invitation Generation
import requests
admin_token = "your-admin-token"
base_url = "http://hub.example.com"
# Create reusable invitation for 100 users
response = requests.post(
f"{base_url}/admin/api/invitations/register",
headers={"X-Admin-Token": admin_token},
json={
"org_id": 5, # Auto-join org after registration
"role": "member",
"max_usage": 100,
"expires_days": 90
}
)
invitation = response.json()
print(f"Invitation link: {invitation['invitation_link']}")
print(f"Can be used {invitation['max_usage']} times")
Monitoring Script
import requests
admin_token = "your-admin-token"
# Get statistics
response = requests.get(
    "http://hub.example.com/admin/api/stats/detailed",
    headers={"X-Admin-Token": admin_token},
    timeout=30,
)
response.raise_for_status()
stats = response.json()
# Alert if storage > 80% of capacity (assumed here to be 100 GB; adjust for your deployment)
capacity_bytes = 100 * 1024**3
total_used = stats['storage']['total_used']
if total_used > 0.8 * capacity_bytes:
    print("WARNING: Storage usage high!")
# Alert if too many inactive users
if stats['users']['inactive'] > 10:
    print(f"WARNING: {stats['users']['inactive']} inactive users")
Performance Considerations
Database Queries
Admin operations run synchronous queries with db.atomic():
- User listings: O(n) where n = total users
- Repository stats: Aggregation queries with indexes
- Commit history: Indexed by repository_id and username
- Storage calculations: Aggregation over File table
Optimization:
- Limit page size (default: 100, max: 1000)
- Use filters to reduce result sets
- Statistics are computed on-demand (cache in frontend if needed)
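Keeping page sizes bounded means a script that needs every record has to page through results. A generic sketch, assuming list endpoints accept a `limit` and an `offset` (the invitation listing echoes both; offset support on other endpoints is not confirmed here). `fetch_page` is a placeholder for whatever performs the HTTP request.

```python
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int, int], List[dict]], limit: int = 100) -> Iterator[dict]:
    """Yield items page by page until a short (or empty) page signals the end."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            break
        offset += limit
```

Keeping the HTTP call outside the generator makes the paging logic easy to test and lets the caller control headers, timeouts and retries.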
S3 Bucket Scanning
Warning: Scanning large buckets is slow!
# For bucket with 100,000 objects:
# - Scan time: 30-60 seconds
# - Uses pagination (1000 objects per request)
Recommendation:
- Limit to specific prefixes when possible
- Don't scan too frequently
- Consider caching results for large buckets
Bulk Storage Recalculation
Performance:
- Processes repositories sequentially (safe for database)
- Progress logged every 10 repositories
- Can take 1-5 minutes for 1000 repositories
- Errors don't stop the process (logged and returned)
Use case:
- Run during maintenance windows
- Use filters to process subsets
- Monitor logs for progress
Comparison: Admin Portal vs CLI
| Feature | Admin Portal | kohub-cli | Best For |
|---|---|---|---|
| User management | ✅ GUI | ❌ No | Portal: Quick actions |
| Repository browser | ✅ Full | ⚠️ Limited | Portal: overview; CLI: specific repos |
| Commit history | ✅ Full | ❌ No | Portal only |
| Storage browser | ✅ Full | ❌ No | Portal only |
| Quota management | ✅ Full | ⚠️ API only | Portal: visual; CLI: scripting |
| Invitation management | ✅ Full | ❌ No | Portal only |
| Statistics | ✅ Dashboard | ❌ No | Portal only |
| Bulk operations | ✅ Full | ❌ No | Portal only |
| Automation | ❌ Manual | ✅ Scripts | Portal: manual; CLI: automation |
Recommendation: Use portal for exploration/monitoring, API for automation.
Frequently Asked Questions
Q: Can I disable the admin portal?
A: Yes, set KOHAKU_HUB_ADMIN_ENABLED=false
Q: Is the admin token different from user tokens?
A: Yes. The admin token is system-wide; user tokens are per-user.
Q: Can I create multiple admin users?
A: No. The admin portal uses a single shared secret token. Per-user admin accounts would require implementing a role system.
Q: Does deleting a user delete their repositories?
A: Only with force delete. A normal delete fails while the user owns repositories; delete them or transfer them to another user first.
Q: Can I access admin API without the portal UI?
A: Yes, use curl/Python with X-Admin-Token header.
Q: Is audit logging enabled by default?
A: Yes, all admin operations are logged with [ADMIN] prefix.
Q: How do I create reusable invitations?
A: Set max_usage to a number (e.g., 50 for 50 uses) or -1 for unlimited.
Q: Can invitations auto-add users to organizations?
A: Yes, set org_id and role in the invitation. Users will automatically join after registration.
Last Updated: January 2025 Version: 1.1 Status: ✅ Production Ready
External Source Fallback System
Browse repositories from HuggingFace or other KohakuHub instances when not found locally.
Overview
The fallback system allows KohakuHub to seamlessly access repositories, files, and user profiles from external sources (like HuggingFace.co) when they're not available locally. This enables:
- Browsing HuggingFace repositories without manually importing them
- Downloading files from external sources
- Viewing user/org profiles from other hubs
- Connecting multiple KohakuHub instances for federated browsing
Quick Start
1. Configure Fallback Source (Admin Portal)
Navigate to: Admin Portal → Fallback Sources
Add HuggingFace:
Name: HuggingFace
URL: https://huggingface.co
Source Type: huggingface
Priority: 1
Token: (optional - for private repos)
Namespace: (empty for global)
Enabled: ✓
2. Browse External Repositories
Visit any HuggingFace user/org:
http://localhost:28080/openai
http://localhost:28080/stabilityai
View external models/datasets:
http://localhost:28080/models/openai/whisper-tiny
http://localhost:28080/datasets/karpathy/fineweb-edu
Download files:
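from huggingface_hub import hf_hub_download
# Assumes HF_ENDPOINT points at your KohakuHub instance (e.g. export HF_ENDPOINT=http://localhost:28080);
# files not found locally fall back to HuggingFace automatically
hf_hub_download(
    repo_id="openai/whisper-tiny",
    filename="config.json"
)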
How It Works
Architecture
User Request → KohakuHub
↓
Check Local Database
↓
Not Found (404)
↓
Try Fallback Sources (by priority)
1. HuggingFace
2. Other KohakuHub Instance
...
↓
Found! → Return with _source tag
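The request flow above can be sketched as a local lookup followed by a priority-ordered scan of enabled sources. `lookup_local` and `fetch_from_source` are placeholders for the database query and the external HTTP call; tagging the result with the source's name is an assumption, since the guide only says results come back with a `_source` tag.

```python
from typing import Callable, List, Optional


def resolve_with_fallback(
    repo_key: str,
    lookup_local: Callable[[str], Optional[dict]],
    sources: List[dict],
    fetch_from_source: Callable[[dict, str], Optional[dict]],
) -> Optional[dict]:
    """Return local data if present, else the first hit from enabled sources by priority."""
    local = lookup_local(repo_key)
    if local is not None:
        return local
    # Lower priority numbers are tried first
    for source in sorted(sources, key=lambda s: s["priority"]):
        if not source.get("enabled", True):
            continue  # disabled sources are skipped
        result = fetch_from_source(source, repo_key)
        if result is not None:
            # Tag the result with its origin (exact tag content assumed here)
            return {**result, "_source": source["name"]}
    return None  # not found anywhere: still a 404
```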
Caching
Repository→Source mapping is cached (not content):
- Cache TTL: 5 minutes (configurable)
- Cache Key: {repo_type}:{namespace}/{name}
- Cache Value: Source URL, name, type
This reduces external API calls by 80%+.
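A minimal sketch of such a mapping cache, storing only which source served a repository (not file content). The 300-second TTL and 10,000-entry limit mirror the defaults listed under Cache Stats; the class name, the injectable clock (for testing) and the evict-soonest-to-expire policy are hypothetical choices for this sketch.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SourceCache:
    """Maps '{repo_type}:{namespace}/{name}' to the source that served it, with a TTL."""

    def __init__(self, ttl_seconds: float = 300, max_size: int = 10_000,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.max_size = max_size
        self.clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}  # key -> (expires_at, source)

    @staticmethod
    def key(repo_type: str, namespace: str, name: str) -> str:
        return f"{repo_type}:{namespace}/{name}"

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, source = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return source

    def set(self, key: str, source: dict) -> None:
        if key not in self._entries and len(self._entries) >= self.max_size:
            # Evict the entry closest to expiry (simple policy for this sketch)
            soonest = min(self._entries, key=lambda k: self._entries[k][0])
            del self._entries[soonest]
        self._entries[key] = (self.clock() + self.ttl, source)
```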
Configuration
Global Sources (Environment Variable)
In docker-compose.yml:
environment:
  KOHAKU_HUB_FALLBACK_ENABLED: "true"
  KOHAKU_HUB_FALLBACK_CACHE_TTL: "300" # 5 minutes
  KOHAKU_HUB_FALLBACK_TIMEOUT: "10" # 10 seconds
  KOHAKU_HUB_FALLBACK_SOURCES: |
    [
      {
        "url": "https://huggingface.co",
        "token": "",
        "priority": 1,
        "name": "HuggingFace",
        "source_type": "huggingface"
      }
    ]
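Because `KOHAKU_HUB_FALLBACK_SOURCES` is a JSON array embedded in a YAML string, a malformed entry is easy to miss. The sketch below parses such a value and orders it by priority (lower numbers first). Requiring `url` and `priority` is this sketch's own check, based on every example in this guide including them; it is not documented KohakuHub validation.

```python
import json
import os
from typing import List

# Keys this sketch insists on; every example in this guide includes them
REQUIRED_KEYS = {"url", "priority"}


def load_fallback_sources(raw: str) -> List[dict]:
    """Parse a KOHAKU_HUB_FALLBACK_SOURCES value and return sources sorted by priority."""
    sources = json.loads(raw) if raw.strip() else []
    if not isinstance(sources, list):
        raise ValueError("fallback sources must be a JSON array")
    for source in sources:
        missing = REQUIRED_KEYS - set(source)
        if missing:
            raise ValueError(f"source {source!r} is missing {sorted(missing)}")
    # Lower priority numbers are tried first
    return sorted(sources, key=lambda s: s["priority"])


if __name__ == "__main__":
    for src in load_fallback_sources(os.environ.get("KOHAKU_HUB_FALLBACK_SOURCES", "")):
        print(src["priority"], src.get("name", src["url"]))
```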
Database Sources (via Admin API)
Add via API:
curl -X POST http://localhost:48888/admin/api/fallback-sources \
-H "X-Admin-Token: your-admin-token" \
-H "Content-Type: application/json" \
-d '{
"namespace": "",
"url": "https://huggingface.co",
"name": "HuggingFace",
"source_type": "huggingface",
"priority": 1,
"enabled": true
}'
Or via Admin Portal:
- Navigate to: http://localhost:28080/admin/fallback-sources
- Click "Add Source"
- Fill in details
Supported Operations
Repository Operations
✅ Resolve/Download Files
GET /{type}s/{namespace}/{name}/resolve/{revision}/{path} - Returns 302 redirect to external URL
✅ List Files (Tree)
GET /api/{type}s/{namespace}/{name}/tree/{revision} - Returns file tree from external source
✅ Repository Info
GET /api/{type}s/{namespace}/{name} - Returns metadata from external source
✅ Revision Info
GET /api/{type}s/{namespace}/{name}/revision/{revision} - Returns commit/branch info
User/Organization Operations
✅ User Profile
GET /api/users/{username}/profile - Falls back to HF /api/users/{username}/overview
✅ User Repositories
GET /api/users/{username}/repos - Aggregates from /api/models, /api/datasets, /api/spaces
✅ Organization Profile
- Detected via /api/organizations/{name}/members
- Shows as organization page
List Aggregation
✅ Repository Lists
GET /api/models?author={name} - Merges local + external results with _source tags
❌ Disabled by Default On:
- Homepage trending lists (?fallback=false)
- Main browse pages (/models, /datasets, /spaces)
URL Mapping (HuggingFace)
HuggingFace has asymmetric URL patterns:
| Operation | KohakuHub | HuggingFace |
|---|---|---|
| Models Download | /models/{ns}/{name}/resolve/... | /{ns}/{name}/resolve/... |
| Datasets Download | /datasets/{ns}/{name}/resolve/... | /datasets/{ns}/{name}/resolve/... |
| API Endpoints | /api/{type}s/... | /api/{type}s/... |
The fallback client automatically handles these transformations.
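The transformation in the table above is small enough to sketch: model download paths drop the `/models` prefix, while dataset downloads and `/api/...` endpoints keep the same path. The function names are hypothetical, and other paths (for example spaces, which the table does not cover) are passed through unchanged as an assumption.

```python
def to_huggingface_path(kohaku_path: str) -> str:
    """Translate a KohakuHub path into the HuggingFace path from the mapping table."""
    # Model downloads drop the /models prefix on HuggingFace
    if kohaku_path.startswith("/models/") and "/resolve/" in kohaku_path:
        return kohaku_path[len("/models"):]
    # Dataset downloads and /api/... endpoints keep the same path
    return kohaku_path


def to_huggingface_url(base_url: str, kohaku_path: str) -> str:
    """Join a source base URL with the translated path."""
    return base_url.rstrip("/") + to_huggingface_path(kohaku_path)
```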
External Source Indicators
Repository Pages
External repos show:
- Badge in header: [☁️ External: https://huggingface.co]
- Disabled commits tab (not available for external repos)
- All metadata tagged with _source field
User/Org Pages
External profiles show:
- Badge in profile card: [☁️ HuggingFace]
- "Limited profile" indicator (bio/website may be missing)
- All repos tagged with source
Query Parameters
Disable fallback per-request:
GET /api/models?fallback=false
Useful for:
- Homepage (show local only)
- Admin interfaces
- Performance-critical lists
Admin Interface
Fallback Sources Management:
Access: http://localhost:28080/admin/fallback-sources
Features:
- Add/Edit/Delete sources
- Enable/Disable sources
- Set priority order
- View cache statistics
- Clear cache manually
Cache Stats:
- Current size
- Max size (10,000 entries)
- TTL (300 seconds default)
- Usage percentage
Limitations
What Works
✅ Browsing external repos (tree, files, metadata)
✅ Downloading files (302 redirect to external)
✅ Viewing user/org profiles
✅ Listing a user's repositories
✅ YAML frontmatter metadata
What Doesn't Work
❌ Commits - Not available for external repos
❌ Editing - Can't modify external repos
❌ Git Clone - Only local repos support Git clone
❌ LFS Upload - Can't upload to external sources
❌ Private Access - Requires admin-configured tokens (no user token passthrough)
Security
User Privacy:
- ❌ Local user credentials are NEVER sent to external sources
- ✅ Only admin-configured tokens are used
- ✅ Public repos work without any tokens
Admin Token:
- Configure once in admin portal
- Used for all external requests
- Can access private repos on external source (if token has permission)
Troubleshooting
External repos not showing:
- Check fallback sources in admin portal
- Verify source is enabled
- Check cache TTL (may need to wait or clear cache)
- Look for errors in backend logs
404 errors for external content:
- Verify the repo exists on the external source
- Check if source URL is correct
- Try clearing cache in admin portal
Performance issues:
- Check cache stats (should be >80% hit rate)
- Reduce number of external sources
- Increase cache TTL
- Use ?fallback=false for performance-critical pages
Advanced Configuration
Multiple Sources
Priority ordering:
[
{"url": "https://your-hub.com", "priority": 1, "name": "Internal"},
{"url": "https://huggingface.co", "priority": 2, "name": "HuggingFace"}
]
Lower priority numbers are checked first.
Per-Namespace Sources
User/org-specific fallback:
{
"namespace": "my-team",
"url": "https://team-hub.com",
"priority": 1
}
Only applies when browsing my-team/* repos.
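Combining global and per-namespace sources can be sketched as a filter followed by a priority sort. This assumes a source with an empty (or missing) namespace is global and applies alongside namespace-specific ones, as in the Quick Start configuration ("empty for global"); `sources_for_namespace` is a hypothetical helper.

```python
from typing import List


def sources_for_namespace(sources: List[dict], namespace: str) -> List[dict]:
    """Enabled sources that apply to a namespace, lowest priority number first."""
    applicable = [
        s for s in sources
        # "" (or no namespace key) means global; otherwise the namespace must match
        if s.get("enabled", True) and s.get("namespace", "") in ("", namespace)
    ]
    return sorted(applicable, key=lambda s: s["priority"])
```

With a team hub scoped to `my-team` at priority 1 and a global HuggingFace source at priority 2, `my-team/*` repos try the team hub first, while other namespaces only see HuggingFace.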
Cache Tuning
KOHAKU_HUB_FALLBACK_CACHE_TTL=600 # 10 minutes
KOHAKU_HUB_FALLBACK_TIMEOUT=20 # 20 second timeout
KOHAKU_HUB_FALLBACK_MAX_CONCURRENT=10 # 10 concurrent requests
API Reference
Admin Endpoints:
POST /admin/api/fallback-sources # Create source
GET /admin/api/fallback-sources # List sources
GET /admin/api/fallback-sources/{id} # Get source
PUT /admin/api/fallback-sources/{id} # Update source
DELETE /admin/api/fallback-sources/{id} # Delete source
GET /admin/api/fallback-sources/cache/stats # Cache stats
DELETE /admin/api/fallback-sources/cache/clear # Clear cache
Query Parameters:
?fallback=false # Disable fallback for this request
?fallback=true # Enable fallback (default)
Examples
Example 1: Browse HuggingFace Models
# View Stability AI's models
curl "http://localhost:28080/api/models?author=stabilityai"
# Returns local + HuggingFace models tagged with _source
Example 2: Download from HuggingFace
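from huggingface_hub import hf_hub_download
# Assumes HF_ENDPOINT points at your KohakuHub instance (e.g. export HF_ENDPOINT=http://localhost:28080);
# files not found locally fall back to HuggingFace automatically
model_path = hf_hub_download(
    repo_id="openai/whisper-tiny",
    filename="config.json"
)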
Example 3: Federated KohakuHub
Connect company internal hub:
{
"url": "https://internal-hub.company.com",
"source_type": "kohakuhub",
"priority": 1,
"token": "internal_token_here"
}
Now you can browse internal repos + HuggingFace from one interface!
Performance
Typical Response Times:
- Cache Hit: <100ms (instant)
- Cache Miss (HF): <2s (external API call)
- File Download: 302 redirect (no proxy, full speed)
Cache Hit Rate:
- Expected: >80% after warmup
- Check: Admin Portal → Fallback Sources → Cache Stats