# Admin Portal Guide

*Complete guide to KohakuHub's administration interface*

**Last Updated:** January 2025
**Access:** http://your-hub.com/admin

---

## Table of Contents

1. [Overview](#overview)
2. [Authentication](#authentication)
3. [Dashboard](#dashboard)
4. [User Management](#user-management)
5. [Repository Management](#repository-management)
6. [Commit History Viewer](#commit-history-viewer)
7. [S3 Storage Browser](#s3-storage-browser)
8. [Quota Management](#quota-management)
9. [Invitation Management](#invitation-management)
10. [API Reference](#api-reference)
11. [Security Best Practices](#security-best-practices)

---

## Overview

The Admin Portal provides a centralized interface for managing your KohakuHub instance. It offers:

- **User Management** - Create, view, and delete users
- **Repository Browser** - View all repositories with statistics
- **Commit History** - Track commits across all repositories
- **Storage Browser** - Browse S3 buckets and objects
- **Quota Management** - Set and monitor storage quotas
- **Invitation Management** - Create and manage registration invitations
- **Statistics Dashboard** - Real-time insights into usage
- **Bulk Operations** - Recalculate storage for all repositories

**Access URL:**
```
http://your-hub.com/admin
```

---

## Authentication

### Admin Token

The admin portal requires a secret token configured in your environment:

**Configuration:**
```yaml
# docker-compose.yml
environment:
  KOHAKU_HUB_ADMIN_ENABLED: "true"
  KOHAKU_HUB_ADMIN_SECRET_TOKEN: "your-secret-token-here"  # CHANGE THIS!
```

**Security:**
- ⚠️ **NEVER** use the default token `"change-me-in-production"` in production
- ✅ Generate a strong random token: `openssl rand -hex 32`
- ✅ Store it securely (environment variable, secrets manager)
- ✅ Rotate regularly
- ✅ Use HTTPS in production

### Login

1. Navigate to `/admin`
2. Enter your admin secret token
3. Token is stored in browser session (not localStorage for security)
4. Auto-logout on browser close

**Example:**
```bash
# Generate secure token
openssl rand -hex 32
# Output: a1b2c3d4e5f6...

# Add to docker-compose.yml:
#   KOHAKU_HUB_ADMIN_SECRET_TOKEN: "a1b2c3d4e5f6..."

# Restart
docker-compose up -d
```

---

## Dashboard

### Overview Statistics

The dashboard shows real-time statistics from your database:

**User Stats:**
- Total users
- Active users
- Email verified users
- Inactive users

**Organization Stats:**
- Total organizations

**Repository Stats:**
- Total repositories
- Private vs public repositories
- Breakdown by type (models, datasets, spaces)

**Commit Stats:**
- Total commits
- Top contributors (by commit count)

**Storage Stats:**
- Total storage used (private + public)
- Private vs public storage
- LFS object count and size

**Quick Actions:**
- Navigate to user management
- Browse repositories
- View commits
- Inspect S3 storage
- Manage quotas
- Manage invitations

---

## User Management

### List Users

**Features:**
- View all users with pagination
- Sort by ID, username, storage usage
- Filter and search
- Storage quota visualization

**Columns:**
- ID, Username, Email
- Private storage (used/quota)
- Public storage (used/quota)
- Total storage
- Email verification status
- Active status
- Created date

### Create User

**Fields:**
- Username (required, unique)
- Email (required, unique)
- Password (required)
- Email verified (checkbox)
- Is active (checkbox)
- Private quota (bytes, optional = unlimited)
- Public quota (bytes, optional = unlimited)

**Example:**
```
Username: alice
Email: alice@example.com
Password: ********
Email Verified: ✓
Is Active: ✓
Private Quota: 10737418240 (10 GB)
Public Quota: 53687091200 (50 GB)
```

**API Endpoint:**
```bash
curl -X POST http://localhost:48888/admin/api/users \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "alice",
    "email": "alice@example.com",
    "password": "secure_password",
    "email_verified": true,
    "is_active": true,
    "private_quota_bytes": 10737418240,
    "public_quota_bytes": 53687091200
  }'
```

### View User Details

Click "View" to see:
- User ID, username, email
- Verification and active status
- Storage quotas (private, public)
- Storage used (private, public)
- Created date

**Actions:**
- Manage Quota (navigate to quota page)

### Delete User

**Normal Delete:**
- Deletes user account
- Deletes all sessions and tokens
- Deletes organization memberships
- **Keeps** repositories (must delete separately)

**Force Delete:**
- Deletes everything above
- **Also deletes** all owned repositories
- ⚠️ Cannot be undone!

**Workflow:**
1. Click "Delete" → Confirmation dialog
2. If user owns repos → Shows repo list
3. Choose: Cancel or Force Delete
4. Confirm force delete → All data deleted

**API Endpoint:**
```bash
# Normal delete (fails if user owns repos)
curl -X DELETE http://localhost:48888/admin/api/users/alice \
  -H "X-Admin-Token: your-secret-token"

# Force delete (deletes user and all their repos)
curl -X DELETE "http://localhost:48888/admin/api/users/alice?force=true" \
  -H "X-Admin-Token: your-secret-token"
```

### Toggle Email Verification

**Use case:** Manually verify users when email verification is disabled or failed.

**Action:** Click "Verify" or "Unverify" button → Instant update

**API Endpoint:**
```bash
# Quote the URL so the shell does not interpret the "?"
curl -X PATCH "http://localhost:48888/admin/api/users/alice/email-verification?verified=true" \
  -H "X-Admin-Token: your-secret-token"
```

---

## Repository Management

### List Repositories

**Filters:**
- Repository type (model/dataset/space)
- Namespace (user or organization)

**Columns:**
- ID
- Type (color-coded badge)
- Full repository ID (namespace/name)
- Privacy status (Private/Public badge)
- Owner username
- Storage quota and usage
- Created date

**Actions:**
- View Details → Opens detailed dialog

### Repository Details

**Information:**
- ID, Type, Full ID
- Namespace, Name
- Owner username
- Privacy status
- Created date
- **File count** (from database, active files only)
- **Commit count** (from database)
- **Total size** (sum of all active files)
- **Quota information** (quota, used, percentage, inheriting status)

**Actions:**
- View in Main App → Opens repository in main UI

**API Endpoint:**
```bash
curl http://localhost:48888/admin/api/repositories/model/org/my-model \
  -H "X-Admin-Token: your-secret-token"
```

---

## Commit History Viewer

### Overview

View all commits across all repositories in your instance.

**Filters:**
- Repository ID (e.g., "org/repo-name")
- Author username

**Columns:**
- Commit ID (first 8 chars)
- Repository (type badge + full ID)
- Branch
- Author
- Message (truncated, hover for full)
- Created date

**Sorting:**
- Sort by ID, created date, username, repository

**Pagination:**
- Page size: 10, 20, 50, 100
- Navigate through pages

**API Endpoint:**
```bash
# List all commits
curl "http://localhost:48888/admin/api/commits?limit=100" \
  -H "X-Admin-Token: your-secret-token"

# Filter by repository
curl "http://localhost:48888/admin/api/commits?repo_full_id=org/model&limit=50" \
  -H "X-Admin-Token: your-secret-token"

# Filter by author
curl "http://localhost:48888/admin/api/commits?username=alice&limit=50" \
  -H "X-Admin-Token: your-secret-token"
```

### Use Cases

- Track user activity
- Find specific commits
- Monitor repository changes
- Debug commit issues
- Audit trail

---

## S3 Storage Browser

### Bucket List

**Overview:**
- View all S3 buckets
- Total size and object count
- Visual progress bars
- Creation dates

**Metrics:**
- Bucket name
- Total size (formatted: KB, MB, GB, TB)
- Object count
- Creation date
- Progress bar (relative to 100 GB)

**Actions:**
- Click bucket → Browse contents

**API Endpoint:**
```bash
curl http://localhost:48888/admin/api/storage/buckets \
  -H "X-Admin-Token: your-secret-token"
```

**Response:**
```json
{
  "buckets": [
    {
      "name": "hub-storage",
      "creation_date": "2025-01-01T00:00:00Z",
      "total_size": 107374182400,
      "object_count": 5000
    }
  ]
}
```

### Object Browser

**Features:**
- List objects in selected bucket
- Filter by prefix (e.g., "lfs/", "models/")
- Pagination (up to 1000 objects)

**Columns:**
- Key (full S3 path)
- Size
- Storage class (STANDARD, etc.)
- Last modified date

**Prefix Filtering:**
```
Enter prefix: lfs/
→ Shows only objects starting with "lfs/"

Enter prefix: hf-model-org-repo/
→ Shows objects for specific repository
```

**API Endpoint:**
```bash
# List objects in bucket
curl "http://localhost:48888/admin/api/storage/objects/hub-storage?prefix=lfs/&limit=100" \
  -H "X-Admin-Token: your-secret-token"
```

---

## Quota Management

### View Quota

**Per-user or per-organization:**
- Private quota (limit)
- Private used
- Public quota (limit)
- Public used
- Total usage
- Usage percentages

**API Endpoint:**
```bash
# Get user quota
curl "http://localhost:48888/admin/api/quota/alice?is_org=false" \
  -H "X-Admin-Token: your-secret-token"

# Get organization quota
curl "http://localhost:48888/admin/api/quota/my-org?is_org=true" \
  -H "X-Admin-Token: your-secret-token"
```

**Response:**
```json
{
  "namespace": "alice",
  "is_organization": false,
  "private_quota_bytes": 10737418240,
  "public_quota_bytes": 53687091200,
  "private_used_bytes": 1234567890,
  "public_used_bytes": 5678901234,
  "private_available_bytes": 9502850350,
  "public_available_bytes": 48008189966,
  "private_percentage_used": 11.5,
  "public_percentage_used": 10.6,
  "total_used_bytes": 6913469124
}
```

### Set Quota

**Fields:**
- Private quota bytes (null = unlimited)
- Public quota bytes (null = unlimited)

**Examples:**
```
10 GB = 10737418240 bytes
50 GB = 53687091200 bytes
Unlimited = (empty/null)
```

**API Endpoint:**
```bash
curl -X PUT http://localhost:48888/admin/api/quota/alice \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "private_quota_bytes": 10737418240,
    "public_quota_bytes": 53687091200
  }'
```

### Recalculate Storage

**Purpose:** Re-scan all files and update storage usage.

**When to use:**
- Database out of sync
- After manual S3 operations
- Quota shows incorrect values

**Process:**
1. Scans all files for namespace
2. Sums file sizes (private and public separately)
3. Updates User/Organization table

**API Endpoint:**
```bash
curl -X POST "http://localhost:48888/admin/api/quota/alice/recalculate?is_org=false" \
  -H "X-Admin-Token: your-secret-token"
```

### Bulk Storage Recalculation

**NEW:** Recalculate storage for all repositories at once.

**API Endpoint:**
```bash
# Recalculate all repositories
curl -X POST http://localhost:48888/admin/api/repositories/recalculate-all \
  -H "X-Admin-Token: your-secret-token"

# Filter by type
curl -X POST "http://localhost:48888/admin/api/repositories/recalculate-all?repo_type=model" \
  -H "X-Admin-Token: your-secret-token"

# Filter by namespace
curl -X POST "http://localhost:48888/admin/api/repositories/recalculate-all?namespace=org" \
  -H "X-Admin-Token: your-secret-token"
```

**Response:**
```json
{
  "total": 250,
  "success_count": 248,
  "failure_count": 2,
  "failures": [
    {
      "repo_id": "org/problem-repo",
      "error": "Repository not found in LakeFS"
    }
  ],
  "message": "Recalculated storage for 248/250 repositories"
}
```

---

## Invitation Management

### Create Registration Invitation

**Purpose:** Generate invitations for user registration (useful for invite-only mode).

**Features:**
- Optional organization membership after registration
- Reusable invitations with usage limits
- Configurable expiration

**API Endpoint:**
```bash
curl -X POST http://localhost:48888/admin/api/invitations/register \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "org_id": null,
    "role": "member",
    "max_usage": 10,
    "expires_days": 30
  }'
```

**Response:**
```json
{
  "success": true,
  "token": "abc123xyz...",
  "invitation_link": "http://your-hub.com/register?invitation=abc123xyz...",
  "expires_at": "2025-02-14T12:00:00Z",
  "max_usage": 10,
  "is_reusable": true,
  "action": "register_account"
}
```

**Invitation Types:**
- **One-time:** `max_usage: null` - Single use invitation
- **Limited:** `max_usage: 10` - Can be used 10 times
- **Unlimited:** `max_usage: -1` - Unlimited uses

**Auto-join Organization:**
```json
{
  "org_id": 5,
  "role": "member",
  "max_usage": 50,
  "expires_days": 90
}
```

Users who register with this invitation will automatically join the organization as members.

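The three invitation types differ only in the `max_usage` value sent in the request body. The sketch below restates that mapping in Python, assuming the semantics listed under Invitation Types; `describe_invitation` is a hypothetical helper written for illustration and is not part of KohakuHub.

```python
from typing import Optional


def describe_invitation(max_usage: Optional[int]) -> str:
    """Classify an invitation by its max_usage value (hypothetical helper)."""
    if max_usage is None:
        # null in the JSON payload: single-use invitation
        return "one-time"
    if max_usage == -1:
        # -1 is the documented marker for unlimited uses
        return "unlimited"
    if max_usage > 0:
        return f"limited ({max_usage} uses)"
    # Anything else is not described in this guide, so reject it here
    raise ValueError(f"unsupported max_usage: {max_usage!r}")


for value in (None, 10, -1):
    print(value, "->", describe_invitation(value))
```

Validating `max_usage` on the client before calling the API keeps mistakes such as `0` from producing an invitation whose behavior this guide does not define.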
### List All Invitations

**API Endpoint:**
```bash
# List all invitations
curl http://localhost:48888/admin/api/invitations \
  -H "X-Admin-Token: your-secret-token"

# Filter by action type
curl "http://localhost:48888/admin/api/invitations?action=register_account" \
  -H "X-Admin-Token: your-secret-token"
```

**Response:**
```json
{
  "invitations": [
    {
      "id": 1,
      "token": "abc123...",
      "action": "register_account",
      "org_id": null,
      "org_name": null,
      "role": null,
      "email": null,
      "created_by": 1,
      "creator_username": "System",
      "created_at": "2025-01-15T12:00:00Z",
      "expires_at": "2025-02-15T12:00:00Z",
      "max_usage": 10,
      "usage_count": 5,
      "is_reusable": true,
      "is_available": true,
      "error_message": null,
      "used_at": null,
      "used_by": null
    }
  ],
  "limit": 100,
  "offset": 0
}
```

### Delete Invitation

**API Endpoint:**
```bash
curl -X DELETE http://localhost:48888/admin/api/invitations/{token} \
  -H "X-Admin-Token: your-secret-token"
```

---

## API Reference

### Authentication

All admin API endpoints require the `X-Admin-Token` header:

```bash
curl -H "X-Admin-Token: your-secret-token" \
  http://localhost:48888/admin/api/stats
```

### Endpoints Overview

**User Management:**
```
GET    /admin/api/users                                # List users
GET    /admin/api/users/{username}                     # Get user info
POST   /admin/api/users                                # Create user
DELETE /admin/api/users/{username}                     # Delete user
PATCH  /admin/api/users/{username}/email-verification  # Set verification
```

**Repository Management:**
```
GET    /admin/api/repositories                             # List repositories
GET    /admin/api/repositories/{type}/{namespace}/{name}   # Get details
POST   /admin/api/repositories/recalculate-all             # Bulk storage recalc
```

**Commit History:**
```
GET    /admin/api/commits                  # List commits
```

**Storage:**
```
GET    /admin/api/storage/buckets          # List buckets
GET    /admin/api/storage/objects/{bucket} # List objects
```

**Statistics:**
```
GET    /admin/api/stats                      # Basic stats
GET    /admin/api/stats/detailed             # Detailed stats
GET    /admin/api/stats/timeseries?days=30   # Time-series data
GET    /admin/api/stats/top-repos?by=commits # Top repositories
```

**Quota:**
```
GET    /admin/api/quota/{namespace}             # Get quota
PUT    /admin/api/quota/{namespace}             # Set quota
POST   /admin/api/quota/{namespace}/recalculate # Recalculate
```

**Invitations:**
```
POST   /admin/api/invitations/register   # Create registration invitation
GET    /admin/api/invitations            # List all invitations
DELETE /admin/api/invitations/{token}    # Delete invitation
```

### Response Formats

**User Info:**
```json
{
  "id": 1,
  "username": "alice",
  "email": "alice@example.com",
  "email_verified": true,
  "is_active": true,
  "private_quota_bytes": 10737418240,
  "public_quota_bytes": 53687091200,
  "private_used_bytes": 1234567,
  "public_used_bytes": 9876543,
  "created_at": "2025-01-01T00:00:00.000000Z"
}
```

**Repository Info:**
```json
{
  "id": 42,
  "repo_type": "model",
  "namespace": "org",
  "name": "my-model",
  "full_id": "org/my-model",
  "private": false,
  "owner_id": 1,
  "owner_username": "alice",
  "created_at": "2025-01-01T00:00:00.000000Z",
  "file_count": 15,
  "commit_count": 8,
  "total_size": 12345678,
  "quota_bytes": null,
  "used_bytes": 12345678,
  "percentage_used": 0.12,
  "is_inheriting": true
}
```

**Detailed Stats:**
```json
{
  "users": {
    "total": 100,
    "active": 95,
    "verified": 80,
    "inactive": 5
  },
  "organizations": {
    "total": 10
  },
  "repositories": {
    "total": 250,
    "private": 100,
    "public": 150,
    "by_type": {
      "model": 180,
      "dataset": 60,
      "space": 10
    }
  },
  "commits": {
    "total": 1500,
    "top_contributors": [
      {"username": "alice", "commit_count": 150},
      {"username": "bob", "commit_count": 120}
    ]
  },
  "lfs": {
    "total_objects": 500,
    "total_size": 107374182400
  },
  "storage": {
    "private_used": 10737418240,
    "public_used": 53687091200,
    "total_used": 64424509440
  }
}
```

---

## Security Best Practices

### Token Management

**DO:**
- ✅ Generate cryptographically random tokens
- ✅ Use environment variables (never hardcode)
- ✅ Rotate tokens regularly (monthly)
- ✅ Use HTTPS in production
- ✅ Restrict admin portal access (firewall, VPN)

**DON'T:**
- ❌ Use default token in production
- ❌ Commit tokens to git
- ❌ Share tokens via insecure channels
- ❌ Use same token across environments
- ❌ Store tokens in browser localStorage

### Token Rotation

```bash
# 1. Generate new token
NEW_TOKEN=$(openssl rand -hex 32)

# 2. Update docker-compose.yml with the new value:
#    KOHAKU_HUB_ADMIN_SECRET_TOKEN: "<value of $NEW_TOKEN>"

# 3. Restart services
docker-compose up -d

# 4. Update saved tokens in admin portal sessions
```

### Network Security

**Production Deployment:**
```nginx
# Restrict admin portal to specific IPs
location /admin {
    allow 192.168.1.0/24;  # Internal network
    allow 10.0.0.0/8;      # VPN
    deny all;

    # ... rest of config
}
```

**Alternative: Basic Auth Layer**
```nginx
location /admin/api/ {
    auth_basic "Admin Area";
    auth_basic_user_file /etc/nginx/.htpasswd;

    # Then require X-Admin-Token header
    proxy_pass http://hub-api:48888;
}
```

### Audit Logging

Admin operations are logged with the `[ADMIN]` prefix:

```
[WARNING] [ADMIN] [07:05:55] Admin deleted user: testuser (deleted 5 repositories)
[INFO] [ADMIN] [07:06:12] Admin set quota for user alice: private=10737418240, public=53687091200
[WARNING] [ADMIN] [07:06:45] Admin created registration invitation (max_usage=10, expires=30d)
```

**Monitor logs:**
```bash
docker logs khub-hub-api | grep "\[ADMIN\]"
```

---

## Use Cases

### Scenario 1: New User Onboarding

```
1. Dashboard → Quick Actions → "Manage Users"
2. Click "Create User"
3. Fill form:
   - Username: newuser
   - Email: newuser@company.com
   - Password: (generate secure password)
   - Email Verified: ✓
   - Quotas: 10GB private, 50GB public
4. Click "Create User"
5. Share credentials with user
```

### Scenario 2: Invite-Only Registration Mode

```
1. Dashboard → "Manage Invitations"
2. Click "Create Registration Invitation"
3. Configure:
   - Max Usage: 50 (for team)
   - Expires: 90 days
   - Auto-join Organization: my-company (as member)
4. Copy invitation link
5. Share link with team members
6. Monitor usage count
```

### Scenario 3: Storage Cleanup

```
1. Dashboard → "Browse Storage"
2. Click on "hub-storage" bucket
3. Filter by prefix: "lfs/"
4. Review large objects
5. Identify unused LFS objects
6. (Manually delete via CLI/API if needed)
```

### Scenario 4: User Investigation

```
1. Dashboard → "View Commits"
2. Filter by username: "suspicious-user"
3. Review commit activity
4. Click repository links to inspect content
5. If needed: Go to Users → Delete user (with force)
```

### Scenario 5: Quota Enforcement

```
1. Dashboard → "Manage Quotas"
2. Select namespace (user or org)
3. View current usage
4. Set new limits if exceeded
5. Click "Recalculate" to verify
6. Monitor dashboard for compliance
```

### Scenario 6: System Maintenance

```
1. Dashboard → "Bulk Operations"
2. Click "Recalculate All Repository Storage"
3. Optional: Filter by type or namespace
4. Confirm operation
5. Wait for completion (progress logged)
6. Review success/failure report
```

---

## Troubleshooting

### Can't Login

**Problem:** Invalid admin token
**Solution:** Check that `KOHAKU_HUB_ADMIN_SECRET_TOKEN` in docker-compose.yml matches your input

---

**Problem:** "Admin API is disabled"
**Solution:** Set `KOHAKU_HUB_ADMIN_ENABLED=true` in environment

---

### Statistics Not Updating

**Problem:** Stale data
**Solution:** Click "Refresh Stats" button on dashboard

---

### Storage Size Incorrect

**Problem:** Database out of sync with S3
**Solution:** Use "Recalculate" button in Quota Management or bulk recalculation endpoint

---

### Can't Delete User

**Problem:** User owns repositories
**Solution:** Either delete repos first, or use "Force Delete" option with `force=true` parameter

---

## Advanced Features

### Time-Series Statistics

**API:**
```bash
curl -H "X-Admin-Token: your-token" \
  "http://localhost:48888/admin/api/stats/timeseries?days=30"
```

**Returns:**
```json
{
  "repositories_by_day": {
    "2025-01-01": {"model": 5, "dataset": 2, "space": 0},
    "2025-01-02": {"model": 3, "dataset": 1, "space": 1}
  },
  "commits_by_day": {
    "2025-01-01": 15,
    "2025-01-02": 20
  },
  "users_by_day": {
    "2025-01-01": 2,
    "2025-01-02": 1
  }
}
```

**Use case:** Build custom dashboards with charts

### Top Repositories

**By Commits:**
```bash
curl -H "X-Admin-Token: your-token" \
  "http://localhost:48888/admin/api/stats/top-repos?by=commits&limit=10"
```

**By Size:**
```bash
curl -H "X-Admin-Token: your-token" \
  "http://localhost:48888/admin/api/stats/top-repos?by=size&limit=10"
```

**Response:**
```json
{
  "top_repositories": [
    {
      "repo_full_id": "org/active-model",
      "repo_type": "model",
      "commit_count": 150,
      "private": false
    }
  ],
  "sorted_by": "commits"
}
```

---

## Integration with CI/CD

### Automated User Creation

```python
import requests

admin_token = "your-admin-token"
base_url = "http://hub.example.com"

# Create user via API
response = requests.post(
    f"{base_url}/admin/api/users",
    headers={"X-Admin-Token": admin_token},
    json={
        "username": "ci-bot",
        "email": "ci@company.com",
        "password": "generated-password",
        "email_verified": True,
        "private_quota_bytes": 107374182400,  # 100 GB
        "public_quota_bytes": None,  # Unlimited
    }
)
response.raise_for_status()  # Stop here on an HTTP error

user = response.json()
print(f"Created user: {user['username']} (ID: {user['id']})")
```

### Bulk Invitation Generation

```python
import requests

admin_token = "your-admin-token"
base_url = "http://hub.example.com"

# Create reusable invitation for 100 users
response = requests.post(
    f"{base_url}/admin/api/invitations/register",
    headers={"X-Admin-Token": admin_token},
    json={
        "org_id": 5,  # Auto-join org after registration
        "role": "member",
        "max_usage": 100,
        "expires_days": 90
    }
)
response.raise_for_status()  # Stop here on an HTTP error

invitation = response.json()
print(f"Invitation link: {invitation['invitation_link']}")
print(f"Can be used {invitation['max_usage']} times")
```

### Monitoring Script

```python
import requests

admin_token = "your-admin-token"

# Get statistics
response = requests.get(
    "http://hub.example.com/admin/api/stats/detailed",
    headers={"X-Admin-Token": admin_token}
)
response.raise_for_status()  # Stop here on an HTTP error
stats = response.json()

# Alert if storage exceeds 80% of an assumed 100 GB capacity
total_used = stats['storage']['total_used']
if total_used > 0.8 * (100 * 1000 * 1000 * 1000):  # 80 GB
    print("WARNING: Storage usage high!")

# Alert if too many inactive users
if stats['users']['inactive'] > 10:
    print(f"WARNING: {stats['users']['inactive']} inactive users")
```

---

## Performance Considerations

### Database Queries

Admin operations run synchronous queries with `db.atomic()`:
- User listings: `O(n)` where n = total users
- Repository stats: Aggregation queries with indexes
- Commit history: Indexed by repository_id and username
- Storage calculations: Aggregation over File table

**Optimization:**
- Limit page size (default: 100, max: 1000)
- Use filters to reduce result sets
- Statistics are computed on-demand (cache in frontend if needed)

### S3 Bucket Scanning

**Warning:** Scanning large buckets is slow!

```python
# For bucket with 100,000 objects:
# - Scan time: 30-60 seconds
# - Uses pagination (1000 objects per request)
```

**Recommendation:**
- Limit to specific prefixes when possible
- Don't scan too frequently
- Consider caching results for large buckets

### Bulk Storage Recalculation

**Performance:**
- Processes repositories sequentially (safe for database)
- Progress logged every 10 repositories
- Can take 1-5 minutes for 1000 repositories
- Errors don't stop the process (logged and returned)

**Use case:**
- Run during maintenance windows
- Use filters to process subsets
- Monitor logs for progress

---

## Comparison: Admin Portal vs CLI

| Feature | Admin Portal | kohub-cli | Best For |
|---------|--------------|-----------|----------|
| User management | ✅ GUI | ❌ No | Portal: Quick actions |
| Repository browser | ✅ Full | ⚠️ Limited | Portal: Overview<br>CLI: Specific repos |
| Commit history | ✅ Full | ❌ No | Portal only |
| Storage browser | ✅ Full | ❌ No | Portal only |
| Quota management | ✅ Full | ⚠️ API only | Portal: Visual<br>CLI: Scripting |
| Invitation management | ✅ Full | ❌ No | Portal only |
| Statistics | ✅ Dashboard | ❌ No | Portal only |
| Bulk operations | ✅ Full | ❌ No | Portal only |
| Automation | ❌ Manual | ✅ Scripts | Portal: Manual<br>CLI: Automation |

**Recommendation:** Use portal for exploration/monitoring, API for automation.

---

## Frequently Asked Questions

**Q: Can I disable the admin portal?**
A: Yes, set `KOHAKU_HUB_ADMIN_ENABLED=false`

**Q: Is the admin token different from user tokens?**
A: Yes, admin token is system-wide. User tokens are per-user.

**Q: Can I create multiple admin users?**
A: No, admin portal uses shared secret token. For user-based admin, implement role system.

**Q: Does deleting a user delete their repositories?**
A: No (unless force delete). Repositories can be transferred to another user.

**Q: Can I access admin API without the portal UI?**
A: Yes, use curl/Python with `X-Admin-Token` header.

**Q: Is audit logging enabled by default?**
A: Yes, all admin operations are logged with `[ADMIN]` prefix.

**Q: How do I create reusable invitations?**
A: Set `max_usage` to a number (e.g., 50 for 50 uses) or -1 for unlimited.

**Q: Can invitations auto-add users to organizations?**
A: Yes, set `org_id` and `role` in the invitation. Users will automatically join after registration.

---

**Last Updated:** January 2025
**Version:** 1.1
**Status:** ✅ Production Ready

# External Source Fallback System

**Browse repositories from HuggingFace or other KohakuHub instances when not found locally.**

---

## Overview

The fallback system allows KohakuHub to seamlessly access repositories, files, and user profiles from external sources (like HuggingFace.co) when they're not available locally. This enables:

- **Browsing HuggingFace repositories** without manually importing them
- **Downloading files** from external sources
- **Viewing user/org profiles** from other hubs
- **Connecting multiple KohakuHub instances** for federated browsing

---

## Quick Start

### 1. Configure Fallback Source (Admin Portal)

Navigate to: **Admin Portal → Fallback Sources**

**Add HuggingFace:**
```
Name: HuggingFace
URL: https://huggingface.co
Source Type: huggingface
Priority: 1
Token: (optional - for private repos)
Namespace: (empty for global)
Enabled: ✓
```

### 2. Browse External Repositories

**Visit any HuggingFace user/org:**
```
http://localhost:28080/openai
http://localhost:28080/stabilityai
```

**View external models/datasets:**
```
http://localhost:28080/models/openai/whisper-tiny
http://localhost:28080/datasets/karpathy/fineweb-edu
```

**Download files:**
```python
from huggingface_hub import hf_hub_download

# Falls back to HuggingFace automatically
hf_hub_download(
    repo_id="openai/whisper-tiny",
    filename="model.bin"
)
```

---

## How It Works

### Architecture

```
User Request → KohakuHub
                ↓
        Check Local Database
                ↓
        Not Found (404)
                ↓
    Try Fallback Sources (by priority)
        1. HuggingFace
        2. Other KohakuHub Instance
        ...
                ↓
    Found! → Return with _source tag
```

### Caching

**Repository→Source mapping is cached** (not content):
- **Cache TTL**: 5 minutes (configurable)
- **Cache Key**: `{repo_type}:{namespace}/{name}`
- **Cache Value**: Source URL, name, type

This reduces external API calls by 80%+.

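To make the caching behavior concrete, here is a minimal sketch of a TTL cache keyed by `{repo_type}:{namespace}/{name}`. It only illustrates the idea described above and is not KohakuHub's implementation: the class name `SourceMappingCache`, its methods, and the injectable `clock` argument are all assumptions made for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SourceMappingCache:
    """Hypothetical TTL cache for repository -> source mappings (not content)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, dict]] = {}

    @staticmethod
    def make_key(repo_type: str, namespace: str, name: str) -> str:
        # Mirrors the documented key format {repo_type}:{namespace}/{name}
        return f"{repo_type}:{namespace}/{name}"

    def set(self, key: str, source: dict) -> None:
        # Store the source info together with its expiry time
        self._entries[key] = (self._clock() + self.ttl_seconds, source)

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, source = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the next lookup queries the sources again
            del self._entries[key]
            return None
        return source


cache = SourceMappingCache(ttl_seconds=300)
key = SourceMappingCache.make_key("model", "openai", "whisper-tiny")
cache.set(key, {"url": "https://huggingface.co", "name": "HuggingFace",
                "source_type": "huggingface"})
print(key, "->", cache.get(key))
```

Because only the mapping is cached, a hit saves the "which source has this repo" probe while file content is still fetched from the external source each time.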

---

## Configuration

### Global Sources (Environment Variable)

**In `docker-compose.yml`:**
```yaml
environment:
  KOHAKU_HUB_FALLBACK_ENABLED: "true"
  KOHAKU_HUB_FALLBACK_CACHE_TTL: "300"  # 5 minutes
  KOHAKU_HUB_FALLBACK_TIMEOUT: "10"     # 10 seconds
  KOHAKU_HUB_FALLBACK_SOURCES: |
    [
      {
        "url": "https://huggingface.co",
        "token": "",
        "priority": 1,
        "name": "HuggingFace",
        "source_type": "huggingface"
      }
    ]
```

### Database Sources (via Admin API)

**Add via API:**
```bash
curl -X POST http://localhost:48888/admin/api/fallback-sources \
  -H "X-Admin-Token: your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace": "",
    "url": "https://huggingface.co",
    "name": "HuggingFace",
    "source_type": "huggingface",
    "priority": 1,
    "enabled": true
  }'
```

**Or via Admin Portal:**
- Navigate to: http://localhost:28080/admin/fallback-sources
- Click "Add Source"
- Fill in details

---

## Supported Operations

### Repository Operations

✅ **Resolve/Download Files**
- `GET /{type}s/{namespace}/{name}/resolve/{revision}/{path}`
- Returns 302 redirect to external URL

✅ **List Files (Tree)**
- `GET /api/{type}s/{namespace}/{name}/tree/{revision}`
- Returns file tree from external source

✅ **Repository Info**
- `GET /api/{type}s/{namespace}/{name}`
- Returns metadata from external source

✅ **Revision Info**
- `GET /api/{type}s/{namespace}/{name}/revision/{revision}`
- Returns commit/branch info

### User/Organization Operations

✅ **User Profile**
- `GET /api/users/{username}/profile`
- Falls back to HF `/api/users/{username}/overview`

✅ **User Repositories**
- `GET /api/users/{username}/repos`
- Aggregates from `/api/models`, `/api/datasets`, `/api/spaces`

✅ **Organization Profile**
- Detected via `/api/organizations/{name}/members`
- Shows as organization page

### List Aggregation

✅ **Repository Lists**
- `GET /api/models?author={name}`
- Merges local + external results with `_source` tags

❌ **Disabled by Default On:**
- Homepage trending lists (`?fallback=false`)
- Main browse pages (`/models`, `/datasets`, `/spaces`)

---

## URL Mapping (HuggingFace)

**HuggingFace has asymmetric URL patterns:**

| Operation | KohakuHub | HuggingFace |
|-----------|-----------|-------------|
| **Models Download** | `/models/{ns}/{name}/resolve/...` | `/{ns}/{name}/resolve/...` |
| **Datasets Download** | `/datasets/{ns}/{name}/resolve/...` | `/datasets/{ns}/{name}/resolve/...` |
| **API Endpoints** | `/api/{type}s/...` | `/api/{type}s/...` |

The fallback client automatically handles these transformations.

---

## External Source Indicators

### Repository Pages

**External repos show:**
- Badge in header: `[☁️ External: https://huggingface.co]`
- Disabled commits tab (not available for external repos)
- All metadata tagged with `_source` field

### User/Org Pages

**External profiles show:**
- Badge in profile card: `[☁️ HuggingFace]`
- "Limited profile" indicator (bio/website may be missing)
- All repos tagged with source

---

## Query Parameters

**Disable fallback per-request:**
```
GET /api/models?fallback=false
```

Useful for:
- Homepage (show local only)
- Admin interfaces
- Performance-critical lists

---

## Admin Interface

**Fallback Sources Management:**

**Access:** http://localhost:28080/admin/fallback-sources

**Features:**
- Add/Edit/Delete sources
- Enable/Disable sources
- Set priority order
- View cache statistics
- Clear cache manually

**Cache Stats:**
- Current size
- Max size (10,000 entries)
- TTL (300 seconds default)
- Usage percentage

---

## Limitations

### What Works

- ✅ Browsing external repos (tree, files, metadata)
- ✅ Downloading files (302 redirect to external)
- ✅ Viewing user/org profiles
- ✅ Listing user's repositories
- ✅ YAML frontmatter metadata

### What Doesn't Work

- ❌ **Commits** - Not available for external repos
- ❌ **Editing** - Can't modify external repos
- ❌ **Git Clone** - Only local repos support Git clone
- ❌ **LFS Upload** - Can't upload to external sources
- ❌ **Private Access** - Requires admin-configured tokens (no user token passthrough)

---
## Security

**User Privacy:**
- ❌ Local user credentials are **NEVER** sent to external sources
- ✅ Only admin-configured tokens are used
- ✅ Public repos work without any tokens

**Admin Token:**
- Configure once in admin portal
- Used for all external requests
- Can access private repos on external source (if token has permission)

---

## Troubleshooting

**External repos not showing:**
1. Check fallback sources in admin portal
2. Verify source is enabled
3. Check cache TTL (may need to wait or clear cache)
4. Look for errors in backend logs

**404 errors for external content:**
1. Verify the repo exists on the external source
2. Check if source URL is correct
3. Try clearing cache in admin portal

**Performance issues:**
1. Check cache stats (should be >80% hit rate)
2. Reduce number of external sources
3. Increase cache TTL
4. Use `?fallback=false` for performance-critical pages

---

## Advanced Configuration

### Multiple Sources

**Priority ordering:**
```json
[
  {"url": "https://your-hub.com", "priority": 1, "name": "Internal"},
  {"url": "https://huggingface.co", "priority": 2, "name": "HuggingFace"}
]
```

A lower priority number is checked first, so `Internal` (priority 1) is tried before `HuggingFace` (priority 2).

### Per-Namespace Sources

**User/org-specific fallback:**
```json
{
  "namespace": "my-team",
  "url": "https://team-hub.com",
  "priority": 1
}
```

Only applies when browsing `my-team/*` repos.

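The sketch below shows one way priority ordering and per-namespace sources could combine. It rests on assumptions this guide does not confirm: that global sources (empty `namespace`) still apply to namespaced repos, that disabled sources are skipped, and that sources with equal priority keep their configured order. `sources_for` is a hypothetical helper, not a KohakuHub function.

```python
from typing import Dict, List


def sources_for(namespace: str, sources: List[Dict]) -> List[Dict]:
    """Return enabled sources that apply to `namespace`, lowest priority number first."""
    applicable = [
        s for s in sources
        # Assumption: an empty namespace marks a global source that applies everywhere
        if s.get("enabled", True) and s.get("namespace", "") in ("", namespace)
    ]
    # sorted() is stable, so equal priorities keep their configured order
    return sorted(applicable, key=lambda s: s["priority"])


sources = [
    {"namespace": "", "url": "https://huggingface.co", "priority": 2, "name": "HuggingFace"},
    {"namespace": "", "url": "https://your-hub.com", "priority": 1, "name": "Internal"},
    {"namespace": "my-team", "url": "https://team-hub.com", "priority": 1, "name": "Team"},
]

print([s["name"] for s in sources_for("my-team", sources)])
print([s["name"] for s in sources_for("openai", sources)])
```

With this ordering, the `my-team` source is consulted only for `my-team/*` lookups, while other namespaces see just the global sources.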
### Cache Tuning

```bash
KOHAKU_HUB_FALLBACK_CACHE_TTL=600      # 10 minutes
KOHAKU_HUB_FALLBACK_TIMEOUT=20         # 20 second timeout
KOHAKU_HUB_FALLBACK_MAX_CONCURRENT=10  # 10 concurrent requests
```

---

## API Reference

**Admin Endpoints:**
```
POST   /admin/api/fallback-sources              # Create source
GET    /admin/api/fallback-sources              # List sources
GET    /admin/api/fallback-sources/{id}         # Get source
PUT    /admin/api/fallback-sources/{id}         # Update source
DELETE /admin/api/fallback-sources/{id}         # Delete source
GET    /admin/api/fallback-sources/cache/stats  # Cache stats
DELETE /admin/api/fallback-sources/cache/clear  # Clear cache
```

**Query Parameters:**
```
?fallback=false  # Disable fallback for this request
?fallback=true   # Enable fallback (default)
```

---

## Examples

### Example 1: Browse HuggingFace Models

```bash
# View Stability AI's models (URL quoted so the shell does not interpret the "?")
curl "http://localhost:28080/api/models?author=stabilityai"

# Returns local + HuggingFace models tagged with _source
```

### Example 2: Download from HuggingFace

```python
from huggingface_hub import hf_hub_download

# Falls back to HuggingFace automatically
model_path = hf_hub_download(
    repo_id="openai/whisper-tiny",
    filename="config.json"
)
```

### Example 3: Federated KohakuHub

**Connect company internal hub:**
```json
{
  "url": "https://internal-hub.company.com",
  "source_type": "kohakuhub",
  "priority": 1,
  "token": "internal_token_here"
}
```

Now you can browse internal repos + HuggingFace from one interface!

---

## Performance

**Typical Response Times:**
- **Cache Hit**: <100ms (instant)
- **Cache Miss (HF)**: <2s (external API call)
- **File Download**: 302 redirect (no proxy, full speed)

**Cache Hit Rate:**
- **Expected**: >80% after warmup
- **Check**: Admin Portal → Fallback Sources → Cache Stats

---

## See Also

- [Admin Portal Guide](./Admin.md#fallback-sources-management)
- [API Documentation](./API.md)
- [Deployment Guide](./deployment.md)