KohakuHub/docs/Admin.md
2025-10-22 02:42:35 +08:00

Admin Portal Guide

Complete guide to KohakuHub's administration interface

Last Updated: January 2025
Access: http://your-hub.com/admin


Table of Contents

  1. Overview
  2. Authentication
  3. Dashboard
  4. User Management
  5. Repository Management
  6. Commit History Viewer
  7. S3 Storage Browser
  8. Quota Management
  9. Invitation Management
  10. API Reference
  11. Security Best Practices

Overview

The Admin Portal provides a centralized interface for managing your KohakuHub instance. It offers:

  • User Management - Create, view, and delete users
  • Repository Browser - View all repositories with statistics
  • Commit History - Track commits across all repositories
  • Storage Browser - Browse S3 buckets and objects
  • Quota Management - Set and monitor storage quotas
  • Invitation Management - Create and manage registration invitations
  • Statistics Dashboard - Real-time insights into usage
  • Bulk Operations - Recalculate storage for all repositories

Access URL:

http://your-hub.com/admin

Authentication

Admin Token

The admin portal requires a secret token configured in your environment:

Configuration:

# docker-compose.yml
environment:
  KOHAKU_HUB_ADMIN_ENABLED: "true"
  KOHAKU_HUB_ADMIN_SECRET_TOKEN: "your-secret-token-here"  # CHANGE THIS!

Security:

  • ⚠️ NEVER use default token "change-me-in-production" in production
  • Generate strong random token: openssl rand -hex 32
  • Store securely (environment variable, secrets manager)
  • Rotate regularly
  • Use HTTPS in production

Login

  1. Navigate to /admin
  2. Enter your admin secret token
  3. Token is stored in browser session (not localStorage for security)
  4. Auto-logout on browser close

Example:

# Generate secure token
openssl rand -hex 32
# Output: a1b2c3d4e5f6...

# Add to docker-compose.yml
KOHAKU_HUB_ADMIN_SECRET_TOKEN: "a1b2c3d4e5f6..."

# Restart
docker-compose up -d
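
Where openssl is not available, a token of the same shape can be produced with Python's standard library. This is a minimal sketch; the function name generate_admin_token is illustrative and not part of KohakuHub.

```python
import secrets


def generate_admin_token(num_bytes: int = 32) -> str:
    """Return a random hex token, comparable to `openssl rand -hex 32`."""
    # secrets draws from the operating system's CSPRNG, unlike the random module.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(generate_admin_token())
```

With the default of 32 bytes the result is 64 hexadecimal characters, the same length that openssl rand -hex 32 prints.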

Dashboard

Overview Statistics

The dashboard shows real-time statistics from your database:

User Stats:

  • Total users
  • Active users
  • Email verified users
  • Inactive users

Organization Stats:

  • Total organizations

Repository Stats:

  • Total repositories
  • Private vs public repositories
  • Breakdown by type (models, datasets, spaces)

Commit Stats:

  • Total commits
  • Top contributors (by commit count)

Storage Stats:

  • Total storage used (private + public)
  • Private vs public storage
  • LFS object count and size

Quick Actions:

  • Navigate to user management
  • Browse repositories
  • View commits
  • Inspect S3 storage
  • Manage quotas
  • Manage invitations

User Management

List Users

Features:

  • View all users with pagination
  • Sort by ID, username, storage usage
  • Filter and search
  • Storage quota visualization

Columns:

  • ID, Username, Email
  • Private storage (used/quota)
  • Public storage (used/quota)
  • Total storage
  • Email verification status
  • Active status
  • Created date

Create User

Fields:

  • Username (required, unique)
  • Email (required, unique)
  • Password (required)
  • Email verified (checkbox)
  • Is active (checkbox)
  • Private quota (bytes, optional = unlimited)
  • Public quota (bytes, optional = unlimited)

Example:

Username: alice
Email: alice@example.com
Password: ********
Email Verified: ✓
Is Active: ✓
Private Quota: 10737418240  (10 GB)
Public Quota: 53687091200   (50 GB)

API Endpoint:

curl -X POST http://localhost:48888/admin/api/users \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "alice",
    "email": "alice@example.com",
    "password": "secure_password",
    "email_verified": true,
    "is_active": true,
    "private_quota_bytes": 10737418240,
    "public_quota_bytes": 53687091200
  }'

View User Details

Click "View" to see:

  • User ID, username, email
  • Verification and active status
  • Storage quotas (private, public)
  • Storage used (private, public)
  • Created date

Actions:

  • Manage Quota (navigate to quota page)

Delete User

Normal Delete:

  • Deletes user account
  • Deletes all sessions and tokens
  • Deletes organization memberships
  • Does not delete repositories; the request fails while the user still owns any (delete them first or use Force Delete)

Force Delete:

  • Deletes everything above
  • Also deletes all owned repositories
  • ⚠️ Cannot be undone!

Workflow:

  1. Click "Delete" → Confirmation dialog
  2. If user owns repos → Shows repo list
  3. Choose: Cancel or Force Delete
  4. Confirm force delete → All data deleted

API Endpoint:

# Normal delete (fails if user owns repos)
curl -X DELETE http://localhost:48888/admin/api/users/alice \
  -H "X-Admin-Token: your-secret-token"

# Force delete (deletes user and all their repos)
curl -X DELETE "http://localhost:48888/admin/api/users/alice?force=true" \
  -H "X-Admin-Token: your-secret-token"

Toggle Email Verification

Use case: Manually verify users when email verification is disabled or failed.

Action: Click "Verify" or "Unverify" button → Instant update

API Endpoint:

curl -X PATCH "http://localhost:48888/admin/api/users/alice/email-verification?verified=true" \
  -H "X-Admin-Token: your-secret-token"

Repository Management

List Repositories

Filters:

  • Repository type (model/dataset/space)
  • Namespace (user or organization)

Columns:

  • ID
  • Type (color-coded badge)
  • Full repository ID (namespace/name)
  • Privacy status (Private/Public badge)
  • Owner username
  • Storage quota and usage
  • Created date

Actions:

  • View Details → Opens detailed dialog

Repository Details

Information:

  • ID, Type, Full ID
  • Namespace, Name
  • Owner username
  • Privacy status
  • Created date
  • File count (from database, active files only)
  • Commit count (from database)
  • Total size (sum of all active files)
  • Quota information (quota, used, percentage, inheriting status)

Actions:

  • View in Main App → Opens repository in main UI

API Endpoint:

curl http://localhost:48888/admin/api/repositories/model/org/my-model \
  -H "X-Admin-Token: your-secret-token"

Commit History Viewer

Overview

View all commits across all repositories in your instance.

Filters:

  • Repository ID (e.g., "org/repo-name")
  • Author username

Columns:

  • Commit ID (first 8 chars)
  • Repository (type badge + full ID)
  • Branch
  • Author
  • Message (truncated, hover for full)
  • Created date

Sorting:

  • Sort by ID, created date, username, repository

Pagination:

  • Page size: 10, 20, 50, 100
  • Navigate through pages

API Endpoint:

# List all commits
curl "http://localhost:48888/admin/api/commits?limit=100" \
  -H "X-Admin-Token: your-secret-token"

# Filter by repository
curl "http://localhost:48888/admin/api/commits?repo_full_id=org/model&limit=50" \
  -H "X-Admin-Token: your-secret-token"

# Filter by author
curl "http://localhost:48888/admin/api/commits?username=alice&limit=50" \
  -H "X-Admin-Token: your-secret-token"
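
The same filters can be assembled programmatically. The sketch below only builds the URL from the query parameters shown in the curl examples (limit, repo_full_id, username); the helper name commits_url is an assumption made for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def commits_url(base_url: str, repo_full_id: Optional[str] = None,
                username: Optional[str] = None, limit: int = 100) -> str:
    """Build a commit-listing URL from the filters shown in the curl examples."""
    params: dict = {"limit": limit}
    if repo_full_id:
        params["repo_full_id"] = repo_full_id
    if username:
        params["username"] = username
    # urlencode percent-encodes reserved characters such as "/" in "org/model".
    return f"{base_url}/admin/api/commits?{urlencode(params)}"


if __name__ == "__main__":
    print(commits_url("http://localhost:48888", username="alice", limit=50))
```

Building the query string with urlencode avoids the shell-quoting pitfalls of hand-written URLs containing ? and &.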

Use Cases

  • Track user activity
  • Find specific commits
  • Monitor repository changes
  • Debug commit issues
  • Audit trail

S3 Storage Browser

Bucket List

Overview:

  • View all S3 buckets
  • Total size and object count
  • Visual progress bars
  • Creation dates

Metrics:

  • Bucket name
  • Total size (formatted: KB, MB, GB, TB)
  • Object count
  • Creation date
  • Progress bar (relative to 100GB)

Actions:

  • Click bucket → Browse contents

API Endpoint:

curl http://localhost:48888/admin/api/storage/buckets \
  -H "X-Admin-Token: your-secret-token"

Response:

{
  "buckets": [
    {
      "name": "hub-storage",
      "creation_date": "2025-01-01T00:00:00Z",
      "total_size": 107374182400,
      "object_count": 5000
    }
  ]
}
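
For scripts that consume this response, the size formatting and progress-bar scaling described above can be approximated as follows. This is a sketch under two assumptions: units are binary (1 KB = 1024 bytes), and the 100GB reference for the progress bar is 100 × 1024³ bytes. The function names are illustrative.

```python
def format_size(num_bytes: float) -> str:
    """Format a byte count using binary units (B, KB, MB, GB, TB)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {units[-1]}"


def progress_fraction(num_bytes: int, reference_bytes: int = 100 * 1024 ** 3) -> float:
    """Fraction of the progress bar that is filled, capped at 1.0."""
    return min(1.0, num_bytes / reference_bytes)


if __name__ == "__main__":
    bucket = {"name": "hub-storage", "total_size": 107374182400, "object_count": 5000}
    print(bucket["name"], format_size(bucket["total_size"]),
          progress_fraction(bucket["total_size"]))
```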

Object Browser

Features:

  • List objects in selected bucket
  • Filter by prefix (e.g., "lfs/", "models/")
  • Pagination (up to 1000 objects)

Columns:

  • Key (full S3 path)
  • Size
  • Storage class (STANDARD, etc.)
  • Last modified date

Prefix Filtering:

Enter prefix: lfs/
→ Shows only objects starting with "lfs/"

Enter prefix: hf-model-org-repo/
→ Shows objects for specific repository

API Endpoint:

# List objects in bucket
curl "http://localhost:48888/admin/api/storage/objects/hub-storage?prefix=lfs/&limit=100" \
  -H "X-Admin-Token: your-secret-token"

Quota Management

View Quota

Per-user or per-organization:

  • Private quota (limit)
  • Private used
  • Public quota (limit)
  • Public used
  • Total usage
  • Usage percentages

API Endpoint:

# Get user quota
curl "http://localhost:48888/admin/api/quota/alice?is_org=false" \
  -H "X-Admin-Token: your-secret-token"

# Get organization quota
curl "http://localhost:48888/admin/api/quota/my-org?is_org=true" \
  -H "X-Admin-Token: your-secret-token"

Response:

{
  "namespace": "alice",
  "is_organization": false,
  "private_quota_bytes": 10737418240,
  "public_quota_bytes": 53687091200,
  "private_used_bytes": 1234567890,
  "public_used_bytes": 5678901234,
  "private_available_bytes": 9502850350,
  "public_available_bytes": 48008189966,
  "private_percentage_used": 11.5,
  "public_percentage_used": 10.6,
  "total_used_bytes": 6913469124
}
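
The derived fields in this response follow from the quota and usage values: available = quota − used, and percentage = used / quota × 100. The sketch below reproduces that arithmetic; how the server reports these fields for an unlimited (null) quota is not documented here, so returning None in that case is an assumption, as is the helper name.

```python
from typing import Optional


def quota_summary(quota_bytes: Optional[int], used_bytes: int) -> dict:
    """Derive available bytes and percentage used; a None quota means unlimited."""
    if quota_bytes is None:
        # Assumption: unlimited quotas have no meaningful remainder or percentage.
        return {"available_bytes": None, "percentage_used": None}
    available = max(0, quota_bytes - used_bytes)
    percentage = round(used_bytes / quota_bytes * 100, 1) if quota_bytes else 0.0
    return {"available_bytes": available, "percentage_used": percentage}


if __name__ == "__main__":
    # Private quota and usage taken from the sample response.
    print(quota_summary(10737418240, 1234567890))
```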

Set Quota

Fields:

  • Private quota bytes (null = unlimited)
  • Public quota bytes (null = unlimited)

Examples:

10 GB = 10737418240 bytes
50 GB = 53687091200 bytes
Unlimited = (empty/null)
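
The byte values above are binary gigabytes (1 GB = 1024³ bytes). A small conversion helper, shown here as a sketch with an illustrative name, avoids typing the long numbers by hand:

```python
from typing import Optional

GIB = 1024 ** 3  # the quota examples in this guide use binary gigabytes


def quota_bytes(gigabytes: Optional[float]) -> Optional[int]:
    """Convert a quota in GB to bytes; None stays None (unlimited)."""
    if gigabytes is None:
        return None
    return int(gigabytes * GIB)


if __name__ == "__main__":
    print(quota_bytes(10), quota_bytes(50), quota_bytes(None))
```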

API Endpoint:

curl -X PUT http://localhost:48888/admin/api/quota/alice \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "private_quota_bytes": 10737418240,
    "public_quota_bytes": 53687091200
  }'

Recalculate Storage

Purpose: Re-scan all files and update storage usage.

When to use:

  • Database out of sync
  • After manual S3 operations
  • Quota shows incorrect values

Process:

  1. Scans all files for namespace
  2. Sums file sizes (private and public separately)
  3. Updates User/Organization table

API Endpoint:

curl -X POST "http://localhost:48888/admin/api/quota/alice/recalculate?is_org=false" \
  -H "X-Admin-Token: your-secret-token"

Bulk Storage Recalculation

NEW: Recalculate storage for all repositories at once.

API Endpoint:

# Recalculate all repositories
curl -X POST http://localhost:48888/admin/api/repositories/recalculate-all \
  -H "X-Admin-Token: your-secret-token"

# Filter by type
curl -X POST "http://localhost:48888/admin/api/repositories/recalculate-all?repo_type=model" \
  -H "X-Admin-Token: your-secret-token"

# Filter by namespace
curl -X POST "http://localhost:48888/admin/api/repositories/recalculate-all?namespace=org" \
  -H "X-Admin-Token: your-secret-token"

Response:

{
  "total": 250,
  "success_count": 248,
  "failure_count": 2,
  "failures": [
    {
      "repo_id": "org/problem-repo",
      "error": "Repository not found in LakeFS"
    }
  ],
  "message": "Recalculated storage for 248/250 repositories"
}

Invitation Management

Create Registration Invitation

Purpose: Generate invitations for user registration (useful for invite-only mode).

Features:

  • Optional organization membership after registration
  • Reusable invitations with usage limits
  • Configurable expiration

API Endpoint:

curl -X POST http://localhost:48888/admin/api/invitations/register \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "org_id": null,
    "role": "member",
    "max_usage": 10,
    "expires_days": 30
  }'

Response:

{
  "success": true,
  "token": "abc123xyz...",
  "invitation_link": "http://your-hub.com/register?invitation=abc123xyz...",
  "expires_at": "2025-02-14T12:00:00Z",
  "max_usage": 10,
  "is_reusable": true,
  "action": "register_account"
}

Invitation Types:

  • One-time: max_usage: null - Single use invitation
  • Limited: max_usage: 10 - Can be used 10 times
  • Unlimited: max_usage: -1 - Unlimited uses
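
The three max_usage conventions can be captured in a small helper when generating invitations from scripts. This sketch only classifies the value and builds the request body with the fields shown above; both function names are hypothetical, and rejecting 0 or other negative values is an assumption rather than documented server behavior.

```python
from typing import Optional


def invitation_kind(max_usage: Optional[int]) -> str:
    """Classify an invitation by its max_usage value."""
    if max_usage is None:
        return "one-time"
    if max_usage == -1:
        return "unlimited"
    if max_usage > 0:
        return f"limited ({max_usage} uses)"
    raise ValueError("max_usage must be None, -1, or a positive integer")


def registration_payload(max_usage: Optional[int], expires_days: int,
                         org_id: Optional[int] = None, role: str = "member") -> dict:
    """Build the JSON body for POST /admin/api/invitations/register."""
    invitation_kind(max_usage)  # raises ValueError for unsupported values
    return {"org_id": org_id, "role": role,
            "max_usage": max_usage, "expires_days": expires_days}
```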

Auto-join Organization:

{
  "org_id": 5,
  "role": "member",
  "max_usage": 50,
  "expires_days": 90
}

Users who register with this invitation will automatically join the organization as members.

List All Invitations

API Endpoint:

# List all invitations
curl http://localhost:48888/admin/api/invitations \
  -H "X-Admin-Token: your-secret-token"

# Filter by action type
curl "http://localhost:48888/admin/api/invitations?action=register_account" \
  -H "X-Admin-Token: your-secret-token"

Response:

{
  "invitations": [
    {
      "id": 1,
      "token": "abc123...",
      "action": "register_account",
      "org_id": null,
      "org_name": null,
      "role": null,
      "email": null,
      "created_by": 1,
      "creator_username": "System",
      "created_at": "2025-01-15T12:00:00Z",
      "expires_at": "2025-02-15T12:00:00Z",
      "max_usage": 10,
      "usage_count": 5,
      "is_reusable": true,
      "is_available": true,
      "error_message": null,
      "used_at": null,
      "used_by": null
    }
  ],
  "limit": 100,
  "offset": 0
}

Delete Invitation

API Endpoint:

# Replace {token} with the invitation token returned when it was created
curl -X DELETE "http://localhost:48888/admin/api/invitations/{token}" \
  -H "X-Admin-Token: your-secret-token"

API Reference

Authentication

All admin API endpoints require X-Admin-Token header:

curl -H "X-Admin-Token: your-secret-token" \
  http://localhost:48888/admin/api/stats
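
For scripted access, the header can be attached in one place. The following is a sketch using only the Python standard library; the helper names and the choice to read the token from a KOHAKU_HUB_ADMIN_SECRET_TOKEN environment variable on the client side are assumptions, not part of KohakuHub.

```python
import json
import os
import urllib.request
from typing import Optional


def admin_request(path: str, method: str = "GET", body: Optional[dict] = None,
                  base_url: str = "http://localhost:48888",
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an admin API request with the X-Admin-Token header."""
    token = token or os.environ["KOHAKU_HUB_ADMIN_SECRET_TOKEN"]
    headers = {"X-Admin-Token": token}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"{base_url}{path}", data=data,
                                  headers=headers, method=method)


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply; raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, send(admin_request("/admin/api/stats")) corresponds to the curl call above when a hub is running at the assumed base URL.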

Endpoints Overview

User Management:

GET    /admin/api/users                    # List users
GET    /admin/api/users/{username}         # Get user info
POST   /admin/api/users                    # Create user
DELETE /admin/api/users/{username}         # Delete user
PATCH  /admin/api/users/{username}/email-verification  # Set verification

Repository Management:

GET /admin/api/repositories                # List repositories
GET /admin/api/repositories/{type}/{namespace}/{name}  # Get details
POST /admin/api/repositories/recalculate-all           # Bulk storage recalc

Commit History:

GET /admin/api/commits                     # List commits

Storage:

GET /admin/api/storage/buckets             # List buckets
GET /admin/api/storage/objects/{bucket}    # List objects

Statistics:

GET /admin/api/stats                       # Basic stats
GET /admin/api/stats/detailed              # Detailed stats
GET /admin/api/stats/timeseries?days=30    # Time-series data
GET /admin/api/stats/top-repos?by=commits  # Top repositories

Quota:

GET  /admin/api/quota/{namespace}          # Get quota
PUT  /admin/api/quota/{namespace}          # Set quota
POST /admin/api/quota/{namespace}/recalculate  # Recalculate

Invitations:

POST   /admin/api/invitations/register     # Create registration invitation
GET    /admin/api/invitations              # List all invitations
DELETE /admin/api/invitations/{token}      # Delete invitation

Response Formats

User Info:

{
  "id": 1,
  "username": "alice",
  "email": "alice@example.com",
  "email_verified": true,
  "is_active": true,
  "private_quota_bytes": 10737418240,
  "public_quota_bytes": 53687091200,
  "private_used_bytes": 1234567,
  "public_used_bytes": 9876543,
  "created_at": "2025-01-01T00:00:00.000000Z"
}

Repository Info:

{
  "id": 42,
  "repo_type": "model",
  "namespace": "org",
  "name": "my-model",
  "full_id": "org/my-model",
  "private": false,
  "owner_id": 1,
  "owner_username": "alice",
  "created_at": "2025-01-01T00:00:00.000000Z",
  "file_count": 15,
  "commit_count": 8,
  "total_size": 12345678,
  "quota_bytes": null,
  "used_bytes": 12345678,
  "percentage_used": 0.12,
  "is_inheriting": true
}

Detailed Stats:

{
  "users": {
    "total": 100,
    "active": 95,
    "verified": 80,
    "inactive": 5
  },
  "organizations": {
    "total": 10
  },
  "repositories": {
    "total": 250,
    "private": 100,
    "public": 150,
    "by_type": {
      "model": 180,
      "dataset": 60,
      "space": 10
    }
  },
  "commits": {
    "total": 1500,
    "top_contributors": [
      {"username": "alice", "commit_count": 150},
      {"username": "bob", "commit_count": 120}
    ]
  },
  "lfs": {
    "total_objects": 500,
    "total_size": 107374182400
  },
  "storage": {
    "private_used": 10737418240,
    "public_used": 53687091200,
    "total_used": 64424509440
  }
}

Security Best Practices

Token Management

DO:

  • Generate cryptographically random tokens
  • Use environment variables (never hardcode)
  • Rotate tokens regularly (monthly)
  • Use HTTPS in production
  • Restrict admin portal access (firewall, VPN)

DON'T:

  • Use default token in production
  • Commit tokens to git
  • Share tokens via insecure channels
  • Use same token across environments
  • Store tokens in browser localStorage

Token Rotation

# 1. Generate new token
NEW_TOKEN=$(openssl rand -hex 32)
echo "$NEW_TOKEN"

# 2. Update docker-compose.yml with the generated value:
#    KOHAKU_HUB_ADMIN_SECRET_TOKEN: "<new token>"

# 3. Restart services
docker-compose up -d

# 4. Log in to the admin portal again with the new token

Network Security

Production Deployment:

# Restrict admin portal to specific IPs
location /admin {
    allow 192.168.1.0/24;  # Internal network
    allow 10.0.0.0/8;      # VPN
    deny all;

    # ... rest of config
}

Alternative: Basic Auth Layer

location /admin/api/ {
    auth_basic "Admin Area";
    auth_basic_user_file /etc/nginx/.htpasswd;

    # Then require X-Admin-Token header
    proxy_pass http://hub-api:48888;
}

Audit Logging

Admin operations are logged with [ADMIN] prefix:

[WARNING] [ADMIN] [07:05:55] Admin deleted user: testuser (deleted 5 repositories)
[INFO] [ADMIN] [07:06:12] Admin set quota for user alice: private=10737418240, public=53687091200
[WARNING] [ADMIN] [07:06:45] Admin created registration invitation (max_usage=10, expires=30d)

Monitor logs:

# docker logs replays both stdout and stderr; merge them so grep sees every line
docker logs khub-hub-api 2>&1 | grep "\[ADMIN\]"
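
For automated review, the [ADMIN] lines can be split into fields. This sketch assumes every admin log line follows the layout of the three samples above ([LEVEL] [ADMIN] [HH:MM:SS] message); the pattern and function name are illustrative.

```python
import re
from typing import Optional

# Layout inferred from the sample lines: [LEVEL] [ADMIN] [HH:MM:SS] message
ADMIN_LOG = re.compile(
    r"^\[(?P<level>\w+)\] \[ADMIN\] \[(?P<time>[\d:]+)\] (?P<message>.*)$"
)


def parse_admin_line(line: str) -> Optional[dict]:
    """Return level, time, and message for an admin log line, else None."""
    match = ADMIN_LOG.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_admin_line(
        "[INFO] [ADMIN] [07:06:12] Admin set quota for user alice: "
        "private=10737418240, public=53687091200"
    ))
```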

Use Cases

Scenario 1: New User Onboarding

1. Dashboard → Quick Actions → "Manage Users"
2. Click "Create User"
3. Fill form:
   - Username: newuser
   - Email: newuser@company.com
   - Password: (generate secure password)
   - Email Verified: ✓
   - Quotas: 10GB private, 50GB public
4. Click "Create User"
5. Share credentials with user

Scenario 2: Invite-Only Registration Mode

1. Dashboard → "Manage Invitations"
2. Click "Create Registration Invitation"
3. Configure:
   - Max Usage: 50 (for team)
   - Expires: 90 days
   - Auto-join Organization: my-company (as member)
4. Copy invitation link
5. Share link with team members
6. Monitor usage count

Scenario 3: Storage Cleanup

1. Dashboard → "Browse Storage"
2. Click on "hub-storage" bucket
3. Filter by prefix: "lfs/"
4. Review large objects
5. Identify unused LFS objects
6. (Manually delete via CLI/API if needed)

Scenario 4: User Investigation

1. Dashboard → "View Commits"
2. Filter by username: "suspicious-user"
3. Review commit activity
4. Click repository links to inspect content
5. If needed: Go to Users → Delete user (with force)

Scenario 5: Quota Enforcement

1. Dashboard → "Manage Quotas"
2. Select namespace (user or org)
3. View current usage
4. Set new limits if exceeded
5. Click "Recalculate" to verify
6. Monitor dashboard for compliance

Scenario 6: System Maintenance

1. Dashboard → "Bulk Operations"
2. Click "Recalculate All Repository Storage"
3. Optional: Filter by type or namespace
4. Confirm operation
5. Wait for completion (progress logged)
6. Review success/failure report

Troubleshooting

Can't Login

Problem: Invalid admin token
Solution: Check KOHAKU_HUB_ADMIN_SECRET_TOKEN in docker-compose.yml matches your input

Problem: "Admin API is disabled"
Solution: Set KOHAKU_HUB_ADMIN_ENABLED=true in environment

Statistics Not Updating

Problem: Stale data
Solution: Click "Refresh Stats" button on dashboard

Storage Size Incorrect

Problem: Database out of sync with S3
Solution: Use "Recalculate" button in Quota Management or bulk recalculation endpoint

Can't Delete User

Problem: User owns repositories
Solution: Either delete repos first, or use "Force Delete" option with force=true parameter


Advanced Features

Time-Series Statistics

API:

curl -H "X-Admin-Token: your-token" \
  "http://localhost:48888/admin/api/stats/timeseries?days=30"

Returns:

{
  "repositories_by_day": {
    "2025-01-01": {"model": 5, "dataset": 2, "space": 0},
    "2025-01-02": {"model": 3, "dataset": 1, "space": 1}
  },
  "commits_by_day": {
    "2025-01-01": 15,
    "2025-01-02": 20
  },
  "users_by_day": {
    "2025-01-01": 2,
    "2025-01-02": 1
  }
}

Use case: Build custom dashboards with charts
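
As a starting point for such a dashboard, the per-day series can be reduced to totals for the requested window. This is a sketch that assumes the response has exactly the three keys shown above; the function name is illustrative.

```python
from collections import Counter


def summarize_timeseries(data: dict) -> dict:
    """Sum the per-day counts in a /stats/timeseries response."""
    repos: Counter = Counter()
    for per_type in data.get("repositories_by_day", {}).values():
        repos.update(per_type)  # adds the counts for model/dataset/space
    return {
        "repositories": dict(repos),
        "commits": sum(data.get("commits_by_day", {}).values()),
        "users": sum(data.get("users_by_day", {}).values()),
    }


if __name__ == "__main__":
    sample = {
        "repositories_by_day": {
            "2025-01-01": {"model": 5, "dataset": 2, "space": 0},
            "2025-01-02": {"model": 3, "dataset": 1, "space": 1},
        },
        "commits_by_day": {"2025-01-01": 15, "2025-01-02": 20},
        "users_by_day": {"2025-01-01": 2, "2025-01-02": 1},
    }
    print(summarize_timeseries(sample))
```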

Top Repositories

By Commits:

curl -H "X-Admin-Token: your-token" \
  "http://localhost:48888/admin/api/stats/top-repos?by=commits&limit=10"

By Size:

curl -H "X-Admin-Token: your-token" \
  "http://localhost:48888/admin/api/stats/top-repos?by=size&limit=10"

Response:

{
  "top_repositories": [
    {
      "repo_full_id": "org/active-model",
      "repo_type": "model",
      "commit_count": 150,
      "private": false
    }
  ],
  "sorted_by": "commits"
}

Integration with CI/CD

Automated User Creation

import os

import requests

# Read the token from the environment (for example, a CI secret) instead of hardcoding it
admin_token = os.environ["KOHAKU_HUB_ADMIN_SECRET_TOKEN"]
base_url = "http://hub.example.com"

# Create user via API
response = requests.post(
    f"{base_url}/admin/api/users",
    headers={"X-Admin-Token": admin_token},
    json={
        "username": "ci-bot",
        "email": "ci@company.com",
        "password": "generated-password",
        "email_verified": True,
        "private_quota_bytes": 107374182400,  # 100 GB
        "public_quota_bytes": None,  # Unlimited
    },
    timeout=30,
)
response.raise_for_status()  # Fail the job on any HTTP error

user = response.json()
print(f"Created user: {user['username']} (ID: {user['id']})")

Bulk Invitation Generation

import os

import requests

# Read the token from the environment (for example, a CI secret) instead of hardcoding it
admin_token = os.environ["KOHAKU_HUB_ADMIN_SECRET_TOKEN"]
base_url = "http://hub.example.com"

# Create reusable invitation for 100 users
response = requests.post(
    f"{base_url}/admin/api/invitations/register",
    headers={"X-Admin-Token": admin_token},
    json={
        "org_id": 5,  # Auto-join org after registration
        "role": "member",
        "max_usage": 100,
        "expires_days": 90
    },
    timeout=30,
)
response.raise_for_status()  # Stop here if the invitation was not created

invitation = response.json()
print(f"Invitation link: {invitation['invitation_link']}")
print(f"Can be used {invitation['max_usage']} times")

Monitoring Script

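import os

import requests

# Read the token from the environment (for example, a CI secret) instead of hardcoding it
admin_token = os.environ["KOHAKU_HUB_ADMIN_SECRET_TOKEN"]

# Assumed total capacity for the alert threshold; adjust for your deployment
capacity_bytes = 100 * 1000 * 1000 * 1000  # 100 GB

# Get statistics
response = requests.get(
    "http://hub.example.com/admin/api/stats/detailed",
    headers={"X-Admin-Token": admin_token},
    timeout=30,
)
response.raise_for_status()

stats = response.json()

# Alert if storage > 80% of capacity
total_used = stats['storage']['total_used']
if total_used > 0.8 * capacity_bytes:  # 80 GB
    print("WARNING: Storage usage high!")

# Alert if too many inactive users
if stats['users']['inactive'] > 10:
    print(f"WARNING: {stats['users']['inactive']} inactive users")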

Performance Considerations

Database Queries

Admin operations run synchronous queries with db.atomic():

  • User listings: O(n) where n = total users
  • Repository stats: Aggregation queries with indexes
  • Commit history: Indexed by repository_id and username
  • Storage calculations: Aggregation over File table

Optimization:

  • Limit page size (default: 100, max: 1000)
  • Use filters to reduce result sets
  • Statistics are computed on-demand (cache in frontend if needed)

S3 Bucket Scanning

Warning: Scanning large buckets is slow!

# For bucket with 100,000 objects:
# - Scan time: 30-60 seconds
# - Uses pagination (1000 objects per request)

Recommendation:

  • Limit to specific prefixes when possible
  • Don't scan too frequently
  • Consider caching results for large buckets
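
The cost of a scan follows from the page size: at 1000 objects per request, a 100,000-object bucket needs 100 list requests, so the quoted 30-60 seconds works out to roughly 0.3-0.6 seconds per request. A sketch of that arithmetic, with an illustrative function name:

```python
import math


def list_requests_needed(object_count: int, page_size: int = 1000) -> int:
    """Number of paginated list calls needed to enumerate a bucket."""
    return math.ceil(object_count / page_size)


if __name__ == "__main__":
    requests_needed = list_requests_needed(100_000)
    print(requests_needed)  # 100
    # Per-request latency implied by the 30-60 second range quoted above.
    print(30 / requests_needed, 60 / requests_needed)
```

Narrowing the scan to a prefix reduces object_count, which is why prefix filtering is the first recommendation.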

Bulk Storage Recalculation

Performance:

  • Processes repositories sequentially (safe for database)
  • Progress logged every 10 repositories
  • Can take 1-5 minutes for 1000 repositories
  • Errors don't stop the process (logged and returned)

Use case:

  • Run during maintenance windows
  • Use filters to process subsets
  • Monitor logs for progress
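
The behavior described above (sequential processing, progress every 10 repositories, errors collected instead of aborting) can be sketched as a simple loop. This is an illustration of the pattern, not KohakuHub's implementation; recalc_one stands in for whatever performs the per-repository recalculation, and the result mirrors the fields of the bulk recalculation response.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("admin.bulk")


def recalculate_all(repo_ids: Iterable[str], recalc_one: Callable[[str], None],
                    log_every: int = 10) -> dict:
    """Process repositories one at a time, collecting failures instead of aborting."""
    total = success = 0
    failures = []
    for total, repo_id in enumerate(repo_ids, start=1):
        try:
            recalc_one(repo_id)
            success += 1
        except Exception as exc:  # one bad repository must not stop the run
            failures.append({"repo_id": repo_id, "error": str(exc)})
        if total % log_every == 0:
            logger.info("Recalculated %d repositories so far", total)
    return {
        "total": total,
        "success_count": success,
        "failure_count": len(failures),
        "failures": failures,
        "message": f"Recalculated storage for {success}/{total} repositories",
    }
```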

Comparison: Admin Portal vs CLI

| Feature | Admin Portal | kohub-cli | Best For |
|---|---|---|---|
| User management | GUI | No | Portal: Quick actions |
| Repository browser | Full | ⚠️ Limited | Portal: Overview; CLI: Specific repos |
| Commit history | Full | No | Portal only |
| Storage browser | Full | No | Portal only |
| Quota management | Full | ⚠️ API only | Portal: Visual; CLI: Scripting |
| Invitation management | Full | No | Portal only |
| Statistics | Dashboard | No | Portal only |
| Bulk operations | Full | No | Portal only |
| Automation | Manual | Scripts | Portal: Manual; CLI: Automation |

Recommendation: Use portal for exploration/monitoring, API for automation.


Frequently Asked Questions

Q: Can I disable the admin portal?
A: Yes, set KOHAKU_HUB_ADMIN_ENABLED=false

Q: Is the admin token different from user tokens?
A: Yes, admin token is system-wide. User tokens are per-user.

Q: Can I create multiple admin users?
A: No, admin portal uses shared secret token. For user-based admin, implement role system.

Q: Does deleting a user delete their repositories?
A: No (unless force delete). Repositories can be transferred to another user.

Q: Can I access admin API without the portal UI?
A: Yes, use curl/Python with X-Admin-Token header.

Q: Is audit logging enabled by default?
A: Yes, all admin operations are logged with [ADMIN] prefix.

Q: How do I create reusable invitations?
A: Set max_usage to a number (e.g., 50 for 50 uses) or -1 for unlimited.

Q: Can invitations auto-add users to organizations?
A: Yes, set org_id and role in the invitation. Users will automatically join after registration.


Last Updated: January 2025
Version: 1.1
Status: Production Ready

External Source Fallback System

Browse repositories from HuggingFace or other KohakuHub instances when not found locally.


Overview

The fallback system allows KohakuHub to seamlessly access repositories, files, and user profiles from external sources (like HuggingFace.co) when they're not available locally. This enables:

  • Browsing HuggingFace repositories without manually importing them
  • Downloading files from external sources
  • Viewing user/org profiles from other hubs
  • Connecting multiple KohakuHub instances for federated browsing

Quick Start

1. Configure Fallback Source (Admin Portal)

Navigate to: Admin Portal → Fallback Sources

Add HuggingFace:

Name: HuggingFace
URL: https://huggingface.co
Source Type: huggingface
Priority: 1
Token: (optional - for private repos)
Namespace: (empty for global)
Enabled: ✓

2. Browse External Repositories

Visit any HuggingFace user/org:

http://localhost:28080/openai
http://localhost:28080/stabilityai

View external models/datasets:

http://localhost:28080/models/openai/whisper-tiny
http://localhost:28080/datasets/karpathy/fineweb-edu

Download files:

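from huggingface_hub import hf_hub_download

# Point the client at KohakuHub; without endpoint (or HF_ENDPOINT),
# huggingface_hub contacts huggingface.co directly and bypasses the fallback.
# Falls back to HuggingFace automatically when the repo is not local
hf_hub_download(
    repo_id="openai/whisper-tiny",
    filename="config.json",
    endpoint="http://localhost:28080",  # example KohakuHub URL used in this guide
)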

How It Works

Architecture

User Request → KohakuHub
  ↓
  Check Local Database
  ↓
  Not Found (404)
  ↓
  Try Fallback Sources (by priority)
    1. HuggingFace
    2. Other KohakuHub Instance
    ...
  ↓
  Found! → Return with _source tag
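
The flow above can be sketched as a lookup chain. This is an illustration of the ordering only, not KohakuHub's code: local_lookup and the per-source lookup callables are hypothetical stand-ins, and tagging the result with the source name (rather than its URL) is an assumption.

```python
from typing import Callable, Optional


def resolve_with_fallback(repo_id: str,
                          local_lookup: Callable[[str], Optional[dict]],
                          sources: list) -> Optional[dict]:
    """Return local data if present, else the first hit from sources by priority."""
    local = local_lookup(repo_id)
    if local is not None:
        return local
    # Lowest priority number is tried first.
    for source in sorted(sources, key=lambda s: s["priority"]):
        result = source["lookup"](repo_id)
        if result is not None:
            # Tag the payload so callers can tell where it came from.
            return {**result, "_source": source["name"]}
    return None
```

Local results are returned untagged, so the presence of _source is what distinguishes an external repository in this sketch.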

Caching

Repository→Source mapping is cached (not content):

  • Cache TTL: 5 minutes (configurable)
  • Cache Key: {repo_type}:{namespace}/{name}
  • Cache Value: Source URL, name, type

With the expected cache hit rate above 80% after warmup, most repeat lookups skip the external source check.
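
A mapping cache of this kind can be sketched in a few lines. The class below is an illustration under stated assumptions: the key format and 300-second TTL come from the list above, while the class name, the injectable clock, and the lack of a size limit are simplifications made for the example.

```python
import time
from typing import Callable, Optional


class SourceCache:
    """Remember which source served a repository, for a limited time."""

    def __init__(self, ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict = {}

    @staticmethod
    def key(repo_type: str, namespace: str, name: str) -> str:
        return f"{repo_type}:{namespace}/{name}"

    def put(self, repo_type: str, namespace: str, name: str, source: dict) -> None:
        expires_at = self.clock() + self.ttl
        self._entries[self.key(repo_type, namespace, name)] = (expires_at, source)

    def get(self, repo_type: str, namespace: str, name: str) -> Optional[dict]:
        cache_key = self.key(repo_type, namespace, name)
        entry = self._entries.get(cache_key)
        if entry is None:
            return None
        expires_at, source = entry
        if self.clock() >= expires_at:
            del self._entries[cache_key]  # expired: force a fresh source lookup
            return None
        return source
```

Only the source descriptor is stored, never file content, so a stale entry costs at most one extra lookup after it expires.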


Configuration

Global Sources (Environment Variable)

In docker-compose.yml:

environment:
  KOHAKU_HUB_FALLBACK_ENABLED: "true"
  KOHAKU_HUB_FALLBACK_CACHE_TTL: "300"  # 5 minutes
  KOHAKU_HUB_FALLBACK_TIMEOUT: "10"     # 10 seconds
  KOHAKU_HUB_FALLBACK_SOURCES: |
    [
      {
        "url": "https://huggingface.co",
        "token": "",
        "priority": 1,
        "name": "HuggingFace",
        "source_type": "huggingface"
      }
    ]
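
Because KOHAKU_HUB_FALLBACK_SOURCES holds JSON inside YAML, a quick check before deploying can catch syntax slips. The sketch below parses the value and verifies the keys used in the example above; treating those four keys as required is an assumption, as is the function name.

```python
import json

# Keys taken from the example entry; whether KohakuHub requires all of them is assumed.
REQUIRED_KEYS = {"url", "priority", "name", "source_type"}


def parse_fallback_sources(raw: str) -> list:
    """Parse the JSON list and check each entry for the expected keys."""
    sources = json.loads(raw)
    if not isinstance(sources, list):
        raise ValueError("KOHAKU_HUB_FALLBACK_SOURCES must be a JSON list")
    for index, source in enumerate(sources):
        missing = REQUIRED_KEYS - source.keys()
        if missing:
            raise ValueError(f"source {index} is missing: {sorted(missing)}")
    return sorted(sources, key=lambda s: s["priority"])
```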

Database Sources (via Admin API)

Add via API:

curl -X POST http://localhost:48888/admin/api/fallback-sources \
  -H "X-Admin-Token: your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace": "",
    "url": "https://huggingface.co",
    "name": "HuggingFace",
    "source_type": "huggingface",
    "priority": 1,
    "enabled": true
  }'

Or via Admin Portal: navigate to Admin Portal → Fallback Sources and add the source there (see Quick Start).


Supported Operations

Repository Operations

Resolve/Download Files

  • GET /{type}s/{namespace}/{name}/resolve/{revision}/{path}
  • Returns 302 redirect to external URL

List Files (Tree)

  • GET /api/{type}s/{namespace}/{name}/tree/{revision}
  • Returns file tree from external source

Repository Info

  • GET /api/{type}s/{namespace}/{name}
  • Returns metadata from external source

Revision Info

  • GET /api/{type}s/{namespace}/{name}/revision/{revision}
  • Returns commit/branch info

User/Organization Operations

User Profile

  • GET /api/users/{username}/profile
  • Falls back to HF /api/users/{username}/overview

User Repositories

  • GET /api/users/{username}/repos
  • Aggregates from /api/models, /api/datasets, /api/spaces

Organization Profile

  • Detected via /api/organizations/{name}/members
  • Shows as organization page

List Aggregation

Repository Lists

  • GET /api/models?author={name}
  • Merges local + external results with _source tags

Fallback is disabled by default on:

  • Homepage trending lists (?fallback=false)
  • Main browse pages (/models, /datasets, /spaces)

URL Mapping (HuggingFace)

HuggingFace has asymmetric URL patterns:

| Operation | KohakuHub | HuggingFace |
|---|---|---|
| Models Download | /models/{ns}/{name}/resolve/... | /{ns}/{name}/resolve/... |
| Datasets Download | /datasets/{ns}/{name}/resolve/... | /datasets/{ns}/{name}/resolve/... |
| API Endpoints | /api/{type}s/... | /api/{type}s/... |

The fallback client automatically handles these transformations.
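
The transformation amounts to dropping the /models prefix for model downloads and passing everything else through. A minimal sketch covering only the rows listed above (the function name is illustrative, and other path types are left unchanged as an assumption):

```python
def to_huggingface_path(kohakuhub_path: str) -> str:
    """Map a KohakuHub path to its HuggingFace form for the cases listed above."""
    prefix = "/models/"
    if kohakuhub_path.startswith(prefix):
        # HuggingFace serves model files without a type prefix.
        return "/" + kohakuhub_path[len(prefix):]
    # Dataset downloads and /api/ endpoints use the same path on both sides.
    return kohakuhub_path


if __name__ == "__main__":
    print(to_huggingface_path("/models/openai/whisper-tiny/resolve/main/config.json"))
```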


External Source Indicators

Repository Pages

External repos show:

  • Badge in header: [☁️ External: https://huggingface.co]
  • Disabled commits tab (not available for external repos)
  • All metadata tagged with _source field

User/Org Pages

External profiles show:

  • Badge in profile card: [☁️ HuggingFace]
  • "Limited profile" indicator (bio/website may be missing)
  • All repos tagged with source

Query Parameters

Disable fallback per-request:

GET /api/models?fallback=false

Useful for:

  • Homepage (show local only)
  • Admin interfaces
  • Performance-critical lists

Admin Interface

Fallback Sources Management:

Access: http://localhost:28080/admin/fallback-sources

Features:

  • Add/Edit/Delete sources
  • Enable/Disable sources
  • Set priority order
  • View cache statistics
  • Clear cache manually

Cache Stats:

  • Current size
  • Max size (10,000 entries)
  • TTL (300 seconds default)
  • Usage percentage

Limitations

What Works

  • Browsing external repos (tree, files, metadata)
  • Downloading files (302 redirect to external)
  • Viewing user/org profiles
  • Listing user's repositories
  • YAML frontmatter metadata

What Doesn't Work

  • Commits - Not available for external repos
  • Editing - Can't modify external repos
  • Git Clone - Only local repos support Git clone
  • LFS Upload - Can't upload to external sources
  • Private Access - Requires admin-configured tokens (no user token passthrough)


Security

User Privacy:

  • Local user credentials are NEVER sent to external sources
  • Only admin-configured tokens are used
  • Public repos work without any tokens

Admin Token:

  • Configure once in admin portal
  • Used for all external requests
  • Can access private repos on external source (if token has permission)

Troubleshooting

External repos not showing:

  1. Check fallback sources in admin portal
  2. Verify source is enabled
  3. Check cache TTL (may need to wait or clear cache)
  4. Look for errors in backend logs

404 errors for external content:

  1. Verify the repo exists on the external source
  2. Check if source URL is correct
  3. Try clearing cache in admin portal

Performance issues:

  1. Check cache stats (should be >80% hit rate)
  2. Reduce number of external sources
  3. Increase cache TTL
  4. Use ?fallback=false for performance-critical pages

Advanced Configuration

Multiple Sources

Priority ordering:

[
  {"url": "https://your-hub.com", "priority": 1, "name": "Internal"},
  {"url": "https://huggingface.co", "priority": 2, "name": "HuggingFace"}
]

A lower priority number is checked first (priority 1 before priority 2).

Per-Namespace Sources

User/org-specific fallback:

{
  "namespace": "my-team",
  "url": "https://team-hub.com",
  "priority": 1
}

Only applies when browsing my-team/* repos.
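
One way to picture the selection is a filter followed by a sort. The sketch below assumes that global sources (empty or missing namespace) still apply alongside namespace-specific ones and that both are ordered by priority number; the guide does not spell out how the two kinds are combined, so treat this as illustrative only.

```python
def sources_for(namespace: str, sources: list) -> list:
    """Pick global sources plus those scoped to `namespace`, lowest priority number first."""
    applicable = [s for s in sources if s.get("namespace", "") in ("", namespace)]
    return sorted(applicable, key=lambda s: s["priority"])


if __name__ == "__main__":
    configured = [
        {"namespace": "", "url": "https://huggingface.co", "priority": 2},
        {"namespace": "my-team", "url": "https://team-hub.com", "priority": 1},
    ]
    print([s["url"] for s in sources_for("my-team", configured)])
    print([s["url"] for s in sources_for("other", configured)])
```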

Cache Tuning

KOHAKU_HUB_FALLBACK_CACHE_TTL=600       # 10 minutes
KOHAKU_HUB_FALLBACK_TIMEOUT=20          # 20 second timeout
KOHAKU_HUB_FALLBACK_MAX_CONCURRENT=10   # 10 concurrent requests

API Reference

Admin Endpoints:

POST   /admin/api/fallback-sources           # Create source
GET    /admin/api/fallback-sources           # List sources
GET    /admin/api/fallback-sources/{id}      # Get source
PUT    /admin/api/fallback-sources/{id}      # Update source
DELETE /admin/api/fallback-sources/{id}      # Delete source
GET    /admin/api/fallback-sources/cache/stats    # Cache stats
DELETE /admin/api/fallback-sources/cache/clear    # Clear cache

Query Parameters:

?fallback=false    # Disable fallback for this request
?fallback=true     # Enable fallback (default)

Examples

Example 1: Browse HuggingFace Models

# View Stability AI's models
curl "http://localhost:28080/api/models?author=stabilityai"

# Returns local + HuggingFace models tagged with _source

Example 2: Download from HuggingFace

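from huggingface_hub import hf_hub_download

# Point the client at KohakuHub; without endpoint (or HF_ENDPOINT),
# huggingface_hub contacts huggingface.co directly and bypasses the fallback.
# Falls back to HuggingFace automatically when the repo is not local
model_path = hf_hub_download(
    repo_id="openai/whisper-tiny",
    filename="config.json",
    endpoint="http://localhost:28080",  # example KohakuHub URL used in this guide
)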

Example 3: Federated KohakuHub

Connect company internal hub:

{
  "url": "https://internal-hub.company.com",
  "source_type": "kohakuhub",
  "priority": 1,
  "token": "internal_token_here"
}

Now you can browse internal repos + HuggingFace from one interface!


Performance

Typical Response Times:

  • Cache Hit: <100ms (instant)
  • Cache Miss (HF): <2s (external API call)
  • File Download: 302 redirect (no proxy, full speed)

Cache Hit Rate:

  • Expected: >80% after warmup
  • Check: Admin Portal → Fallback Sources → Cache Stats

See Also