mirror of
https://github.com/KohakuBlueleaf/KohakuHub.git
synced 2026-03-09 07:12:07 -05:00
638 lines
16 KiB
Markdown
638 lines
16 KiB
Markdown
# KohakuHub Utility Scripts
|
|
|
|
This directory contains utility scripts for KohakuHub administration and maintenance.
|
|
|
|
## Deployment Setup
|
|
|
|
### Docker Compose Generator
|
|
|
|
Interactive tool to generate a customized `docker-compose.yml` based on your deployment preferences.
|
|
|
|
**Features:**
|
|
- ✅ Choose between built-in or external PostgreSQL
|
|
- ✅ Configure LakeFS to use PostgreSQL or SQLite
|
|
- ✅ Choose between built-in MinIO or external S3 storage
|
|
- ✅ Auto-generate secure secret keys
|
|
- ✅ Comprehensive configuration validation
|
|
|
|
**Usage:**
|
|
|
|
```bash
|
|
# Interactive mode (asks questions)
|
|
python scripts/generate_docker_compose.py
|
|
|
|
# Generate a configuration template
|
|
python scripts/generate_docker_compose.py --generate-config
|
|
|
|
# Use configuration file (non-interactive)
|
|
python scripts/generate_docker_compose.py --config kohakuhub.conf
|
|
```
|
|
|
|
**Configuration Options:**
|
|
|
|
1. **PostgreSQL:**
|
|
- Built-in container (managed by docker-compose)
|
|
- External PostgreSQL (specify host, port, credentials)
|
|
- Default database name for hub-api: `kohakuhub`
|
|
|
|
2. **LakeFS Database:**
|
|
- Use PostgreSQL (recommended for production)
|
|
- Use local SQLite (simpler for development)
|
|
- Default database name: `lakefs` (separate from hub-api database)
|
|
- **Automatic database creation**: Both databases are created automatically when LakeFS starts
|
|
- Works with both built-in and external PostgreSQL
|
|
|
|
3. **S3 Storage:**
|
|
- Built-in MinIO container (self-hosted)
|
|
- External S3-compatible storage (AWS S3, CloudFlare R2, etc.)
|
|
|
|
4. **Security:**
|
|
- Auto-generated session secret key
|
|
- Auto-generated admin secret token
|
|
- Option to use custom secrets
|
|
|
|
5. **Network:**
|
|
- External Docker bridge network support
|
|
- Allows cross-compose communication with external PostgreSQL/S3 services
|
|
- Automatically added to hub-api and lakefs when using external services
|
|
|
|
**Example: Interactive Mode**
|
|
|
|
```
|
|
$ python scripts/generate_docker_compose.py
|
|
|
|
=============================================================
|
|
KohakuHub Docker Compose Generator
|
|
=============================================================
|
|
|
|
--- PostgreSQL Configuration ---
|
|
Use built-in PostgreSQL container? [Y/n]: y
|
|
PostgreSQL username [hub]:
|
|
PostgreSQL password [hubpass]:
|
|
PostgreSQL database name for hub-api [kohakuhub]:
|
|
|
|
--- LakeFS Database Configuration ---
|
|
Use PostgreSQL for LakeFS? (No = use local SQLite) [Y/n]: y
|
|
PostgreSQL database name for LakeFS [lakefs]:
|
|
|
|
--- S3 Storage Configuration ---
|
|
Use built-in MinIO container? [Y/n]: n
|
|
S3 endpoint URL: https://my-account.r2.cloudflarestorage.com
|
|
S3 access key: xxxxxxxxxxxx
|
|
S3 secret key: yyyyyyyyyyyy
|
|
S3 region [us-east-1]: auto
|
|
|
|
--- Security Configuration ---
|
|
Generated session secret: AbCdEf123456...
|
|
Use generated session secret? [Y/n]: y
|
|
|
|
Use same secret for admin token? [y/N]: n
|
|
Generated admin secret: XyZ789...
|
|
Use generated admin secret? [Y/n]: y
|
|
|
|
=============================================================
|
|
Generating docker-compose.yml...
|
|
=============================================================
|
|
|
|
✓ Successfully generated: docker-compose.yml
|
|
✓ Database initialization scripts will run automatically when LakeFS starts
|
|
- scripts/init-databases.sh
|
|
- scripts/lakefs-entrypoint.sh
|
|
|
|
Configuration Summary:
|
|
------------------------------------------------------------
|
|
PostgreSQL: Built-in
|
|
Hub-API Database: kohakuhub
|
|
LakeFS Database: lakefs
|
|
LakeFS Database Backend: PostgreSQL
|
|
S3 Storage: Custom S3
|
|
Endpoint: https://my-account.r2.cloudflarestorage.com
|
|
Session Secret: AbCdEf123456...
|
|
Admin Secret: XyZ789...
|
|
------------------------------------------------------------
|
|
|
|
Next steps:
|
|
1. Review the generated docker-compose.yml
|
|
2. Build frontend: npm run build --prefix ./src/kohaku-hub-ui
|
|
3. Start services: docker-compose up -d
|
|
|
|
Note: Databases will be created automatically on first startup:
|
|
- kohakuhub (hub-api)
|
|
- lakefs (LakeFS)
|
|
|
|
4. Access at: http://localhost:28080
|
|
```
|
|
|
|
**Example: Using Configuration File**
|
|
|
|
```bash
|
|
# Step 1: Generate template
|
|
$ python scripts/generate_docker_compose.py --generate-config
|
|
[OK] Generated configuration template: kohakuhub.conf
|
|
|
|
Edit this file with your settings, then run:
|
|
python scripts/generate_docker_compose.py --config kohakuhub.conf
|
|
|
|
# Step 2: Edit kohakuhub.conf with your settings
|
|
$ nano kohakuhub.conf # or use any text editor
|
|
|
|
# Step 3: Generate docker-compose.yml from config
|
|
$ python scripts/generate_docker_compose.py --config kohakuhub.conf
|
|
|
|
============================================================
|
|
KohakuHub Docker Compose Generator
|
|
============================================================
|
|
|
|
Loading configuration from: kohakuhub.conf
|
|
|
|
Loaded configuration:
|
|
PostgreSQL: External
|
|
Host: db.example.com:5432
|
|
Database: kohakuhub
|
|
LakeFS: PostgreSQL
|
|
Database: lakefs
|
|
S3: External S3
|
|
Endpoint: https://s3.example.com
|
|
|
|
============================================================
|
|
Generating docker-compose.yml...
|
|
============================================================
|
|
|
|
[OK] Successfully generated: docker-compose.yml
|
|
[OK] Database initialization scripts will run automatically when LakeFS starts
|
|
|
|
Configuration Summary:
|
|
------------------------------------------------------------
|
|
PostgreSQL: Custom
|
|
Host: db.example.com:5432
|
|
Hub-API Database: kohakuhub
|
|
LakeFS Database: lakefs
|
|
LakeFS Database Backend: PostgreSQL
|
|
S3 Storage: Custom S3
|
|
Endpoint: https://s3.example.com
|
|
Session Secret: AbCdEf123456...
|
|
Admin Secret: XyZ789...
|
|
------------------------------------------------------------
|
|
```
|
|
|
|
**Requirements:**
|
|
- Python 3.10+
|
|
- No additional dependencies required
|
|
|
|
**Configuration File Format:**
|
|
|
|
The configuration file (`kohakuhub.conf`) uses INI format:
|
|
|
|
```ini
|
|
[postgresql]
|
|
builtin = true
|
|
user = hub
|
|
password = hubpass
|
|
database = kohakuhub
|
|
|
|
[lakefs]
|
|
use_postgres = true
|
|
database = lakefs
|
|
|
|
[s3]
|
|
builtin = true
|
|
access_key = minioadmin
|
|
secret_key = minioadmin
|
|
|
|
[security]
|
|
session_secret = your-secret-here
|
|
admin_secret = your-admin-secret
|
|
|
|
[network]
|
|
# Optional: for cross-compose communication
|
|
external_network = shared-network
|
|
```
|
|
|
|
For external PostgreSQL or S3:
|
|
```ini
|
|
[postgresql]
|
|
builtin = false
|
|
host = your-postgres-host.com
|
|
port = 5432
|
|
user = hub
|
|
password = your-password
|
|
database = kohakuhub
|
|
|
|
[s3]
|
|
builtin = false
|
|
endpoint = https://your-s3-endpoint.com
|
|
access_key = your-access-key
|
|
secret_key = your-secret-key
|
|
region = us-east-1
|
|
|
|
[network]
|
|
# Required if external services are in different Docker Compose
|
|
external_network = shared-network
|
|
```
|
|
|
|
**Using External Docker Network:**
|
|
|
|
If PostgreSQL or S3 are in separate Docker Compose setups, you need a shared network:
|
|
|
|
```bash
|
|
# Create the shared network first
|
|
docker network create shared-network
|
|
|
|
# Your PostgreSQL docker-compose.yml
|
|
services:
|
|
postgres:
|
|
# ... your config
|
|
networks:
|
|
- shared-network
|
|
|
|
networks:
|
|
shared-network:
|
|
external: true
|
|
|
|
# Generate KohakuHub with external network
|
|
python scripts/generate_docker_compose.py --config kohakuhub.conf
|
|
```
|
|
|
|
The generator will automatically:
|
|
- Add the external network to `hub-api` and `lakefs` services
|
|
- Configure them to use both `default` (hub-net) and the external network
|
|
- Allow container name resolution across compose files
|
|
|
|
**Important Notes:**
|
|
- Shell scripts automatically use LF line endings (configured in `.gitattributes`)
|
|
- Database initialization runs automatically on LakeFS startup
|
|
- Works on both Windows and Linux development environments
|
|
- Configuration file (`kohakuhub.conf`) is ignored by git (contains sensitive data)
|
|
- Use `--generate-config` to create a template configuration file
|
|
|
|
---
|
|
|
|
## Storage Management
|
|
|
|
### Clear S3 Storage
|
|
|
|
For demo deployments with constrained S3 storage (like CloudFlare R2 free tier), use these scripts to manage storage.
|
|
|
|
**All scripts now accept S3 credentials via command-line arguments or environment variables, making them standalone tools independent of KohakuHub configuration.**
|
|
|
|
#### Python Script (Recommended)
|
|
|
|
**Features:**
|
|
- ✅ Rich progress display with progress bars
|
|
- ✅ Detailed statistics (object count, total size)
|
|
- ✅ Dry run mode
|
|
- ✅ Batch deletion (handles large buckets efficiently)
|
|
- ✅ Prefix filtering
|
|
- ✅ Error reporting
|
|
|
|
**Usage:**
|
|
|
|
```bash
|
|
# Clear all content (interactive confirmation required)
|
|
python scripts/clear_s3_storage.py \
|
|
--endpoint https://s3.amazonaws.com \
|
|
--access-key YOUR_ACCESS_KEY \
|
|
--secret-key YOUR_SECRET_KEY \
|
|
--bucket my-bucket
|
|
|
|
# Clear only LFS files (large files)
|
|
python scripts/clear_s3_storage.py \
|
|
--endpoint https://s3.amazonaws.com \
|
|
--access-key YOUR_ACCESS_KEY \
|
|
--secret-key YOUR_SECRET_KEY \
|
|
--bucket my-bucket \
|
|
--prefix lfs/
|
|
|
|
# Clear specific repository type
|
|
python scripts/clear_s3_storage.py \
|
|
--endpoint https://s3.amazonaws.com \
|
|
--access-key YOUR_ACCESS_KEY \
|
|
--secret-key YOUR_SECRET_KEY \
|
|
--bucket my-bucket \
|
|
--prefix hf-model-
|
|
|
|
# Clear multiple prefixes
|
|
python scripts/clear_s3_storage.py \
|
|
--endpoint https://s3.amazonaws.com \
|
|
--access-key YOUR_ACCESS_KEY \
|
|
--secret-key YOUR_SECRET_KEY \
|
|
--bucket my-bucket \
|
|
--prefix lfs/ --prefix hf-model-
|
|
|
|
# Dry run (show what would be deleted without deleting)
|
|
python scripts/clear_s3_storage.py \
|
|
--endpoint https://s3.amazonaws.com \
|
|
--access-key YOUR_ACCESS_KEY \
|
|
--secret-key YOUR_SECRET_KEY \
|
|
--bucket my-bucket \
|
|
--dry-run
|
|
|
|
# Force delete without confirmation (dangerous!)
|
|
python scripts/clear_s3_storage.py \
|
|
--endpoint https://s3.amazonaws.com \
|
|
--access-key YOUR_ACCESS_KEY \
|
|
--secret-key YOUR_SECRET_KEY \
|
|
--bucket my-bucket \
|
|
--force
|
|
|
|
# Use environment variables for credentials
|
|
export S3_ENDPOINT=https://s3.amazonaws.com
|
|
export S3_ACCESS_KEY=YOUR_ACCESS_KEY
|
|
export S3_SECRET_KEY=YOUR_SECRET_KEY
|
|
export S3_BUCKET=my-bucket
|
|
python scripts/clear_s3_storage.py --prefix lfs/
|
|
|
|
# Limit number of objects (for testing)
|
|
python scripts/clear_s3_storage.py \
|
|
--endpoint https://s3.amazonaws.com \
|
|
--access-key YOUR_ACCESS_KEY \
|
|
--secret-key YOUR_SECRET_KEY \
|
|
--bucket my-bucket \
|
|
--max-objects 100
|
|
```
|
|
|
|
**Requirements:**
|
|
- Python 3.10+
|
|
- `boto3` and `rich` packages (`pip install boto3 rich`)
|
|
- S3 credentials with delete permissions
|
|
|
|
---
|
|
|
|
#### Shell Script (Alternative)
|
|
|
|
**Features:**
|
|
- ✅ No Python dependencies (uses AWS CLI or MinIO client)
|
|
- ✅ Simple and portable
|
|
- ✅ Supports both `aws` CLI and `mc` (MinIO client)
|
|
|
|
**Usage:**
|
|
|
|
```bash
|
|
# Make script executable
|
|
chmod +x scripts/clear_s3_storage.sh
|
|
|
|
# Clear all content
|
|
./scripts/clear_s3_storage.sh
|
|
|
|
# Clear with prefix
|
|
./scripts/clear_s3_storage.sh --prefix lfs/
|
|
|
|
# Dry run
|
|
./scripts/clear_s3_storage.sh --dry-run
|
|
|
|
# Force delete
|
|
./scripts/clear_s3_storage.sh --force
|
|
```
|
|
|
|
**Requirements:**
|
|
- AWS CLI (`aws`) or MinIO Client (`mc`)
|
|
- KohakuHub environment variables configured
|
|
|
|
**Install AWS CLI:**
|
|
```bash
|
|
# Linux/macOS
|
|
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
|
|
unzip awscliv2.zip
|
|
sudo ./aws/install
|
|
|
|
# Or via package manager
|
|
sudo apt install awscli # Ubuntu/Debian
|
|
brew install awscli # macOS
|
|
```
|
|
|
|
**Install MinIO Client:**
|
|
```bash
|
|
# Linux
|
|
wget https://dl.min.io/client/mc/release/linux-amd64/mc
|
|
chmod +x mc
|
|
sudo mv mc /usr/local/bin/
|
|
|
|
# macOS
|
|
brew install minio/stable/mc
|
|
|
|
# Windows
|
|
choco install minio-client
|
|
```
|
|
|
|
---
|
|
|
|
## Common Prefixes in KohakuHub
|
|
|
|
Understanding KohakuHub's S3 storage structure:
|
|
|
|
| Prefix | Description | Example |
|
|
|--------|-------------|---------|
|
|
| `lfs/` | Large File Storage objects (files >5MB) | `lfs/ab/cd/abcd1234...` |
|
|
| `hf-model-*` | Model repository data | `hf-model-myuser-mymodel/` |
|
|
| `hf-dataset-*` | Dataset repository data | `hf-dataset-myuser-mydataset/` |
|
|
| `hf-space-*` | Space repository data | `hf-space-myuser-myspace/` |
|
|
|
|
**Note:** LFS objects are deduplicated by SHA256 hash. Deleting from `lfs/` prefix removes all large files across all repositories.
|
|
|
|
---
|
|
|
|
## Database Migration Scripts
|
|
|
|
### Add Storage Quota Fields
|
|
|
|
```bash
|
|
# Add quota tracking to users/organizations
|
|
python scripts/migrate_add_storage_quota.py
|
|
python scripts/migrate_separate_quotas.py
|
|
python scripts/migrate_add_quotas_final.py
|
|
```
|
|
|
|
### Update Repository Schema
|
|
|
|
```bash
|
|
python scripts/migrate_repository_schema.py
|
|
```
|
|
|
|
---
|
|
|
|
## Testing Scripts
|
|
|
|
### Generate Test Files
|
|
|
|
```bash
|
|
# Generate test files for upload testing
|
|
python scripts/generate_test_files.py
|
|
```
|
|
|
|
### Test Authentication
|
|
|
|
```bash
|
|
# Test authentication flow
|
|
python scripts/test_auth.py
|
|
```
|
|
|
|
---
|
|
|
|
## Security
|
|
|
|
### Generate Secret Keys
|
|
|
|
```bash
|
|
# Generate secure random secret for session/admin tokens
|
|
python scripts/generate_secret.py
|
|
```
|
|
|
|
**Output:**
|
|
```
|
|
Generated secure random secret (64 characters):
|
|
a1b2c3d4e5f6...
|
|
|
|
Add this to your docker-compose.yml:
|
|
KOHAKU_HUB_SESSION_SECRET=a1b2c3d4e5f6...
|
|
KOHAKU_HUB_ADMIN_SECRET_TOKEN=a1b2c3d4e5f6...
|
|
```
|
|
|
|
---
|
|
|
|
## Environment Variables
|
|
|
|
### S3 Storage Scripts (`clear_s3_storage.py`, `show_s3_usage.py`)
|
|
|
|
These scripts accept credentials via command-line arguments or environment variables:
|
|
|
|
```bash
|
|
# Option 1: Command-line arguments
|
|
python scripts/clear_s3_storage.py \
|
|
--endpoint https://s3.amazonaws.com \
|
|
--access-key YOUR_KEY \
|
|
--secret-key YOUR_SECRET \
|
|
--bucket my-bucket
|
|
|
|
# Option 2: Environment variables
|
|
export S3_ENDPOINT=https://s3.amazonaws.com
|
|
export S3_ACCESS_KEY=YOUR_ACCESS_KEY
|
|
export S3_SECRET_KEY=YOUR_SECRET_KEY
|
|
export S3_BUCKET=my-bucket
|
|
export S3_REGION=us-east-1 # Optional, default: us-east-1
|
|
|
|
python scripts/clear_s3_storage.py --prefix lfs/
|
|
```
|
|
|
|
### Migration Scripts
|
|
|
|
Migration scripts require full KohakuHub configuration:
|
|
|
|
```bash
|
|
# S3 Storage
|
|
export KOHAKU_HUB_S3_ENDPOINT=http://localhost:29001
|
|
export KOHAKU_HUB_S3_PUBLIC_ENDPOINT=http://localhost:29001
|
|
export KOHAKU_HUB_S3_ACCESS_KEY=minioadmin
|
|
export KOHAKU_HUB_S3_SECRET_KEY=minioadmin
|
|
export KOHAKU_HUB_S3_BUCKET=hub-storage
|
|
|
|
# LakeFS
|
|
export KOHAKU_HUB_LAKEFS_ENDPOINT=http://localhost:28000
|
|
export KOHAKU_HUB_LAKEFS_ACCESS_KEY=...
|
|
export KOHAKU_HUB_LAKEFS_SECRET_KEY=...
|
|
|
|
# Database
|
|
export KOHAKU_HUB_DATABASE_URL=postgresql://user:pass@localhost:5432/kohakuhub
|
|
|
|
# Application
|
|
export KOHAKU_HUB_BASE_URL=http://localhost:28080
|
|
```
|
|
|
|
---
|
|
|
|
## JSON Schema Generator
|
|
|
|
### Generate JSON Schema
|
|
|
|
```bash
|
|
python scripts/generate_json_schema.py
|
|
```
|
|
|
|
This script will generate a JSON schema for the KohakuHub types in `__generated__/schemas/<filename>.json`.
|
|
|
|
Currently supported files:
|
|
- `config.json`
|
|
|
|
---
|
|
|
|
## Tips for Demo Deployments
|
|
|
|
### CloudFlare R2 Free Tier
|
|
|
|
- **Limit:** 10GB storage
|
|
- **Strategy:** Regularly clear LFS files and old repository data
|
|
|
|
```bash
|
|
# Weekly cleanup schedule
|
|
# 1. Clear old LFS files
|
|
python scripts/clear_s3_storage.py --prefix lfs/ --force
|
|
|
|
# 2. Clear test repositories
|
|
python scripts/clear_s3_storage.py --prefix hf-space-test --force
|
|
```
|
|
|
|
### MinIO Development
|
|
|
|
- **No limits:** MinIO has no storage limits
|
|
- **Cleanup:** Only needed for testing or reset
|
|
|
|
```bash
|
|
# Reset entire storage for clean slate
|
|
python scripts/clear_s3_storage.py --force
|
|
```
|
|
|
|
---
|
|
|
|
## Safety Features
|
|
|
|
All deletion scripts include:
|
|
|
|
1. **Dry run mode:** See what would be deleted before committing
|
|
2. **Interactive confirmation:** Requires explicit "yes" to proceed
|
|
3. **Double confirmation:** Extra prompt for full bucket deletion
|
|
4. **Progress display:** See deletion progress in real-time
|
|
5. **Error reporting:** Track and report any deletion failures
|
|
|
|
---
|
|
|
|
## Troubleshooting
|
|
|
|
### "NoSuchBucket" Error
|
|
|
|
**Problem:** Bucket doesn't exist
|
|
|
|
**Solution:**
|
|
```bash
|
|
# Create bucket using AWS CLI
|
|
aws s3 mb "s3://${KOHAKU_HUB_S3_BUCKET}" --endpoint-url="${KOHAKU_HUB_S3_ENDPOINT}"
|
|
|
|
# Or using MinIO client
|
|
mc mb kohakuhub-temp/${KOHAKU_HUB_S3_BUCKET}
|
|
```
|
|
|
|
### "Access Denied" Error
|
|
|
|
**Problem:** S3 credentials lack delete permissions
|
|
|
|
**Solution:** Check your S3 access key has `s3:DeleteObject` permission
|
|
|
|
### Script Hangs
|
|
|
|
**Problem:** Large bucket (millions of objects)
|
|
|
|
**Solution:** Use `--max-objects` to test first:
|
|
```bash
|
|
python scripts/clear_s3_storage.py --max-objects 1000 --dry-run
|
|
```
|
|
|
|
---
|
|
|
|
## Contributing
|
|
|
|
When adding new scripts:
|
|
|
|
1. Add docstring with usage examples
|
|
2. Include error handling
|
|
3. Support dry-run mode if applicable
|
|
4. Update this README
|
|
5. Test with both SQLite and PostgreSQL backends
|