diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e780c2f..00a0f19 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -8,67 +8,120 @@ Thank you for your interest in contributing to KohakuHub! We welcome contributio - **GitHub Issues:** Bug reports and feature requests - **Roadmap:** See [Project Status](#project-status) below -## Getting Started +## Code Conventions and Rules -### Prerequisites +### Python Code Style -- Python 3.10+ -- Node.js 18+ -- Docker & Docker Compose -- Git +**Core Principles:** +1. **Minimal solution, but you can't skip anything.** If any implementation/target/goal are too difficult, discuss first. Don't silently ignore them. +2. **Modern Python:** Use match-case instead of nested if-else, utilize native type hints (use `list[]`, `dict[]` instead of importing from `typing` unless needed) +3. **Clean code:** Try to split large functions into smaller ones +4. **Type hints recommended but not required** - No static type checking, but use type hints for documentation -### Setup +**Import Order Rules:** +```python +# 1. builtin packages +import asyncio +import hashlib +from datetime import datetime -```bash -git clone https://github.com/KohakuBlueleaf/KohakuHub.git -cd KohakuHub +# 2. Third-party packages (alphabetical) +import bcrypt +from fastapi import APIRouter, Depends +from peewee import fn -# Backend -python -m venv .venv -source .venv/bin/activate # On Windows: .venv\Scripts\activate -pip install -e ".[dev]" +# 3. 
Our packages (shorter paths first, then alphabetical) +from kohakuhub.config import cfg +from kohakuhub.db import User +from kohakuhub.db_operations import get_repository +from kohakuhub.api.quota.util import get_storage_info +from kohakuhub.auth.dependencies import get_current_user -# Frontend -npm install --prefix ./src/kohaku-hub-ui - -# Start with Docker -cp docker-compose.example.yml docker-compose.yml -# IMPORTANT: Edit docker-compose.yml to change default passwords and secrets -./deploy.sh +# Within each group: +# - `import xxx` comes before `from xxx import` +# - Shorter paths before longer paths +# - Alphabetical order ``` -**Access:** http://localhost:28080 +**Type Hints - Use Native Types:** +```python +# ✅ Good - native types (Python 3.10+) +def process_data(items: list[str]) -> dict[str, int]: + results: dict[str, int] = {} + return results -## Code Style +# ❌ Avoid - importing from typing +from typing import List, Dict +def process_data(items: List[str]) -> Dict[str, int]: + pass +``` -### Backend (Python) +**Modern Python Patterns:** +```python +# ✅ Good - use match-case +match status: + case "active": + handle_active() + case "pending": + handle_pending() + case _: + handle_default() -Follow following principles: -- Modern Python (match-case, async/await, native types like `list[]`, `dict[]`) -- Import order: **builtin → 3rd party → ours**, then **shorter paths first**, then **alphabetical** - - `import os` before `from datetime import` - - `from kohakuhub.db import` before `from kohakuhub.auth.dependencies import` -- **Database operations:** Use synchronous Peewee ORM with `db.atomic()` for transactions (safe for multi-worker deployments) -- **NO imports in functions** (except to avoid circular imports) -- Use `asyncio.gather()` for parallel async operations (NOT sequential await in loops) -- Split large functions into smaller ones (especially match-case with >3 branches) +# ❌ Avoid - nested if-else +if status == "active": + handle_active() +elif 
status == "pending": + handle_pending() +else: + handle_default() + +# ✅ Good - native union syntax +def get_user(username: str) -> User | None: + return User.get_or_none(User.username == username) + +# ❌ Avoid - Optional from typing +from typing import Optional +def get_user(username: str) -> Optional[User]: + pass +``` + +**No imports in functions** (except to avoid circular imports): +```python +# ✅ Good - imports at top +from kohakuhub.db import User + +def process_user(user_id: int): + user = User.get_by_id(user_id) + return user + +# ❌ Avoid - imports in function +def process_user(user_id: int): + from kohakuhub.db import User + user = User.get_by_id(user_id) + return user +``` + +**Code formatting:** - Use `black` for code formatting -- Type hints recommended but not required (no static type checking) +- Line length: 100 characters (black default is 88, we use 100) +- Use `asyncio.gather()` for parallel async operations (NOT sequential await in loops) -#### File Structure Rules +### File Structure Rules **Global Infrastructure** (used by multiple features): ``` kohakuhub/ ├── utils/ # Global infrastructure │ ├── s3.py # S3 client wrapper -│ └── lakefs.py # LakeFS client wrapper +│ ├── lakefs.py # LakeFS client wrapper +│ └── names.py # Name normalization ├── auth/ # Cross-cutting concern (stays at root) │ ├── routes.py # Auth endpoints │ ├── dependencies.py # Used by ALL routers │ └── permissions.py # Used by ALL routers ├── config.py # Configuration ├── db.py # Database models (Peewee ORM - synchronous) +├── db_operations.py # Database operation wrappers ├── logger.py # Logging utilities └── lakefs_rest_client.py # LakeFS REST client ``` @@ -79,10 +132,14 @@ kohakuhub/ ``` api/ ├── admin.py # Admin portal endpoints +├── avatar.py # Avatar management ├── branches.py # Branch operations ├── files.py # File operations (large but no specific utils) +├── likes.py # Repository likes ├── misc.py # Misc utilities -└── settings.py # Settings endpoints +├── 
settings.py # Settings endpoints +├── stats.py # Statistics and trending +└── validation.py # Name validation ``` **Rule 2:** Feature with utils → `api//` @@ -94,6 +151,10 @@ api/org/ api/quota/ ├── router.py # Quota endpoints └── util.py # Quota calculations + +api/invitation/ +├── router.py # Invitation endpoints +└── util.py # Invitation utilities (if needed) ``` **Rule 3:** Complex feature (multiple routers) → `api//routers/` @@ -127,243 +188,39 @@ api/git/ 1. **No utils needed?** → Use Rule 1 (single file `api/xxx.py`) 2. **Needs utils?** → Use Rule 2 (folder `api/xxx/` with `router.py` + `util.py`) 3. **Multiple routers?** → Use Rule 3 (folder `api/xxx/routers/` + optional `utils/`) -4. **Utils used by EVERYONE?** → Put in root `utils/` (s3, lakefs) +4. **Utils used by EVERYONE?** → Put in root `utils/` (s3, lakefs, names) 5. **Utils used by multiple routers in same feature?** → Put in `api/xxx/utils/` **Router Import Pattern in `main.py`:** ```python # Rule 1 (single file exports router) -from kohakuhub.api import admin, branches, files +from kohakuhub.api import admin, avatar, branches, files, likes, misc, settings, stats, validation # Rule 2 (folder exports router) from kohakuhub.api.org import router as org from kohakuhub.api.quota import router as quota +from kohakuhub.api.invitation import router as invitation # Rule 3 (multiple routers) from kohakuhub.api.commit import router as commits, history as commit_history from kohakuhub.api.repo.routers import crud, info, tree +from kohakuhub.api.git.routers import http as git_http, lfs, ssh_keys # Usage in app.include_router(): app.include_router(admin.router, ...) # admin IS a module with .router -app.include_router(commits, ...) # commits IS the router (imported as router) +app.include_router(org, ...) # org IS the router (imported as router) app.include_router(commit_history.router, ...) 
# commit_history is a module ``` -### Frontend (Vue 3) - -Follow following principles: -- JavaScript only (no TypeScript), use JSDoc comments for type hints -- Vue 3 Composition API with ` +``` + +## Pull Request Process + +1. **Before submitting:** + - Update relevant documentation (API.md, CLI.md, etc.) + - Add tests for new functionality + - Ensure code follows style guidelines + - Test in both development and Docker deployment modes + - Run `black` on Python code + - Run `prettier` on frontend code + +2. **Submitting PR:** + - Create a clear, descriptive title + - Describe what changes were made and why + - Link related issues + - Include screenshots for UI changes + - List any breaking changes + - Request review from maintainers + +3. **After submission:** + - Address feedback promptly + - Keep PR focused (split large changes into multiple PRs) + - Rebase on main if needed + +## Project Status + +*Last Updated: January 2025* + +### ✅ Core Features (Complete) + +**API & Storage:** +- HuggingFace Hub API compatibility +- Git LFS protocol for large files +- File deduplication (SHA256) +- Repository management (create, delete, list, move/rename) +- Branch and tag management +- Commit history +- S3-compatible storage (MinIO, AWS S3, etc.) 
+- LakeFS versioning (branches, commits, diffs) - using REST API directly + +**Authentication:** +- User registration with email verification (optional) +- Session-based auth + API tokens +- Organization management with role-based access +- Permission system (namespace-based) +- SSH key management + +**Web UI:** +- Vue 3 interface with dark/light mode +- Repository browsing and file viewer +- Code editor (CodeMirror 6) with syntax highlighting +- Markdown rendering with Mermaid chart support +- Commit history viewer +- Settings pages (user, org, repo) +- Documentation viewer + +**Admin Portal:** +- User management (create, delete, email verification toggle) +- Repository browser with statistics +- Commit history viewer across all repositories +- S3 storage browser +- Quota management (users, organizations, repositories) +- System statistics dashboard +- Time-series analytics +- Invitation management + +**CLI Tool:** +- Full-featured `kohub-cli` with interactive TUI mode +- Repository, organization, user management +- Branch/tag operations +- File upload/download +- Commit history viewing +- LFS settings management +- Health check +- Operation history tracking +- Shell autocomplete (bash/zsh/fish) + +**Social Features:** +- Repository likes (similar to GitHub stars) +- Trending repositories (based on download activity) +- Download tracking and statistics +- Avatar management (users and organizations) + +**Quota System:** +- User and organization storage quotas (separate private/public) +- Repository-specific quotas +- Storage usage tracking +- Automatic quota enforcement +- Recalculation and sync tools + +**Invitations:** +- Organization invitations +- Registration invitations (for invite-only mode) +- Reusable invitations with usage limits +- Email notifications (optional) + +**Git Support:** +- Native Git clone support (pure Python implementation) +- Git LFS integration +- Automatic LFS pointers for large files (>1MB) +- Memory-efficient (no temp files) +- SSH key 
authentication support + +### 🚧 In Progress + +- Rate limiting +- Repository transfer between namespaces +- Search functionality +- Git push support + +### 📋 Planned Features + +**Advanced Features:** +- Pull requests / merge requests +- Discussion/comments +- Model/dataset card templates +- Automated model evaluation +- Multi-region CDN support +- Webhook system + +**UI Improvements:** +- Diff viewer for commits +- Image/media file preview +- Activity feed +- Branch/tag management UI + +**Testing & Quality:** +- Unit tests for API endpoints +- Integration tests for HF client +- E2E tests for web UI +- Performance/load testing + +## Development Areas + +We're especially looking for help in: + +### 🎨 Frontend (High Priority) +- Improving UI/UX +- Missing pages (diff viewer, activity feed) +- Mobile responsiveness +- Accessibility + +### 🔧 Backend +- Additional HuggingFace API compatibility +- Performance optimizations +- Advanced repository features +- Search functionality + +### 📚 Documentation +- Tutorial videos +- Architecture deep-dives +- Deployment guides +- API examples + +### 🧪 Testing +- Unit test coverage +- Integration tests +- E2E scenarios +- Load testing + ## Community - **Discord:** https://discord.gg/xWYrkyvJ2s @@ -522,4 +722,4 @@ By contributing, you agree to the following: --- -Thank you for contributing to KohakuHub! 🎉 +Thank you for contributing to KohakuHub! diff --git a/README.md b/README.md index 66a5f5b..469db77 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,5 @@ -# Kohaku Hub - Self-hosted HuggingFace alternative +# Kohaku Hub - Self-hosted HuggingFace Alternative + ![](images/logo-banner-dark.svg) --- @@ -30,9 +31,12 @@ Self-hosted HuggingFace alternative with Git-like versioning for AI models and d - **S3 Storage** - Works with MinIO, AWS S3, Cloudflare R2, etc. 
- **Large File Support** - Git LFS protocol with automatic LFS pointers (>1MB files) - **Organizations** - Multi-user namespaces with role-based access +- **Quota Management** - Storage quotas for users and organizations - **Web UI** - Vue 3 interface with file browser, editor, commit history, Mermaid chart support -- **CLI Tool** - Full-featured command-line interface +- **Admin Portal** - Comprehensive admin interface for user and repository management +- **CLI Tool** - Full-featured command-line interface with interactive TUI mode - **File Deduplication** - Content-addressed storage by SHA256 +- **Trending & Likes** - Repository popularity tracking - **Pure Python Git Server** - No native dependencies, memory-efficient ## Quick Start @@ -159,8 +163,8 @@ See [docs/Git.md](./docs/Git.md) for complete Git clone documentation and implem - **Vue 3** - Modern web interface **Implementation Notes:** -- **LakeFS:** Uses REST API directly (not the deprecated lakefs-client Python library), providing pure async operations without thread pool overhead -- **Database:** Synchronous operations with Peewee ORM and `db.atomic()` for transaction safety. Supports multi-worker deployment (4-8 workers) for horizontal scaling. Future migration to peewee-async planned. +- **LakeFS:** Uses REST API directly (lakefs_rest_client.py), providing pure async operations +- **Database:** Synchronous operations with Peewee ORM and `db.atomic()` for transaction safety. Supports multi-worker deployment (4-8 workers) for horizontal scaling. **Data Flow:** 1. Small files (<10MB) → Base64 in commit payload @@ -190,6 +194,10 @@ KOHAKU_HUB_DATABASE_URL=postgresql://hub:pass@postgres:5432/hubdb # Auth KOHAKU_HUB_SESSION_SECRET=change-me-in-production KOHAKU_HUB_REQUIRE_EMAIL_VERIFICATION=false + +# Admin Portal +KOHAKU_HUB_ADMIN_ENABLED=true +KOHAKU_HUB_ADMIN_SECRET_TOKEN=change-me-in-production ``` See [config-example.toml](./config-example.toml) for all options. 
@@ -229,6 +237,8 @@ python scripts/test_auth.py - [docs/ports.md](./docs/ports.md) - Port configuration reference - [docs/API.md](./docs/API.md) - API endpoints and workflows - [docs/CLI.md](./docs/CLI.md) - Command-line tool usage +- [docs/Admin.md](./docs/Admin.md) - Admin portal guide +- [docs/Git.md](./docs/Git.md) - Git clone support - [CONTRIBUTING.md](./CONTRIBUTING.md) - Contributing guide & roadmap ## Security Notes @@ -236,6 +246,7 @@ python scripts/test_auth.py ⚠️ **Before Production:** - Change all default passwords in `docker-compose.yml` - Set secure `KOHAKU_HUB_SESSION_SECRET` +- Set secure `KOHAKU_HUB_ADMIN_SECRET_TOKEN` - Set secure `LAKEFS_AUTH_ENCRYPT_SECRET_KEY` - Use HTTPS with reverse proxy - Only expose port 28080 (Web UI) @@ -249,12 +260,14 @@ While core features are stable for alpha release, some advanced features are sti - Feel free to open issue in this case, but remember to provide full information and minimal reproduction! - LFS strategy is not yet configurable -See [CONTRIBUTING.md](./CONTRIBUTING.md#project-status) for full roadmap and [docs/TODO.md](./docs/TODO.md) for detailed status. +See [CONTRIBUTING.md](./CONTRIBUTING.md#project-status) for full roadmap. ## License AGPL-3.0 + **NOTE**: We may release some new features under non-commercial license. + **Commercial Exemption**: If you need any commercial exemption licenses (to not fully open source your system built upon KohakuHub), please contact kohaku@kblueleaf.net ## Support diff --git a/docs/API.md b/docs/API.md index fe174c4..ed519ec 100644 --- a/docs/API.md +++ b/docs/API.md @@ -2,38 +2,37 @@ *Last Updated: January 2025* -This document explains how Kohaku Hub's API works, the data flow, and key endpoints. +This document explains how Kohaku Hub's API works, the data flow, and all available endpoints. ## System Architecture ```mermaid graph TB - subgraph "Client Layer" - Client["Client<br>(huggingface_hub, git, browser)"] + subgraph Client["Client Layer"] + CLT["Client<br>(huggingface_hub, git, browser)"] end - subgraph "Entry Point" - Nginx["Nginx (Port 28080)<br>- Serves static files<br>- Reverse proxy"] + subgraph Entry["Entry Point"] + NGX["Nginx (Port 28080)<br>- Serves static files<br>- Reverse proxy"] end - subgraph "Application Layer" - FastAPI["FastAPI (Port 48888)<br>- Auth & Permissions<br>- HF-compatible API<br>- Git Smart HTTP"] + subgraph App["Application Layer"] + API["FastAPI (Port 48888)<br>- Auth & Permissions<br>- HF-compatible API<br>- Git Smart HTTP"] end - subgraph "Storage Backend" - LakeFS["LakeFS<br>- Git-like versioning<br>- Branch management<br>- Commit history"] + subgraph Storage["Storage Backend"] + LFS["LakeFS<br>- Git-like versioning<br>- Branch management<br>- Commit history"] DB["PostgreSQL/SQLite<br>- User data<br>- Metadata<br>- Deduplication<br>- Synchronous with db.atomic()"] S3["MinIO/S3<br>- Object storage<br>- LFS files<br>- Presigned URLs"] end - Client -->|HTTP/Git/LFS| Nginx - Nginx -->|Static files| Client - Nginx -->|/api, /org, resolve| FastAPI - FastAPI -->|REST API (async)| LakeFS - FastAPI -->|Sync queries with db.atomic()| DB - FastAPI -->|Async wrappers| S3 - LakeFS -->|Stores objects| S3 - + CLT -->|HTTP/Git/LFS| NGX + NGX -->|Static files| CLT + NGX -->|/api, /org, resolve| API + API -->|REST API async| LFS + API -->|Sync queries with db.atomic| DB + API -->|Async| S3 + LFS -->|Stores objects| S3 ``` ## Core Concepts @@ -42,7 +41,7 @@ graph TB ```mermaid graph TD - Start[File Upload] --> Check{File size > 5MB?} + Start[File Upload] --> Check{File size > 10MB?} Check -->|No| Regular[Regular Mode] Check -->|Yes| LFS[LFS Mode] Regular --> Base64[Base64 in commit payload] @@ -52,10 +51,9 @@ graph TD FastAPI --> LakeFS1[LakeFS stores object] Direct --> Link[FastAPI links S3 object] Link --> LakeFS2[LakeFS commit with physical address] - ``` -**Note:** The LFS threshold is configurable via `KOHAKU_HUB_LFS_THRESHOLD_BYTES` (default: 5MB = 5,242,880 bytes). +**Note:** The LFS threshold is configurable via `KOHAKU_HUB_LFS_THRESHOLD_BYTES` (default: 10MB = 10,000,000 bytes). Can also be set per-repository. ### Storage Layout @@ -75,57 +73,6 @@ s3://hub-storage/ └── abcd1234... ← Full SHA256 hash ``` -## Git Clone Support - -### Overview - -KohakuHub supports native Git clone operations using **pure Python implementation** (no pygit2/libgit2).
- -**Git URL Format:** -``` -http://hub.example.com/{namespace}/{repo-name}.git -``` - -**Git Endpoints:** -- `GET /{namespace}/{name}.git/info/refs?service=git-upload-pack` - Service advertisement -- `POST /{namespace}/{name}.git/git-upload-pack` - Clone/fetch/pull -- `GET /{namespace}/{name}.git/HEAD` - Get HEAD reference -- `POST /{namespace}/{name}.git/git-receive-pack` - Push (in progress) - -### LFS Integration - -**Automatic LFS Pointers:** -- Files **<1MB**: Included in Git pack as regular blobs -- Files **>=1MB**: Converted to LFS pointers (100-byte text files) - -**LFS Pointer Format:** -``` -version https://git-lfs.github.com/spec/v1 -oid sha256:abc123... -size 10737418240 -``` - -**Client Workflow:** -```bash -# 1. Clone (gets pointers for large files) -git clone http://hub.example.com/org/repo.git - -# 2. Download large files via LFS -cd repo -git lfs install -git lfs pull # Uses existing /info/lfs/ endpoints -``` - -**Benefits:** -- Fast clones (only metadata + small files) -- No memory issues (LFS pointers are tiny) -- Leverages existing HuggingFace LFS infrastructure -- Pure Python (no native dependencies) - -See [Git.md](./Git.md) for complete Git clone documentation and implementation details. - ---- - ## Upload Workflow ### Overview @@ -142,12 +89,12 @@ sequenceDiagram API->>API: Check DB for existing SHA256 API-->>Client: Upload mode (regular/lfs) & dedup info - alt Small Files (<5MB) + alt Small Files (<10MB) Note over Client,S3: Phase 2a: Regular Upload Client->>API: POST /commit (base64 content) API->>LakeFS: Upload object LakeFS->>S3: Store object - else Large Files (>=5MB) + else Large Files (>=10MB) Note over Client,S3: Phase 2b: LFS Upload Client->>API: POST /info/lfs/objects/batch API->>S3: Generate presigned URL @@ -205,97 +152,7 @@ sequenceDiagram } ``` -**Decision Logic**: -``` -For each file: - 1. Check size: - - ≤ 5MB → "regular" - - > 5MB → "lfs" - - 2. 
Check if exists (deduplication): - - Query DB for matching SHA256 + size - - If match found → shouldIgnore: true - - If no match → shouldIgnore: false -``` - -### Step 2a: Regular Upload (≤5MB) - -Files are sent inline in the commit payload as base64. - -``` -┌────────┐ ┌────────┐ -│ Client │───── base64 ──────>│ Commit │ -└────────┘ (embedded) └────────┘ -``` - -**No separate upload step needed** - proceed directly to Step 3. - -### Step 2b: LFS Upload (>5MB) - -#### Phase 1: Request Upload URLs - -**Endpoint**: `POST /{repo_id}.git/info/lfs/objects/batch` - -**Request**: -```json -{ - "operation": "upload", - "transfers": ["basic", "multipart"], - "objects": [ - { - "oid": "sha256_hash", - "size": 52428800 - } - ] -} -``` - -**Response** (if file needs upload): -```json -{ - "transfer": "basic", - "objects": [ - { - "oid": "sha256_hash", - "size": 52428800, - "actions": { - "upload": { - "href": "https://s3.../presigned_url", - "expires_at": "2025-10-02T00:00:00Z" - } - } - } - ] -} -``` - -**Response** (if file already exists): -```json -{ - "transfer": "basic", - "objects": [ - { - "oid": "sha256_hash", - "size": 52428800 - // No "actions" field = already exists - } - ] -} -``` - -#### Phase 2: Upload to S3 - -``` -┌────────┐ ┌─────────┐ -│ Client │---- PUT file -------->│ S3 │ -└────────┘ (presigned URL) └─────────┘ - Direct upload lfs/ab/cd/ - (no proxy!) abcd123... -``` - -**Key Point**: Client uploads directly to S3 using the presigned URL. Kohaku Hub server is NOT involved in data transfer. - -### Step 3: Commit +### Step 2: Commit **Purpose**: Atomically commit all changes to the repository @@ -316,54 +173,12 @@ Files are sent inline in the commit payload as base64. 
| Key | Description | Usage | |-----|-------------|-------| | `header` | Commit metadata | Required, must be first line | -| `file` | Small file (inline base64) | For files ≤ 5MB | -| `lfsFile` | Large file (LFS reference) | For files > 5MB, already uploaded to S3 | +| `file` | Small file (inline base64) | For files ≤ 10MB | +| `lfsFile` | Large file (LFS reference) | For files > 10MB, already uploaded to S3 | | `deletedFile` | Delete a single file | Remove file from repo | | `deletedFolder` | Delete folder recursively | Remove all files in folder | | `copyFile` | Copy file within repo | Duplicate file (deduplication-aware) | -**Response**: -```json -{ - "commitUrl": "https://hub.example.com/repo/commit/abc123", - "commitOid": "abc123def456", - "pullRequestUrl": null -} -``` - -**What Happens**: -``` -1. Regular files: - ┌─────────┐ - │ Decode │ Base64 -> Binary - └────┬────┘ - | - v - ┌─────────┐ - │ Upload │ To LakeFS - └────┬────┘ - | - v - ┌─────────┐ - │ Update │ Database record - └─────────┘ - -2. LFS files: - ┌─────────┐ - │ Link │ S3 physical address -> LakeFS - └────┬────┘ - | - v - ┌─────────┐ - │ Update │ Database record - └─────────┘ - -3. Commit: - ┌─────────┐ - │ LakeFS │ Create commit with all changes - └─────────┘ -``` - ## Download Workflow ```mermaid @@ -390,216 +205,6 @@ sequenceDiagram Note over Client: No proxy - direct S3 download ``` -### Step 1: Get Metadata (HEAD) - -**Endpoint**: `HEAD /{repo_id}/resolve/{revision}/{filename}` - -**Response Headers**: -``` -X-Repo-Commit: abc123def456 -X-Linked-Etag: "sha256:abc123..." -X-Linked-Size: 52428800 -ETag: "abc123..." -Content-Length: 52428800 -Location: https://s3.../presigned_download_url -``` - -**Purpose**: Client checks if file needs re-download (by comparing ETag) - -### Step 2: Download (GET) - -**Endpoint**: `GET /{repo_id}/resolve/{revision}/{filename}` - -**Response**: HTTP 302 Redirect - -``` -HTTP/1.1 302 Found -Location: https://s3.example.com/presigned_url?expires=... 
-X-Repo-Commit: abc123def456 -X-Linked-Etag: "sha256:abc123..." -``` - -**Flow**: -``` -┌────────┐ ┌──────────┐ -│ Client │───── GET ─────>│ Kohaku │ -└────────┘ │ Hub │ - ▲ └─────┬────┘ - │ │ - │ 302 Redirect │ Generate - │ (presigned URL) │ presigned - │<─────────────────────────┘ URL - │ - │ ┌──────────┐ - └───>│ S3 │ - │ Direct │ - │ Download │ - └──────────┘ -``` - -**Key Point**: Client downloads directly from S3. Kohaku Hub only provides the redirect URL. - -## Repository Privacy & Filtering - -KohakuHub respects repository privacy settings when listing repositories. The visibility of repositories depends on authentication: - -### Privacy Rules - -**For Unauthenticated Users:** -- Can only see **public** repositories - -**For Authenticated Users:** -- Can see all **public** repositories -- Can see their **own private** repositories -- Can see **private repositories** in organizations they belong to - -### List Repositories Endpoint - -**Pattern**: `/api/{type}s` where type is `model`, `dataset`, or `space` - -**Query Parameters:** -- `author`: Filter by author/namespace (username or organization) -- `limit`: Maximum results (default: 50, max: 1000) - -**Examples:** -```bash -# List all public models -GET /api/models - -# List models by author (respects privacy) -GET /api/models?author=my-org - -# Authenticated user sees their private repos too -GET /api/models?author=my-org -Authorization: Bearer YOUR_TOKEN -``` - -### List User's All Repositories - -**Endpoint**: `GET /api/users/{username}/repos` - -Returns all repositories for a user/organization, grouped by type. - -**Response:** -```json -{ - "models": [ - {"id": "user/model-1", "private": false, ...}, - {"id": "user/model-2", "private": true, ...} - ], - "datasets": [ - {"id": "user/dataset-1", "private": false, ...} - ], - "spaces": [] -} -``` - -**Note**: Private repositories are only included if: -1. The requesting user is the owner, OR -2. 
The requesting user is a member of the organization - -## Repository Management - -### Create Repository - -**Endpoint**: `POST /api/repos/create` - -**Request**: -```json -{ - "type": "model", - "name": "my-model", - "organization": "my-org", - "private": false -} -``` - -**What Happens**: -``` -1. Check if exists - └─ Query DB for repo - -2. Create LakeFS repo - └─ Repository: hf-model-my-org-my-model - └─ Storage: s3://bucket/hf-model-my-org-my-model - └─ Default branch: main - -3. Record in DB - └─ INSERT INTO repository (...) -``` - -**Response**: -```json -{ - "url": "https://hub.example.com/models/my-org/my-model", - "repo_id": "my-org/my-model" -} -``` - -### List Repository Files - -**Endpoint**: `GET /api/{repo_type}s/{repo_id}/tree/{revision}/{path}` - -**Query Parameters**: -- `recursive`: List all files recursively (default: false) -- `expand`: Include LFS metadata (default: false) - -**Response**: -```json -[ - { - "type": "file", - "oid": "abc123", - "size": 1024, - "path": "config.json" - }, - { - "type": "file", - "oid": "def456", - "size": 52428800, - "path": "model.bin", - "lfs": { - "oid": "def456", - "size": 52428800, - "pointerSize": 134 - } - }, - { - "type": "directory", - "oid": "", - "size": 0, - "path": "configs" - } -] -``` - -### Delete Repository - -**Endpoint**: `DELETE /api/repos/delete` - -**Request**: -```json -{ - "type": "model", - "name": "my-model", - "organization": "my-org" -} -``` - -**What Happens**: -``` -1. Delete from LakeFS - └─ Remove repository metadata - └─ (Objects remain in S3 for safety) - -2. Delete from DB - ├─ DELETE FROM file WHERE repo_full_id = ... - ├─ DELETE FROM staging_upload WHERE repo_full_id = ... - └─ DELETE FROM repository WHERE full_id = ... - -3. 
Return success -``` - ## Database Schema ```mermaid @@ -608,16 +213,22 @@ erDiagram USER ||--o{ SESSION : has USER ||--o{ TOKEN : has USER ||--o{ SSHKEY : has - USER }o--o{ ORGANIZATION : member_of - ORGANIZATION ||--o{ REPOSITORY : owns + USER }o--o{ USER : member_of + USER ||--o{ REPOSITORY_LIKE : likes + USER ||--o{ DOWNLOAD_SESSION : downloads REPOSITORY ||--o{ FILE : contains REPOSITORY ||--o{ COMMIT : has - REPOSITORY ||--o{ STAGINGUPLOAD : has - COMMIT ||--o{ LFSOBJECTHISTORY : references + REPOSITORY ||--o{ STAGING_UPLOAD : has + REPOSITORY ||--o{ REPOSITORY_LIKE : liked_by + REPOSITORY ||--o{ DOWNLOAD_SESSION : tracked + REPOSITORY ||--o{ DAILY_REPO_STATS : has_stats + COMMIT ||--o{ LFS_OBJECT_HISTORY : references USER { int id PK string username UK + string normalized_name UK + boolean is_org string email UK string password_hash boolean email_verified @@ -626,6 +237,10 @@ erDiagram bigint public_quota_bytes bigint private_used_bytes bigint public_used_bytes + string full_name + text bio + blob avatar + datetime avatar_updated_at datetime created_at } @@ -637,16 +252,25 @@ erDiagram string full_id boolean private int owner_id FK + bigint quota_bytes + bigint used_bytes + int lfs_threshold_bytes + int lfs_keep_versions + text lfs_suffix_rules + int downloads + int likes_count datetime created_at } FILE { int id PK - string repo_full_id + int repository_id FK string path_in_repo int size string sha256 boolean lfs + boolean is_deleted + int owner_id FK datetime created_at datetime updated_at } @@ -654,27 +278,17 @@ erDiagram COMMIT { int id PK string commit_id - string repo_full_id + int repository_id FK string repo_type string branch - int user_id FK + int author_id FK + int owner_id FK string username text message text description datetime created_at } - ORGANIZATION { - int id PK - string name UK - text description - bigint private_quota_bytes - bigint public_quota_bytes - bigint private_used_bytes - bigint public_used_bytes - datetime created_at - } - 
TOKEN { int id PK int user_id FK @@ -704,9 +318,9 @@ erDiagram datetime created_at } - STAGINGUPLOAD { + STAGING_UPLOAD { int id PK - string repo_full_id + int repository_id FK string repo_type string revision string path_in_repo @@ -715,122 +329,50 @@ erDiagram string upload_id string storage_key boolean lfs + int uploader_id FK datetime created_at } - LFSOBJECTHISTORY { + LFS_OBJECT_HISTORY { int id PK - string repo_full_id + int repository_id FK string path_in_repo string sha256 int size string commit_id + int file_id FK datetime created_at } -``` -### Key Tables + REPOSITORY_LIKE { + int id PK + int repository_id FK + int user_id FK + datetime created_at + } -**Repository Table** - Stores repository metadata: -- Unique constraint on `(repo_type, namespace, name)` -- Allows same `full_id` across different `repo_type` -- Example: `model:myorg/mymodel`, `dataset:myorg/mymodel` + DOWNLOAD_SESSION { + int id PK + int repository_id FK + int user_id FK + string session_id + int time_bucket + int file_count + string first_file + datetime first_download_at + datetime last_download_at + } -**File Table** - Deduplication and metadata: -- Unique constraint on `(repo_full_id, path_in_repo)` -- `sha256` indexed for fast deduplication lookups -- `lfs` flag indicates if file uses LFS storage - -**Commit Table** - User commit tracking: -- `commit_id` is LakeFS commit SHA -- Indexed by `(repo_full_id, branch)` for fast queries -- Denormalized `username` for performance - -**LFSObjectHistory Table** - LFS garbage collection: -- Tracks which commits reference which LFS objects -- Enables preserving K versions of each file (default: 5) -- Used for auto-cleanup of old LFS objects - -**StagingUpload Table** - Multipart upload tracking: -- Tracks ongoing multipart uploads -- Enables upload resume -- Cleans up failed uploads - -## LakeFS Integration - -### Repository Naming Convention - -``` -Pattern: {namespace}-{repo_type}-{org}-{name} - -Examples: - HuggingFace repo: "myorg/mymodel" 
- LakeFS repo: "hf-model-myorg-mymodel" - - HuggingFace repo: "johndoe/dataset" - LakeFS repo: "hf-dataset-johndoe-dataset" -``` - -### Implementation Notes - -**Database Operations:** -- **Synchronous:** Uses Peewee ORM with synchronous operations -- **Transactions:** `db.atomic()` ensures ACID compliance across concurrent workers -- **Multi-Worker Safe:** Designed for horizontal scaling (4-8 workers recommended) -- **Future:** Migration to peewee-async planned for improved concurrency - -**LakeFS Operations:** -- **Pure Async:** All operations use REST API via httpx (no thread pools!) -- **No Deprecated Library:** Uses direct REST API instead of lakefs-client - -### Key Operations - -**All LakeFS operations use pure async REST API via httpx (no thread pools!):** - -| Operation | LakeFS REST Endpoint | KohakuHub Method | Purpose | -|-----------|---------------------|------------------|---------| -| Create Repo | `POST /repositories` | `create_repository()` | Initialize new repository | -| Upload Small File | `POST /repositories/{repo}/branches/{branch}/objects` | `upload_object()` | Direct content upload | -| Link LFS File | `PUT /repositories/{repo}/branches/{branch}/staging/backing` | `link_physical_address()` | Link S3 object to LakeFS | -| Commit | `POST /repositories/{repo}/branches/{branch}/commits` | `commit()` | Create atomic commit | -| List Files | `GET /repositories/{repo}/refs/{ref}/objects/ls` | `list_objects()` | Browse repository | -| Get File Info | `GET /repositories/{repo}/refs/{ref}/objects/stat` | `stat_object()` | Get file metadata | -| Get File Content | `GET /repositories/{repo}/refs/{ref}/objects` | `get_object()` | Download file | -| Delete File | `DELETE /repositories/{repo}/branches/{branch}/objects` | `delete_object()` | Remove file | -| Create Branch | `POST /repositories/{repo}/branches` | `create_branch()` | Create new branch | -| Delete Branch | `DELETE /repositories/{repo}/branches/{branch}` | `delete_branch()` | Delete branch | -| 
Create Tag | `POST /repositories/{repo}/tags` | `create_tag()` | Create tag | -| Delete Tag | `DELETE /repositories/{repo}/tags/{tag}` | `delete_tag()` | Delete tag | -| Revert | `POST /repositories/{repo}/branches/{branch}/revert` | `revert_branch()` | Revert commit | -| Merge | `POST /repositories/{repo}/refs/{source}/merge/{dest}` | `merge_into_branch()` | Merge branches | -| Hard Reset | `PUT /repositories/{repo}/branches/{branch}/hard_reset` | `hard_reset_branch()` | Reset branch to commit | - -### Physical Address Linking - -``` -When uploading LFS file: - -1. Client uploads to S3: - s3://bucket/lfs/ab/cd/abcd1234... - -2. Kohaku Hub links to LakeFS: - ┌──────────────────────────────────┐ - │ StagingMetadata │ - ├──────────────────────────────────┤ - │ physical_address: │ - │ "s3://bucket/lfs/ab/cd/abc..." │ - │ checksum: "sha256:abc..." │ - │ size_bytes: 52428800 │ - └──────────────────────────────────┘ - │ - ▼ - ┌──────────────────────────────────┐ - │ LakeFS: model.bin │ - │ → Points to S3 object │ - └──────────────────────────────────┘ - -3. 
On commit: - LakeFS records this link in its metadata + DAILY_REPO_STATS { + int id PK + int repository_id FK + date date + int download_sessions + int authenticated_downloads + int anonymous_downloads + int total_files + datetime created_at + } ``` ## API Endpoint Summary @@ -886,9 +428,77 @@ When uploading LFS file: | Endpoint | Method | Auth | Description | |----------|--------|------|-------------| -| `/users/{username}/settings` | PUT | ✓ | Update user settings | -| `/organizations/{org_name}/settings` | PUT | ✓ | Update organization settings | -| `/{type}s/{namespace}/{name}/settings` | PUT | ✓ | Update repository settings (private, gated) | +| `/api/users/{username}/settings` | PUT | ✓ | Update user settings | +| `/api/organizations/{org_name}/settings` | PUT | ✓ | Update organization settings | +| `/{type}s/{namespace}/{name}/settings` | PUT | ✓ | Update repository settings (private, gated, LFS settings) | +| `/api/{type}s/{namespace}/{name}/lfs/settings` | GET | ○ | Get repository LFS settings | + +### Social Features + +**Likes:** +| Endpoint | Method | Auth | Description | +|----------|--------|------|-------------| +| `/api/{type}s/{namespace}/{name}/like` | POST | ✓ | Like a repository | +| `/api/{type}s/{namespace}/{name}/like` | DELETE | ✓ | Unlike a repository | +| `/api/{type}s/{namespace}/{name}/like` | GET | ○ | Check if current user liked repository | +| `/api/{type}s/{namespace}/{name}/likers` | GET | ○ | List users who liked repository | +| `/api/users/{username}/likes` | GET | ○ | List repositories user has liked | + +**Statistics & Trending:** +| Endpoint | Method | Auth | Description | +|----------|--------|------|-------------| +| `/api/{type}s/{namespace}/{name}/stats` | GET | ○ | Get repository statistics (downloads, likes) | +| `/api/{type}s/{namespace}/{name}/stats/recent` | GET | ○ | Get recent download statistics (time series) | +| `/api/trending` | GET | ○ | Get trending repositories | + +**Avatars:** +| Endpoint | Method | Auth | 
Description | +|----------|--------|------|-------------| +| `/api/users/{username}/avatar` | POST | ✓ | Upload user avatar | +| `/api/users/{username}/avatar` | GET | ○ | Get user avatar image | +| `/api/users/{username}/avatar` | DELETE | ✓ | Delete user avatar | +| `/api/organizations/{org_name}/avatar` | POST | ✓ | Upload organization avatar | +| `/api/organizations/{org_name}/avatar` | GET | ○ | Get organization avatar image | +| `/api/organizations/{org_name}/avatar` | DELETE | ✓ | Delete organization avatar | + +### Quota Management + +| Endpoint | Method | Auth | Description | +|----------|--------|------|-------------| +| `/api/quota/{namespace}` | GET | ✓ | Get namespace quota information | +| `/api/quota/{namespace}` | PUT | ✓ | Set namespace quota | +| `/api/quota/{namespace}/recalculate` | POST | ✓ | Recalculate namespace storage usage | +| `/api/quota/{namespace}/public` | GET | ○ | Get public quota info (permission-based) | +| `/api/quota/{namespace}/repos` | GET | ✓ | List namespace repositories with storage breakdown | +| `/api/quota/repo/{type}/{namespace}/{name}` | GET | ○ | Get repository quota information | +| `/api/quota/repo/{type}/{namespace}/{name}` | PUT | ✓ | Set repository quota | +| `/api/quota/repo/{type}/{namespace}/{name}/recalculate` | POST | ✓ | Recalculate repository storage | + +### Invitations + +| Endpoint | Method | Auth | Description | +|----------|--------|------|-------------| +| `/api/invitations/org/{org_name}/create` | POST | ✓ | Create organization invitation | +| `/api/invitations/{token}` | GET | ✗ | Get invitation details | +| `/api/invitations/{token}/accept` | POST | ✓ | Accept invitation | +| `/api/invitations/{token}` | DELETE | ✓ | Delete/cancel invitation | +| `/api/invitations/org/{org_name}/list` | GET | ✓ | List organization invitations | + +### SSH Keys + +| Endpoint | Method | Auth | Description | +|----------|--------|------|-------------| +| `/api/user/keys` | GET | ✓ | List user's SSH keys | +| 
`/api/user/keys` | POST | ✓ | Add new SSH key | +| `/api/user/keys/{key_id}` | GET | ✓ | Get SSH key details | +| `/api/user/keys/{key_id}` | DELETE | ✓ | Delete SSH key | + +### Validation + +| Endpoint | Method | Auth | Description | +|----------|--------|------|-------------| +| `/api/validate/check-name` | POST | ✗ | Check if username/org/repo name is available | +| `/api/validate-yaml` | POST | ✗ | Validate YAML content | ### Authentication Operations @@ -915,11 +525,19 @@ When uploading LFS file: | `/org/{org_name}/members/{username}` | PUT | ✓ | Update member role | | `/org/users/{username}/orgs` | GET | ✗ | List user's organizations | +### Git Operations + +| Endpoint | Method | Auth | Description | +|----------|--------|------|-------------| +| `/{namespace}/{name}.git/info/refs` | GET | ○ | Git service advertisement | +| `/{namespace}/{name}.git/HEAD` | GET | ○ | Get HEAD reference | +| `/{namespace}/{name}.git/git-upload-pack` | POST | ○ | Clone/fetch/pull | +| `/{namespace}/{name}.git/git-receive-pack` | POST | ✓ | Push (in development) | + ### Utility Operations | Endpoint | Method | Auth | Description | |----------|--------|------|-------------| -| `/api/validate-yaml` | POST | ✗ | Validate YAML content | | `/api/whoami-v2` | GET | ✓ | Get detailed current user info | | `/api/version` | GET | ✗ | Get API version information | | `/health` | GET | ✗ | Health check | @@ -932,345 +550,386 @@ When uploading LFS file: --- -## Detailed Endpoint Documentation +## New Features Documentation -### Commit History API +### Repository Likes -The commit history API allows you to retrieve the commit log for a specific branch in a repository. 
- -**Endpoint**: `GET /{repo_type}s/{namespace}/{name}/commits/{branch}` - -**Query Parameters**: -- `page`: Page number for pagination (default: 1) -- `limit`: Number of commits per page (default: 20) - -**Example Request**: +**Like a repository:** ```bash -GET /models/myorg/mymodel/commits/main?page=1&limit=20 +POST /api/models/org/model/like +Authorization: Bearer YOUR_TOKEN ``` -**Response**: +**Response:** ```json { - "commits": [ + "success": true, + "message": "Repository liked successfully", + "likes_count": 42 +} +``` + +**Check if liked:** +```bash +GET /api/models/org/model/like +``` + +**Response:** +```json +{ + "liked": true +} +``` + +**List likers:** +```bash +GET /api/models/org/model/likers?limit=50 +``` + +**Response:** +```json +{ + "likers": [ { - "id": "abc123def456", - "message": "Update model config", - "author": "john@example.com", - "committer": "john@example.com", - "createdAt": "2025-10-05T12:00:00Z", - "parents": ["parent123"] + "username": "alice", + "full_name": "Alice Developer" } ], - "pagination": { - "page": 1, - "limit": 20, - "total": 150, - "hasMore": true - } + "total": 42 } ``` -### Branch and Tag Management +### Statistics and Trending -#### Create Branch +**Get repository stats:** +```bash +GET /api/models/org/model/stats +``` -**Endpoint**: `POST /{repo_type}s/{namespace}/{name}/branch` - -**Request**: +**Response:** ```json { - "branch": "feature-branch", - "startPoint": "main" + "downloads": 1234, + "likes": 42 } ``` -**Response**: -```json -{ - "success": true, - "branch": "feature-branch", - "ref": "refs/heads/feature-branch" -} +**Get recent statistics (time series):** +```bash +GET /api/models/org/model/stats/recent?days=30 ``` -#### Delete Branch - -**Endpoint**: `DELETE /{repo_type}s/{namespace}/{name}/branch/{branch}` - -**Example**: `DELETE /models/myorg/mymodel/branch/feature-branch` - -**Response**: +**Response:** ```json { - "success": true, - "deleted": "feature-branch" -} -``` - -**Note**: Cannot delete the 
default branch (usually `main`). - -#### Create Tag - -**Endpoint**: `POST /{repo_type}s/{namespace}/{name}/tag` - -**Request**: -```json -{ - "tag": "v1.0.0", - "ref": "main", - "message": "Release version 1.0.0" -} -``` - -**Response**: -```json -{ - "success": true, - "tag": "v1.0.0", - "ref": "refs/tags/v1.0.0" -} -``` - -#### Delete Tag - -**Endpoint**: `DELETE /{repo_type}s/{namespace}/{name}/tag/{tag}` - -**Example**: `DELETE /models/myorg/mymodel/tag/v1.0.0` - -**Response**: -```json -{ - "success": true, - "deleted": "v1.0.0" -} -``` - -### Settings Management - -#### Update User Settings - -**Endpoint**: `PUT /users/{username}/settings` - -**Request**: -```json -{ - "email": "newemail@example.com", - "displayName": "John Doe", - "bio": "ML Engineer", - "website": "https://example.com" -} -``` - -**Response**: -```json -{ - "success": true, - "user": { - "username": "johndoe", - "email": "newemail@example.com", - "displayName": "John Doe" - } -} -``` - -#### Update Organization Settings - -**Endpoint**: `PUT /organizations/{org_name}/settings` - -**Request**: -```json -{ - "displayName": "My Organization", - "description": "Building amazing ML models", - "website": "https://example.com", - "avatar": "https://cdn.example.com/avatar.png" -} -``` - -**Response**: -```json -{ - "success": true, - "organization": { - "name": "my-org", - "displayName": "My Organization", - "description": "Building amazing ML models" - } -} -``` - -#### Update Repository Settings - -**Endpoint**: `PUT /{repo_type}s/{namespace}/{name}/settings` - -**Request**: -```json -{ - "private": true, - "gated": false, - "description": "A state-of-the-art language model", - "tags": ["nlp", "transformers", "llm"] -} -``` - -**Response**: -```json -{ - "success": true, - "repository": { - "id": "myorg/mymodel", - "private": true, - "gated": false, - "description": "A state-of-the-art language model" - } -} -``` - -**Privacy Options**: -- `private: false` - Public repository, visible to 
everyone -- `private: true` - Private repository, only visible to owner and organization members -- `gated: true` - Requires explicit permission to access (for controlled releases) - -#### Move/Rename Repository - -**Endpoint**: `POST /api/repos/move` - -**Request**: -```json -{ - "fromRepo": { - "type": "model", - "namespace": "oldorg", - "name": "oldname" - }, - "toRepo": { - "type": "model", - "namespace": "neworg", - "name": "newname" - } -} -``` - -**Response**: -```json -{ - "success": true, - "url": "https://hub.example.com/models/neworg/newname", - "message": "Repository moved successfully" -} -``` - -**What Happens**: -1. Validates that source repository exists and user has permission -2. Checks that destination doesn't already exist -3. Updates LakeFS repository name -4. Updates all database records -5. Creates redirect from old URL to new URL - -**Note**: This operation is atomic - either everything succeeds or everything rolls back. - -### Version and Utility Endpoints - -#### Get API Version - -**Endpoint**: `GET /api/version` - -**Response**: -```json -{ - "version": "1.0.0", - "apiVersion": "v1", - "lfsVersion": "2.0", - "features": { - "lfs": true, - "multipart": true, - "deduplication": true, - "organizations": true - }, - "limits": { - "maxFileSize": 107374182400, - "lfsThreshold": 10485760 - } -} -``` - -#### Validate YAML - -**Endpoint**: `POST /api/validate-yaml` - -**Request**: -```json -{ - "content": "model:\n name: gpt-2\n version: 1.0" -} -``` - -**Response** (if valid): -```json -{ - "valid": true, - "parsed": { - "model": { - "name": "gpt-2", - "version": "1.0" - } - } -} -``` - -**Response** (if invalid): -```json -{ - "valid": false, - "error": "Invalid YAML syntax at line 2: unexpected character", - "line": 2, - "column": 10 -} -``` - -**Use Case**: Validate README.md frontmatter, model card YAML, or configuration files before upload. 
- -#### Get Detailed User Info (whoami-v2) - -**Endpoint**: `GET /api/whoami-v2` - -**Response**: -```json -{ - "type": "user", - "id": "12345", - "name": "johndoe", - "fullname": "John Doe", - "email": "john@example.com", - "emailVerified": true, - "canPay": true, - "isPro": false, - "periodEnd": null, - "avatarUrl": "https://cdn.example.com/avatars/johndoe.png", - "orgs": [ + "stats": [ { - "name": "my-org", - "fullname": "My Organization", - "email": "contact@my-org.com", - "avatarUrl": "https://cdn.example.com/orgs/my-org.png", - "roleInOrg": "admin" + "date": "2025-01-15", + "downloads": 45, + "authenticated": 30, + "anonymous": 15, + "files": 120 } ], - "auth": { - "accessToken": { - "displayName": "API Token", - "role": "write" - } + "period": { + "start": "2024-12-16", + "end": "2025-01-15", + "days": 30 } } ``` -**Compared to `/api/auth/me`**: This endpoint provides more detailed information including: -- Organization memberships with roles -- Token information -- Subscription/payment status -- Email verification status +**Get trending repositories:** +```bash +GET /api/trending?repo_type=model&days=7&limit=20 +``` + +**Response:** +```json +{ + "trending": [ + { + "id": "org/hot-model", + "type": "model", + "downloads": 5000, + "likes": 200, + "recent_downloads": 1500, + "private": false + } + ], + "period": { + "start": "2025-01-08", + "end": "2025-01-15", + "days": 7 + } +} +``` + +### Avatar Management + +**Upload avatar:** +```bash +POST /api/users/alice/avatar +Authorization: Bearer YOUR_TOKEN +Content-Type: multipart/form-data + +file: [image binary data] +``` + +**Features:** +- Accepts JPEG, PNG, WebP, GIF +- Maximum input size: 10MB +- Automatically resizes to fit 1024x1024 +- Center crops to square +- Converts to JPEG format +- Output quality: 95% + +**Response:** +```json +{ + "success": true, + "message": "Avatar uploaded successfully", + "size_bytes": 245678 +} +``` + +**Get avatar:** +```bash +GET /api/users/alice/avatar +``` + +Returns JPEG 
image with cache headers. + +### Quota Management + +**Get quota information:** +```bash +GET /api/quota/alice +Authorization: Bearer YOUR_TOKEN +``` + +**Response:** +```json +{ + "namespace": "alice", + "is_organization": false, + "quota_bytes": 10737418240, + "used_bytes": 1234567890, + "available_bytes": 9502850350, + "percentage_used": 11.5 +} +``` + +**Set quota:** +```bash +PUT /api/quota/alice +Authorization: Bearer YOUR_TOKEN +Content-Type: application/json + +{ + "quota_bytes": 10737418240 +} +``` + +**Repository-specific quota:** +```bash +GET /api/quota/repo/model/org/my-model +``` + +**Response:** +```json +{ + "repo_id": "org/my-model", + "repo_type": "model", + "namespace": "org", + "quota_bytes": 1073741824, + "used_bytes": 524288000, + "available_bytes": 549453824, + "percentage_used": 48.8, + "effective_quota_bytes": 1073741824, + "namespace_quota_bytes": 10737418240, + "namespace_used_bytes": 5368709120, + "namespace_available_bytes": 5368709120, + "is_inheriting": false +} +``` + +**Storage breakdown for namespace:** +```bash +GET /api/quota/org/repos +Authorization: Bearer YOUR_TOKEN +``` + +**Response:** +```json +{ + "namespace": "org", + "is_organization": true, + "total_repos": 15, + "repositories": [ + { + "repo_id": "org/large-model", + "repo_type": "model", + "name": "large-model", + "private": false, + "quota_bytes": null, + "used_bytes": 5368709120, + "percentage_used": 50.0, + "is_inheriting": true, + "created_at": "2025-01-01T00:00:00Z" + } + ] +} +``` + +### Invitations + +**Create organization invitation:** +```bash +POST /api/invitations/org/my-org/create +Authorization: Bearer YOUR_TOKEN +Content-Type: application/json + +{ + "email": "newuser@example.com", + "role": "member", + "max_usage": null, + "expires_days": 7 +} +``` + +**Response:** +```json +{ + "success": true, + "token": "abc123...", + "invitation_link": "http://hub.example.com/invite/abc123...", + "expires_at": "2025-01-22T12:00:00Z", + "max_usage": null, + 
"is_reusable": false +} +``` + +**Reusable invitation (10 uses):** +```json +{ + "role": "member", + "max_usage": 10, + "expires_days": 30 +} +``` + +**Accept invitation:** +```bash +POST /api/invitations/{token}/accept +Authorization: Bearer YOUR_TOKEN +``` + +### SSH Keys + +**Add SSH key:** +```bash +POST /api/user/keys +Authorization: Bearer YOUR_TOKEN +Content-Type: application/json + +{ + "title": "My Laptop", + "key": "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIB... user@host" +} +``` + +**Response:** +```json +{ + "id": 42, + "title": "My Laptop", + "key_type": "ssh-ed25519", + "fingerprint": "SHA256:abc123...", + "created_at": "2025-01-15T12:00:00.000000Z", + "last_used": null +} +``` + +**Supported key types:** +- `ssh-rsa` +- `ssh-dss` +- `ecdsa-sha2-nistp256` +- `ecdsa-sha2-nistp384` +- `ecdsa-sha2-nistp521` +- `ssh-ed25519` + +### Name Validation + +**Check if name is available:** +```bash +POST /api/validate/check-name +Content-Type: application/json + +{ + "name": "my-new-repo", + "namespace": "org", + "type": "model" +} +``` + +**Response (available):** +```json +{ + "available": true, + "normalized_name": "my_new_repo", + "conflict_with": null, + "message": "Repository name is available" +} +``` + +**Response (conflict):** +```json +{ + "available": false, + "normalized_name": "my_new_repo", + "conflict_with": "org/My-New-Repo", + "message": "Repository name conflicts with existing repository: My-New-Repo (case-insensitive)" +} +``` + +### LFS Settings + +**Get repository LFS settings:** +```bash +GET /api/models/org/model/lfs/settings +``` + +**Response:** +```json +{ + "lfs_threshold_bytes": 5000000, + "lfs_threshold_bytes_effective": 5000000, + "lfs_threshold_bytes_source": "repository", + "lfs_keep_versions": 10, + "lfs_keep_versions_effective": 10, + "lfs_keep_versions_source": "repository", + "lfs_suffix_rules": [".safetensors", ".bin"], + "lfs_suffix_rules_effective": [".safetensors", ".bin"], + "server_defaults": { + "lfs_threshold_bytes": 
10000000, + "lfs_keep_versions": 5 + } +} +``` + +**Update repository settings with LFS:** +```bash +PUT /models/org/model/settings +Authorization: Bearer YOUR_TOKEN +Content-Type: application/json + +{ + "lfs_threshold_bytes": 5000000, + "lfs_keep_versions": 10, + "lfs_suffix_rules": [".safetensors", ".bin", ".gguf"] +} +``` ## Content Deduplication @@ -1298,12 +957,6 @@ Benefits: - Efficient for model variants ``` -**Deduplication Points**: - -1. **Preupload Check**: Query DB by SHA256 -2. **LFS Batch API**: Check if OID exists -3. **Commit**: Link existing S3 object instead of uploading - ## Error Handling Kohaku Hub uses HuggingFace-compatible error headers: @@ -1330,31 +983,19 @@ These error codes are parsed by `huggingface_hub` client to raise appropriate Py ## Performance Considerations -### Upload Performance +### Download Tracking -``` -Small Files (≤10MB): - Client → FastAPI → LakeFS → S3 - (Proxied through server) +KohakuHub implements smart download tracking: -Large Files (>10MB): - Client ─────────────────────→ S3 - (Direct upload, no proxy) - ↓ - Kohaku Hub (only metadata link) -``` +**Session Deduplication:** +- Downloads are grouped into 15-minute sessions +- Multiple files downloaded in the same session count as 1 download +- Uses session ID + time bucket for deduplication -**Why this matters**: Large files bypass the application server entirely, allowing unlimited throughput limited only by client and S3 bandwidth. - -### Download Performance - -``` -All Downloads: - Client → Kohaku Hub → 302 Redirect → S3 - (metadata) (direct) -``` - -**Why this matters**: After initial redirect, all data transfer is direct from S3/CDN. Server only generates presigned URLs. 
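The session deduplication described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not KohakuHub's actual implementation: the `time_bucket` and `count_downloads` helpers, the `(repo_id, session_id, timestamp)` event layout, and the in-memory set are hypothetical; only the 15-minute window and the "session ID + time bucket" key come from the text.

```python
from datetime import datetime, timezone

SESSION_WINDOW_SECONDS = 15 * 60  # 15-minute sessions, as described above


def time_bucket(ts: datetime) -> int:
    """Map a timestamp to the index of its fixed 15-minute window."""
    return int(ts.timestamp()) // SESSION_WINDOW_SECONDS


def count_downloads(events: list[tuple[str, str, datetime]]) -> dict[str, int]:
    """Count one download per (repo, session ID, time bucket).

    Each event is a hypothetical (repo_id, session_id, timestamp) tuple,
    one per file request.
    """
    seen: set[tuple[str, str, int]] = set()
    counts: dict[str, int] = {}
    for repo_id, session_id, ts in events:
        key = (repo_id, session_id, time_bucket(ts))
        if key in seen:
            continue  # same session and window: already counted
        seen.add(key)
        counts[repo_id] = counts.get(repo_id, 0) + 1
    return counts


def at(minute: int) -> datetime:
    """Helper for the demo: a UTC timestamp at 12:<minute> on 2025-01-15."""
    return datetime(2025, 1, 15, 12, minute, tzinfo=timezone.utc)


demo_events = [
    ("org/model", "s1", at(0)),   # first file of a clone
    ("org/model", "s1", at(1)),   # same session, same window: not counted again
    ("org/model", "s1", at(20)),  # same session, later window: counted
    ("org/model", "s2", at(2)),   # different session: counted
]
print(count_downloads(demo_events))
```

With fixed windows like these, a single clone that happens to straddle a window boundary would be counted twice; whether the real tracker handles that case differently is not stated here.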
+**Benefits:** +- Accurate download counts (git clone = 1 download, not N file downloads) +- Trending calculations based on unique sessions +- Efficient storage (one record per session) ### Recommended S3 Providers diff --git a/docs/Admin.md b/docs/Admin.md index 7f571da..21c27bc 100644 --- a/docs/Admin.md +++ b/docs/Admin.md @@ -7,27 +7,6 @@ --- -## Admin Portal Architecture - -```mermaid -graph LR - subgraph "Admin Access" - Browser[Browser] -->|X-Admin-Token| Portal[Admin Portal UI] - end - - subgraph "Admin API" - Portal -->|REST API| AdminAPI[Admin Endpoints] - end - - subgraph "Data Sources" - AdminAPI -->|Queries| DB[PostgreSQL/SQLite] - AdminAPI -->|List Objects| S3[MinIO/S3] - AdminAPI -->|Repository Info| LakeFS[LakeFS] - end -``` - ---- - ## Table of Contents 1. [Overview](#overview) @@ -38,8 +17,9 @@ graph LR 6. [Commit History Viewer](#commit-history-viewer) 7. [S3 Storage Browser](#s3-storage-browser) 8. [Quota Management](#quota-management) -9. [API Reference](#api-reference) -10. [Security Best Practices](#security-best-practices) +9. [Invitation Management](#invitation-management) +10. [API Reference](#api-reference) +11. 
[Security Best Practices](#security-best-practices) --- @@ -52,7 +32,9 @@ The Admin Portal provides a centralized interface for managing your KohakuHub in - **Commit History** - Track commits across all repositories - **Storage Browser** - Browse S3 buckets and objects - **Quota Management** - Set and monitor storage quotas +- **Invitation Management** - Create and manage registration invitations - **Statistics Dashboard** - Real-time insights into usage +- **Bulk Operations** - Recalculate storage for all repositories **Access URL:** ``` @@ -116,6 +98,9 @@ The dashboard shows real-time statistics from your database: - Email verified users - Inactive users +**Organization Stats:** +- Total organizations + **Repository Stats:** - Total repositories - Private vs public repositories @@ -126,7 +111,7 @@ The dashboard shows real-time statistics from your database: - Top contributors (by commit count) **Storage Stats:** -- Total storage used +- Total storage used (private + public) - Private vs public storage - LFS object count and size @@ -136,6 +121,7 @@ The dashboard shows real-time statistics from your database: - View commits - Inspect S3 storage - Manage quotas +- Manage invitations --- @@ -165,6 +151,7 @@ The dashboard shows real-time statistics from your database: - Email (required, unique) - Password (required) - Email verified (checkbox) +- Is active (checkbox) - Private quota (bytes, optional = unlimited) - Public quota (bytes, optional = unlimited) @@ -174,10 +161,27 @@ Username: alice Email: alice@example.com Password: ******** Email Verified: ✓ +Is Active: ✓ Private Quota: 10737418240 (10 GB) Public Quota: 53687091200 (50 GB) ``` +**API Endpoint:** +```bash +curl -X POST http://localhost:48888/admin/api/users \ + -H "X-Admin-Token: your-secret-token" \ + -H "Content-Type: application/json" \ + -d '{ + "username": "alice", + "email": "alice@example.com", + "password": "secure_password", + "email_verified": true, + "is_active": true, + "private_quota_bytes": 
10737418240, + "public_quota_bytes": 53687091200 + }' +``` + ### View User Details Click "View" to see: @@ -209,12 +213,29 @@ Click "View" to see: 3. Choose: Cancel or Force Delete 4. Confirm force delete → All data deleted +**API Endpoint:** +```bash +# Normal delete (fails if user owns repos) +curl -X DELETE http://localhost:48888/admin/api/users/alice \ + -H "X-Admin-Token: your-secret-token" + +# Force delete (deletes user and all their repos) +curl -X DELETE "http://localhost:48888/admin/api/users/alice?force=true" \ + -H "X-Admin-Token: your-secret-token" +``` + ### Toggle Email Verification **Use case:** Manually verify users when email verification is disabled or failed. **Action:** Click "Verify" or "Unverify" button → Instant update +**API Endpoint:** +```bash +curl -X PATCH "http://localhost:48888/admin/api/users/alice/email-verification?verified=true" \ + -H "X-Admin-Token: your-secret-token" +``` + --- ## Repository Management @@ -231,6 +252,7 @@ Click "View" to see: - Full repository ID (namespace/name) - Privacy status (Private/Public badge) - Owner username +- Storage quota and usage - Created date **Actions:** @@ -244,21 +266,18 @@ Click "View" to see: - Owner username - Privacy status - Created date -- **File count** (from database) +- **File count** (from database, active files only) - **Commit count** (from database) -- **Total size** (sum of all files) +- **Total size** (sum of all active files) +- **Quota information** (quota, used, percentage, inheriting status) **Actions:** - View in Main App → Opens repository in main UI -### API Endpoints - -``` -GET /admin/api/repositories - Query: repo_type, namespace, limit, offset - -GET /admin/api/repositories/{type}/{namespace}/{name} - Returns: Detailed repo info with stats +**API Endpoint:** +```bash +curl http://localhost:48888/admin/api/repositories/model/org/my-model \ + -H "X-Admin-Token: your-secret-token" ``` --- @@ -288,6 +307,21 @@ View all commits across all repositories in your instance.
- Page size: 10, 20, 50, 100 - Navigate through pages +**API Endpoint:** +```bash +# List all commits +curl "http://localhost:48888/admin/api/commits?limit=100" \ + -H "X-Admin-Token: your-secret-token" + +# Filter by repository +curl "http://localhost:48888/admin/api/commits?repo_full_id=org/model&limit=50" \ + -H "X-Admin-Token: your-secret-token" + +# Filter by author +curl "http://localhost:48888/admin/api/commits?username=alice&limit=50" \ + -H "X-Admin-Token: your-secret-token" +``` + ### Use Cases - Track user activity @@ -296,13 +330,6 @@ View all commits across all repositories in your instance. - Debug commit issues - Audit trail -### API Endpoint - -``` -GET /admin/api/commits - Query: repo_full_id, username, limit, offset -``` - --- ## S3 Storage Browser @@ -317,7 +344,7 @@ GET /admin/api/commits **Metrics:** - Bucket name -- Total size (formatted: KB, MB, GB) +- Total size (formatted: KB, MB, GB, TB) - Object count - Creation date - Progress bar (relative to 100GB) @@ -325,6 +352,26 @@ GET /admin/api/commits **Actions:** - Click bucket → Browse contents +**API Endpoint:** +```bash +curl http://localhost:48888/admin/api/storage/buckets \ + -H "X-Admin-Token: your-secret-token" +``` + +**Response:** +```json +{ + "buckets": [ + { + "name": "hub-storage", + "creation_date": "2025-01-01T00:00:00Z", + "total_size": 107374182400, + "object_count": 5000 + } + ] +} +``` + ### Object Browser **Features:** @@ -347,15 +394,11 @@ Enter prefix: hf-model-org-repo/ → Shows objects for specific repository ``` -### API Endpoints - -``` -GET /admin/api/storage/buckets - Returns: All buckets with sizes - -GET /admin/api/storage/objects/{bucket} - Query: prefix, limit - Returns: Objects in bucket +**API Endpoint:** +```bash +# List objects in bucket +curl "http://localhost:48888/admin/api/storage/objects/hub-storage?prefix=lfs/&limit=100" \ + -H "X-Admin-Token: your-secret-token" ``` --- @@ -372,6 +415,34 @@ GET /admin/api/storage/objects/{bucket} - Total usage - Usage
percentages +**API Endpoint:** +```bash +# Get user quota +curl "http://localhost:48888/admin/api/quota/alice?is_org=false" \ + -H "X-Admin-Token: your-secret-token" + +# Get organization quota +curl "http://localhost:48888/admin/api/quota/my-org?is_org=true" \ + -H "X-Admin-Token: your-secret-token" +``` + +**Response:** +```json +{ + "namespace": "alice", + "is_organization": false, + "private_quota_bytes": 10737418240, + "public_quota_bytes": 53687091200, + "private_used_bytes": 1234567890, + "public_used_bytes": 5678901234, + "private_available_bytes": 9502850350, + "public_available_bytes": 48008189966, + "private_percentage_used": 11.5, + "public_percentage_used": 10.6, + "total_used_bytes": 6913469124 +} +``` + ### Set Quota **Fields:** @@ -385,6 +456,17 @@ GET /admin/api/storage/objects/{bucket} Unlimited = (empty/null) ``` +**API Endpoint:** +```bash +curl -X PUT http://localhost:48888/admin/api/quota/alice \ + -H "X-Admin-Token: your-secret-token" \ + -H "Content-Type: application/json" \ + -d '{ + "private_quota_bytes": 10737418240, + "public_quota_bytes": 53687091200 + }' +``` + ### Recalculate Storage **Purpose:** Re-scan all files and update storage usage. @@ -395,22 +477,156 @@ Unlimited = (empty/null) - Quota shows incorrect values **Process:** -1. Scans all LakeFS objects for namespace -2. Sums file sizes +1. Scans all files for namespace +2. Sums file sizes (private and public separately) 3. Updates User/Organization table -### API Endpoints - +**API Endpoint:** +```bash +curl -X POST "http://localhost:48888/admin/api/quota/alice/recalculate?is_org=false" \ + -H "X-Admin-Token: your-secret-token" ``` -GET /admin/api/quota/{namespace}?is_org=false - Returns: Quota information -PUT /admin/api/quota/{namespace} - Body: {private_quota_bytes, public_quota_bytes} - Returns: Updated quota +### Bulk Storage Recalculation -POST /admin/api/quota/{namespace}/recalculate - Returns: Recalculated usage + +**NEW:** Recalculate storage for all repositories at once.
+ +**API Endpoint:** +```bash +# Recalculate all repositories +curl -X POST http://localhost:48888/admin/api/repositories/recalculate-all \ + -H "X-Admin-Token: your-secret-token" + +# Filter by type +curl -X POST "http://localhost:48888/admin/api/repositories/recalculate-all?repo_type=model" \ + -H "X-Admin-Token: your-secret-token" + +# Filter by namespace +curl -X POST "http://localhost:48888/admin/api/repositories/recalculate-all?namespace=org" \ + -H "X-Admin-Token: your-secret-token" +``` + +**Response:** +```json +{ + "total": 250, + "success_count": 248, + "failure_count": 2, + "failures": [ + { + "repo_id": "org/problem-repo", + "error": "Repository not found in LakeFS" + } + ], + "message": "Recalculated storage for 248/250 repositories" +} +``` + +--- + +## Invitation Management + +### Create Registration Invitation + +**Purpose:** Generate invitations for user registration (useful for invite-only mode). + +**Features:** +- Optional organization membership after registration +- Reusable invitations with usage limits +- Configurable expiration + +**API Endpoint:** +```bash +curl -X POST http://localhost:48888/admin/api/invitations/register \ + -H "X-Admin-Token: your-secret-token" \ + -H "Content-Type: application/json" \ + -d '{ + "org_id": null, + "role": "member", + "max_usage": 10, + "expires_days": 30 + }' +``` + +**Response:** +```json +{ + "success": true, + "token": "abc123xyz...", + "invitation_link": "http://your-hub.com/register?invitation=abc123xyz...", + "expires_at": "2025-02-14T12:00:00Z", + "max_usage": 10, + "is_reusable": true, + "action": "register_account" +} +``` + +**Invitation Types:** +- **One-time:** `max_usage: null` - Single use invitation +- **Limited:** `max_usage: 10` - Can be used 10 times +- **Unlimited:** `max_usage: -1` - Unlimited uses + +**Auto-join Organization:** +```json +{ + "org_id": 5, + "role": "member", + "max_usage": 50, + "expires_days": 90 +} +``` + +Users who register with this invitation will automatically 
join the organization as members. + +### List All Invitations + +**API Endpoint:** +```bash +# List all invitations +curl http://localhost:48888/admin/api/invitations \ + -H "X-Admin-Token: your-secret-token" + +# Filter by action type +curl "http://localhost:48888/admin/api/invitations?action=register_account" \ + -H "X-Admin-Token: your-secret-token" +``` + +**Response:** +```json +{ + "invitations": [ + { + "id": 1, + "token": "abc123...", + "action": "register_account", + "org_id": null, + "org_name": null, + "role": null, + "email": null, + "created_by": 1, + "creator_username": "System", + "created_at": "2025-01-15T12:00:00Z", + "expires_at": "2025-02-15T12:00:00Z", + "max_usage": 10, + "usage_count": 5, + "is_reusable": true, + "is_available": true, + "error_message": null, + "used_at": null, + "used_by": null + } + ], + "limit": 100, + "offset": 0 +} +``` + +### Delete Invitation + +**API Endpoint:** +```bash +curl -X DELETE http://localhost:48888/admin/api/invitations/{token} \ + -H "X-Admin-Token: your-secret-token" ``` --- @@ -441,6 +657,7 @@ PATCH /admin/api/users/{username}/email-verification # Set verification ``` GET /admin/api/repositories # List repositories GET /admin/api/repositories/{type}/{namespace}/{name} # Get details +POST /admin/api/repositories/recalculate-all # Bulk storage recalc ``` **Commit History:** @@ -469,6 +686,13 @@ PUT /admin/api/quota/{namespace} # Set quota POST /admin/api/quota/{namespace}/recalculate # Recalculate ``` +**Invitations:** +``` +POST /admin/api/invitations/register # Create registration invitation +GET /admin/api/invitations # List all invitations +DELETE /admin/api/invitations/{token} # Delete invitation +``` + ### Response Formats **User Info:** @@ -501,7 +725,11 @@ POST /admin/api/quota/{namespace}/recalculate # Recalculate "created_at": "2025-01-01T00:00:00.000000Z", "file_count": 15, "commit_count": 8, - "total_size": 12345678 + "total_size": 12345678, + "quota_bytes": null, + "used_bytes": 12345678, + 
"percentage_used": 0.12, + "is_inheriting": true } ``` @@ -615,7 +843,7 @@ Admin operations are logged with `[ADMIN]` prefix: ``` [WARNING] [ADMIN] [07:05:55] Admin deleted user: testuser (deleted 5 repositories) [INFO] [ADMIN] [07:06:12] Admin set quota for user alice: private=10737418240, public=53687091200 -[WARNING] [ADMIN] [07:06:45] Admin deleted repository: model:org/test-model +[WARNING] [ADMIN] [07:06:45] Admin created registration invitation (max_usage=10, expires=30d) ``` **Monitor logs:** @@ -642,7 +870,21 @@ docker logs khub-hub-api | grep "\[ADMIN\]" 5. Share credentials with user ``` -### Scenario 2: Storage Cleanup +### Scenario 2: Invite-Only Registration Mode + +``` +1. Dashboard → "Manage Invitations" +2. Click "Create Registration Invitation" +3. Configure: + - Max Usage: 50 (for team) + - Expires: 90 days + - Auto-join Organization: my-company (as member) +4. Copy invitation link +5. Share link with team members +6. Monitor usage count +``` + +### Scenario 3: Storage Cleanup ``` 1. Dashboard → "Browse Storage" @@ -653,7 +895,7 @@ docker logs khub-hub-api | grep "\[ADMIN\]" 6. (Manually delete via CLI/API if needed) ``` -### Scenario 3: User Investigation +### Scenario 4: User Investigation ``` 1. Dashboard → "View Commits" @@ -663,7 +905,7 @@ docker logs khub-hub-api | grep "\[ADMIN\]" 5. If needed: Go to Users → Delete user (with force) ``` -### Scenario 4: Quota Enforcement +### Scenario 5: Quota Enforcement ``` 1. Dashboard → "Manage Quotas" @@ -674,6 +916,17 @@ docker logs khub-hub-api | grep "\[ADMIN\]" 6. Monitor dashboard for compliance ``` +### Scenario 6: System Maintenance + +``` +1. Dashboard → "Bulk Operations" +2. Click "Recalculate All Repository Storage" +3. Optional: Filter by type or namespace +4. Confirm operation +5. Wait for completion (progress logged) +6. 
Review success/failure report +``` + --- ## Troubleshooting @@ -700,14 +953,14 @@ docker logs khub-hub-api | grep "\[ADMIN\]" ### Storage Size Incorrect **Problem:** Database out of sync with S3 -**Solution:** Use "Recalculate" button in Quota Management +**Solution:** Use "Recalculate" button in Quota Management or bulk recalculation endpoint --- ### Can't Delete User **Problem:** User owns repositories -**Solution:** Either delete repos first, or use "Force Delete" option +**Solution:** Either delete repos first, or use "Force Delete" option with `force=true` parameter --- @@ -755,6 +1008,21 @@ curl -H "X-Admin-Token: your-token" \ "http://localhost:48888/admin/api/stats/top-repos?by=size&limit=10" ``` +**Response:** +```json +{ + "top_repositories": [ + { + "repo_full_id": "org/active-model", + "repo_type": "model", + "commit_count": 150, + "private": false + } + ], + "sorted_by": "commits" +} +``` + --- ## Integration with CI/CD @@ -785,6 +1053,31 @@ user = response.json() print(f"Created user: {user['username']} (ID: {user['id']})") ``` +### Bulk Invitation Generation + +```python +import requests + +admin_token = "your-admin-token" +base_url = "http://hub.example.com" + +# Create reusable invitation for 100 users +response = requests.post( + f"{base_url}/admin/api/invitations/register", + headers={"X-Admin-Token": admin_token}, + json={ + "org_id": 5, # Auto-join org after registration + "role": "member", + "max_usage": 100, + "expires_days": 90 + } +) + +invitation = response.json() +print(f"Invitation link: {invitation['invitation_link']}") +print(f"Can be used {invitation['max_usage']} times") +``` + ### Monitoring Script ```python @@ -816,13 +1109,14 @@ if stats['users']['inactive'] > 10: ### Database Queries -Admin operations run synchronous queries in the DB thread pool: +Admin operations run synchronous queries with `db.atomic()`: - User listings: `O(n)` where n = total users -- Repository stats: Aggregation queries -- Commit history: Indexed by 
repo_full_id and username +- Repository stats: Aggregation queries with indexes +- Commit history: Indexed by repository_id and username +- Storage calculations: Aggregation over File table **Optimization:** -- Limit page size (default: 20, max: 100) +- Limit page size (default: 100, max: 1000) - Use filters to reduce result sets - Statistics are computed on-demand (cache in frontend if needed) @@ -841,21 +1135,36 @@ Admin operations run synchronous queries in the DB thread pool: - Don't scan too frequently - Consider caching results for large buckets +### Bulk Storage Recalculation + +**Performance:** +- Processes repositories sequentially (safe for database) +- Progress logged every 10 repositories +- Can take 1-5 minutes for 1000 repositories +- Errors don't stop the process (logged and returned) + +**Use case:** +- Run during maintenance windows +- Use filters to process subsets +- Monitor logs for progress + --- ## Comparison: Admin Portal vs CLI | Feature | Admin Portal | kohub-cli | Best For | |---------|--------------|-----------|----------| -| User management | ✅ GUI | ✅ Commands | GUI: Quick actions
CLI: Automation | +| User management | ✅ GUI | ❌ No | Portal: Quick actions | | Repository browser | ✅ Full | ⚠️ Limited | Portal: Overview
CLI: Specific repos | | Commit history | ✅ Full | ❌ No | Portal only | | Storage browser | ✅ Full | ❌ No | Portal only | | Quota management | ✅ Full | ⚠️ API only | Portal: Visual
CLI: Scripting | +| Invitation management | ✅ Full | ❌ No | Portal only | | Statistics | ✅ Dashboard | ❌ No | Portal only | +| Bulk operations | ✅ Full | ❌ No | Portal only | | Automation | ❌ Manual | ✅ Scripts | Portal: Manual
CLI: Automation | -**Recommendation:** Use portal for exploration/monitoring, CLI for automation. +**Recommendation:** Use portal for exploration/monitoring, API for automation. --- @@ -879,8 +1188,14 @@ A: Yes, use curl/Python with `X-Admin-Token` header. **Q: Is audit logging enabled by default?** A: Yes, all admin operations are logged with `[ADMIN]` prefix. +**Q: How do I create reusable invitations?** +A: Set `max_usage` to a number (e.g., 50 for 50 uses) or -1 for unlimited. + +**Q: Can invitations auto-add users to organizations?** +A: Yes, set `org_id` and `role` in the invitation. Users will automatically join after registration. + --- **Last Updated:** January 2025 -**Version:** 1.0 +**Version:** 1.1 **Status:** ✅ Production Ready diff --git a/docs/CLI.md b/docs/CLI.md index 855e42d..deb6969 100644 --- a/docs/CLI.md +++ b/docs/CLI.md @@ -16,6 +16,9 @@ kohub-cli repo create REPO_ID --type TYPE # Create repository kohub-cli repo list --type TYPE # List repositories kohub-cli repo ls NAMESPACE # List namespace repos kohub-cli repo files REPO_ID # List repository files +kohub-cli repo commits REPO_ID # List commit history +kohub-cli repo commit REPO_ID COMMIT_ID # Show commit details +kohub-cli repo commit-diff REPO_ID COMMIT # Show commit diff # Organizations kohub-cli org create NAME # Create organization @@ -27,17 +30,20 @@ kohub-cli settings repo move FROM TO --type TYPE # Move/rename repo kohub-cli settings repo squash REPO_ID --type TYPE # Squash repo history kohub-cli settings repo branch create REPO_ID BRANCH # Create branch kohub-cli settings repo tag create REPO_ID TAG # Create tag -kohub-cli settings organization members ORG # List org members + +# LFS Settings (NEW) +kohub-cli settings repo lfs get REPO_ID # Get LFS settings +kohub-cli settings repo lfs threshold REPO_ID --threshold 5000000 # Set threshold +kohub-cli settings repo lfs threshold REPO_ID --reset # Reset to default +kohub-cli settings repo lfs versions REPO_ID --count 10 # Set keep 
versions +kohub-cli settings repo lfs suffix REPO_ID --add .safetensors --add .bin # Add suffix rules +kohub-cli settings repo lfs suffix REPO_ID --set .safetensors --set .gguf # Set suffix rules +kohub-cli settings repo lfs suffix REPO_ID --clear # Clear suffix rules # File Operations kohub-cli settings repo upload REPO_ID FILE # Upload file to repo kohub-cli settings repo download REPO_ID PATH # Download file from repo -# Commit History -kohub-cli repo commits REPO_ID # List commit history -kohub-cli repo commit REPO_ID COMMIT_ID # Show commit details -kohub-cli repo commit-diff REPO_ID COMMIT_ID # Show commit diff - # Configuration kohub-cli config set KEY VALUE # Set config value kohub-cli config list # Show all config @@ -62,40 +68,6 @@ The KohakuHub CLI (`kohub-cli`) provides comprehensive access to KohakuHub throu This dual-mode design makes it easy to integrate KohakuHub into existing workflows while maintaining compatibility with HuggingFace ecosystem patterns. -## Design Goals - -1. **Python API First**: Expose all functionality through a clean Python API (similar to `huggingface_hub.HfApi`) -2. **CLI as a Wrapper**: Build CLI commands on top of the Python API -3. **Dual Mode**: Support both interactive (TUI) and non-interactive (scripted) modes -4. **Configuration Management**: Easy endpoint and credential management -5. **HuggingFace Compatibility**: Similar patterns and naming conventions where applicable -6. 
**Extensibility**: Easy to add new features without breaking existing code - -## Architecture - -``` -┌─────────────────────────────────────────────┐ -│ CLI Interface (Click) │ -│ - kohub-cli [command] [options] │ -│ - Interactive TUI mode │ -└──────────────────┬──────────────────────────┘ - | - v -┌─────────────────────────────────────────────┐ -│ Python API (KohubClient) │ -│ - User operations │ -│ - Organization operations │ -│ - Repository operations │ -│ - Token management │ -└──────────────────┬──────────────────────────┘ - | - v -┌─────────────────────────────────────────────┐ -│ HTTP Client (requests.Session) │ -│ - KohakuHub REST API │ -└─────────────────────────────────────────────┘ -``` - ## Python API Design ### Core Class: `KohubClient` @@ -144,45 +116,6 @@ tokens = client.list_tokens() client.revoke_token(token_id=123) ``` -### Organization Operations - -```python -# Create organization -client.create_organization(name="my-org", description="My awesome org") - -# Get organization info -org = client.get_organization("my-org") - -# List user's organizations -orgs = client.list_user_organizations(username="alice") - -# List organization members -members = client.list_organization_members("my-org") - -# Add member to organization -client.add_organization_member( - org_name="my-org", - username="bob", - role="member" # or "admin", "super-admin" -) - -# Update member role -client.update_organization_member( - org_name="my-org", - username="bob", - role="admin" -) - -# Remove member -client.remove_organization_member(org_name="my-org", username="bob") - -# Update organization settings -client.update_organization_settings( - org_name="my-org", - description="Updated description" -) -``` - ### Repository Operations ```python @@ -220,13 +153,32 @@ repos = client.list_namespace_repos( namespace="my-org", repo_type="model" # optional ) +``` +### Repository Settings + +```python # Update repository settings client.update_repo_settings( repo_id="my-org/my-model", 
repo_type="model", private=True, - gated="auto" # "auto", "manual", or None + gated="auto", # "auto", "manual", or None +) + +# Update LFS settings +client.update_repo_settings( + repo_id="my-org/my-model", + repo_type="model", + lfs_threshold_bytes=5000000, # 5MB + lfs_keep_versions=10, + lfs_suffix_rules=[".safetensors", ".bin", ".gguf"], +) + +# Get LFS settings +settings = client.get_repo_lfs_settings( + repo_id="my-org/my-model", + repo_type="model" ) # Move/rename repository @@ -236,6 +188,16 @@ client.move_repo( repo_type="model" ) +# Squash repository history +client.squash_repo( + repo_id="my-org/my-model", + repo_type="model" +) +``` + +### Branch and Tag Operations + +```python # Create branch client.create_branch( repo_id="my-org/my-model", @@ -266,18 +228,6 @@ client.delete_tag( tag="v1.0", repo_type="model" ) - -# Update user settings -client.update_user_settings( - username="alice", - email="newemail@example.com" -) - -# Squash repository history -client.squash_repo( - repo_id="my-org/my-model", - repo_type="model" -) ``` ### Commit History @@ -333,6 +283,45 @@ local_path = client.download_file( ) ``` +### Organization Operations + +```python +# Create organization +client.create_organization(name="my-org", description="My awesome org") + +# Get organization info +org = client.get_organization("my-org") + +# List user's organizations +orgs = client.list_user_organizations(username="alice") + +# List organization members +members = client.list_organization_members("my-org") + +# Add member to organization +client.add_organization_member( + org_name="my-org", + username="bob", + role="member" # or "admin", "super-admin" +) + +# Update member role +client.update_organization_member( + org_name="my-org", + username="bob", + role="admin" +) + +# Remove member +client.remove_organization_member(org_name="my-org", username="bob") + +# Update organization settings +client.update_organization_settings( + org_name="my-org", + description="Updated description" 
+) +``` + ### Health Check ```python @@ -359,73 +348,9 @@ config = client.load_config() path = client.config_path # ~/.kohub/config.json ``` -## CLI Design +## CLI Commands Reference -### Command Structure - -``` -kohub-cli -├── auth -│ ├── login # Login with username/password -│ ├── logout # Logout current session -│ ├── whoami # Show current user -│ └── token -│ ├── create # Create new API token -│ ├── list # List all tokens -│ └── delete # Delete a token -├── repo -│ ├── create # Create repository -│ ├── delete # Delete repository -│ ├── info # Show repository info -│ ├── list # List repositories -│ ├── ls # List repositories under a namespace -│ ├── files # List repository files -│ ├── commits # List commit history -│ ├── commit # Show commit details -│ └── commit-diff # Show commit diff -├── org -│ ├── create # Create organization -│ ├── info # Show organization info -│ ├── list # List user's organizations -│ └── member -│ ├── add # Add member to org -│ ├── remove # Remove member from org -│ └── update # Update member role -├── settings -│ ├── user -│ │ └── update # Update user settings -│ ├── repo -│ │ ├── update # Update repository settings -│ │ ├── move # Move/rename repository -│ │ ├── squash # Squash repository history -│ │ ├── upload # Upload file to repository -│ │ ├── download # Download file from repository -│ │ ├── commits # List commit history (alias) -│ │ ├── commit # Show commit details (alias) -│ │ ├── commit-diff # Show commit diff (alias) -│ │ ├── branch -│ │ │ ├── create # Create branch -│ │ │ └── delete # Delete branch -│ │ └── tag -│ │ ├── create # Create tag -│ │ └── delete # Delete tag -│ └── organization -│ ├── update # Update organization settings -│ └── members # List organization members -├── config -│ ├── set # Set configuration value -│ ├── get # Get configuration value -│ ├── list # Show all configuration -│ ├── clear # Clear all configuration -│ ├── history # Show operation history -│ └── clear-history # Clear operation history 
-├── health # Check service health -└── interactive # Launch interactive TUI mode -``` - -### Command Examples - -#### Authentication +### Authentication ```bash # Login @@ -449,7 +374,7 @@ kohub-cli auth token list kohub-cli auth token delete --id 123 ``` -#### Repository Operations +### Repository Operations ```bash # Create repository @@ -459,7 +384,7 @@ kohub-cli repo create my-org/my-model --type model --private # Delete repository kohub-cli repo delete my-org/my-model --type model -# Show repository info +# Show repository info (shows downloads and likes) kohub-cli repo info my-org/my-model --type model kohub-cli repo info my-org/my-model --type model --revision v1.0 @@ -471,12 +396,27 @@ kohub-cli repo list --type model --author my-org --limit 100 kohub-cli repo ls my-org kohub-cli repo ls my-org --type model -# List files in repository +# List files in repository (with LFS indicators) kohub-cli repo files my-org/my-model kohub-cli repo files my-org/my-model --revision main --path configs/ --recursive ``` -#### Organization Operations +### Commit History + +```bash +# List commits +kohub-cli repo commits my-org/my-model --type model +kohub-cli repo commits my-org/my-model --type model --branch main --limit 50 + +# Show commit details +kohub-cli repo commit my-org/my-model abc1234 --type model + +# Show commit diff +kohub-cli repo commit-diff my-org/my-model abc1234 --type model +kohub-cli repo commit-diff my-org/my-model abc1234 --type model --show-diff +``` + +### Organization Operations ```bash # Create organization @@ -499,12 +439,9 @@ kohub-cli org member update my-org bob --role admin kohub-cli org member remove my-org bob ``` -#### Settings Operations +### Repository Settings ```bash -# Update user settings -kohub-cli settings user update --email newemail@example.com - # Update repository settings kohub-cli settings repo update my-org/my-model --type model --private kohub-cli settings repo update my-org/my-model --type model --public @@ -514,6 +451,13 
@@ kohub-cli settings repo update my-org/my-model --type model --gated auto kohub-cli settings repo move my-org/old-name my-org/new-name --type model kohub-cli settings repo move my-user/my-model my-org/my-model --type model +# Squash repository history (WARNING: irreversible) +kohub-cli settings repo squash my-org/my-model --type model +``` + +### Branch and Tag Management + +```bash # Create branch kohub-cli settings repo branch create my-org/my-model dev --type model kohub-cli settings repo branch create my-org/my-model feature-x --type model --revision main @@ -527,15 +471,67 @@ kohub-cli settings repo tag create my-org/my-model v1.0 --type model --revision # Delete tag kohub-cli settings repo tag delete my-org/my-model v1.0 --type model - -# Update organization settings -kohub-cli settings organization update my-org --description "New description" - -# List organization members -kohub-cli settings organization members my-org ``` -#### File Operations +### LFS Settings Management + +**Get current LFS settings:** +```bash +kohub-cli settings repo lfs get my-org/my-model --type model +``` + +**Output:** +``` +LFS Threshold: + Configured: 5.0 MB + Effective: 5.0 MB (repository) + +LFS Keep Versions: + Configured: 10 versions + Effective: 10 versions (repository) + +LFS Suffix Rules: + Active: .safetensors, .bin + +Server Defaults: + Threshold: 10.0 MB + Keep Versions: 5 versions +``` + +**Set LFS threshold:** +```bash +# Set to 5MB +kohub-cli settings repo lfs threshold my-org/my-model --type model --threshold 5000000 + +# Reset to server default +kohub-cli settings repo lfs threshold my-org/my-model --type model --reset +``` + +**Manage keep versions:** +```bash +# Set to keep last 10 versions +kohub-cli settings repo lfs versions my-org/my-model --type model --count 10 + +# Reset to server default +kohub-cli settings repo lfs versions my-org/my-model --type model --reset +``` + +**Manage suffix rules:** +```bash +# Add suffix rules +kohub-cli settings repo lfs 
suffix my-org/my-model --type model --add .safetensors --add .bin + +# Set suffix rules (replaces all) +kohub-cli settings repo lfs suffix my-org/my-model --type model --set .safetensors --set .gguf + +# Remove specific suffix +kohub-cli settings repo lfs suffix my-org/my-model --type model --remove .bin + +# Clear all suffix rules +kohub-cli settings repo lfs suffix my-org/my-model --type model --clear +``` + +### File Operations ```bash # Upload file to repository @@ -547,32 +543,24 @@ kohub-cli settings repo download my-org/my-model model.safetensors --type model kohub-cli settings repo download my-org/my-model weights/model.bin -o ./local-model.bin --type model --revision v1.0 ``` -#### Commit History +### User Settings ```bash -# List commits -kohub-cli repo commits my-org/my-model --type model -kohub-cli repo commits my-org/my-model --type model --branch main --limit 50 -kohub-cli settings repo commits my-org/my-model --type model --branch dev - -# Show commit details -kohub-cli repo commit my-org/my-model abc1234 --type model -kohub-cli settings repo commit my-org/my-model abc1234567890 --type model - -# Show commit diff -kohub-cli repo commit-diff my-org/my-model abc1234 --type model -kohub-cli repo commit-diff my-org/my-model abc1234 --type model --show-diff -kohub-cli settings repo commit-diff my-org/my-model abc1234 --type model +# Update user settings +kohub-cli settings user update --email newemail@example.com ``` -#### Repository Squash +### Organization Settings ```bash -# Squash repository history (WARNING: irreversible) -kohub-cli settings repo squash my-org/my-model --type model +# Update organization settings +kohub-cli settings organization update my-org --description "New description" + +# List organization members +kohub-cli settings organization members my-org ``` -#### Configuration +### Configuration ```bash # Set endpoint @@ -598,7 +586,7 @@ kohub-cli config clear-history kohub-cli config clear ``` -#### Health Check +### Health Check 
```bash # Check service health @@ -606,7 +594,19 @@ kohub-cli health kohub-cli --output json health ``` -#### Interactive Mode +**Output:** +``` +KohakuHub Health Check + +✓ API: Healthy + Site: KohakuHub + Version: 0.0.1 + Endpoint: http://localhost:28080 + +✓ Auth: Authenticated as alice +``` + +### Interactive Mode ```bash # Launch interactive TUI @@ -706,18 +706,155 @@ kohub-cli # Default: launches interactive mode kohub-cli interactive # Explicit: launches interactive mode ``` -## Dual-Mode Design +## Output Formatting -Both command mode and interactive TUI mode are fully implemented and production-ready: +### Text Mode (Default) -- **Command Mode**: Best for scripting, automation, CI/CD pipelines -- **Interactive Mode**: Best for exploration, learning, and interactive management +Uses Rich library for beautiful terminal output: +- Tables for lists +- Panels for detailed information +- Color-coded badges +- Icons for file types +- Progress indicators -Use whichever mode fits your workflow! 
+**Examples:** -## Implementation Phases +**Repository info with stats:** +``` +┌─ Model Repository ──────────────────────────────────────┐ +│ org/my-model │ +│ ────────────────────────────────────────────────────────│ +│ Author: org │ +│ Type: model │ +│ Visibility: 🌐 Public │ +│ Created: 2025-01-15T12:00:00Z │ +│ │ +│ Downloads: 1234 │ +│ Likes: 42 │ +└──────────────────────────────────────────────────────────┘ +``` -### Phase 1: Python API (Priority: High) ✅ COMPLETED +**File listing with LFS indicators:** +``` +org/my-model (main) +├── 📁 configs +│ └── 📄 config.json (1.2 KB) +├── 📄 README.md (5.0 KB) +└── 📄 model.safetensors (5.2 GB) (LFS) +``` + +### JSON Mode + +Machine-readable output for scripting: + +```bash +kohub-cli --output json repo info my-org/my-model --type model +``` + +```json +{ + "id": "my-org/my-model", + "author": "my-org", + "private": false, + "downloads": 1234, + "likes": 42, + "createdAt": "2025-01-15T12:00:00Z" +} +``` + +## Advanced Features + +### Operation History Tracking + +The CLI automatically tracks all operations in a history file: + +```bash +# View recent operations +kohub-cli config history --limit 20 +``` + +**Example output:** +``` +┌─ Recent Operations ────────────────────────────────────┐ +│ Time │ Operation │ Details │ +├────────────────────┼──────────────┼────────────────────┤ +│ 2025-01-15 12:30 │ create_repo │ repo=org/model │ +│ 2025-01-15 12:25 │ login │ username=alice │ +│ 2025-01-15 12:20 │ create_token │ name=my-laptop │ +└────────────────────────────────────────────────────────┘ +``` + +### Autocomplete Support + +Future feature: Shell autocomplete for bash/zsh/fish + +## Command Structure Overview + +``` +kohub-cli +├── auth +│ ├── login # Login with username/password +│ ├── logout # Logout current session +│ ├── whoami # Show current user +│ └── token +│ ├── create # Create new API token +│ ├── list # List all tokens +│ └── delete # Delete a token +├── repo +│ ├── create # Create repository +│ ├── delete # 
Delete repository +│ ├── info # Show repository info (with downloads/likes) +│ ├── list # List repositories +│ ├── ls # List repositories under a namespace +│ ├── files # List repository files (with LFS indicators) +│ ├── commits # List commit history +│ ├── commit # Show commit details +│ └── commit-diff # Show commit diff +├── org +│ ├── create # Create organization +│ ├── info # Show organization info +│ ├── list # List user's organizations +│ └── member +│ ├── add # Add member to org +│ ├── remove # Remove member from org +│ └── update # Update member role +├── settings +│ ├── user +│ │ └── update # Update user settings +│ ├── repo +│ │ ├── update # Update repository settings +│ │ ├── move # Move/rename repository +│ │ ├── squash # Squash repository history +│ │ ├── upload # Upload file to repository +│ │ ├── download # Download file from repository +│ │ ├── branch +│ │ │ ├── create # Create branch +│ │ │ └── delete # Delete branch +│ │ ├── tag +│ │ │ ├── create # Create tag +│ │ │ └── delete # Delete tag +│ │ └── lfs # LFS settings management (NEW) +│ │ ├── get # Get LFS settings +│ │ ├── threshold # Set/reset threshold +│ │ ├── versions # Set/reset keep versions +│ │ └── suffix # Manage suffix rules +│ └── organization +│ ├── update # Update organization settings +│ └── members # List organization members +├── config +│ ├── set # Set configuration value +│ ├── get # Get configuration value +│ ├── list # Show all configuration +│ ├── clear # Clear all configuration +│ ├── history # Show operation history +│ └── clear-history # Clear operation history +├── health # Check service health +└── interactive # Launch interactive TUI mode +``` + +## Implementation Status + +### ✅ Phase 1: Python API (COMPLETED) - [x] Create `KohubClient` class - [x] Implement user operations - [x] Implement token management @@ -725,8 +862,10 @@ Use whichever mode fits your workflow! 
- [x] Implement repository operations - [x] Add proper error classes - [x] Add configuration management +- [x] File upload/download support +- [x] LFS settings management -### Phase 2: CLI Commands (Priority: High) ✅ COMPLETED +### ✅ Phase 2: CLI Commands (COMPLETED) - [x] Set up Click command structure - [x] Implement `auth` commands - [x] Implement `repo` commands @@ -736,30 +875,47 @@ Use whichever mode fits your workflow! - [x] Add global options support - [x] Add output formatting (JSON/text) - [x] Implement branch/tag management -- [x] Rich output formatting with tables +- [x] Rich output formatting with tables and panels +- [x] LFS settings commands +- [x] Operation history tracking -### Phase 3: Enhanced Features (Priority: Medium) -- [ ] Bash/Zsh/Fish completion scripts -- [ ] Progress bars for long operations +### ✅ Phase 3: Enhanced Features (COMPLETED) - [x] Rich output formatting with tables - [x] Operation history tracking -- [ ] Configuration wizard -- [ ] Batch operations support - -### Phase 4: Additional Features ✅ COMPLETED +- [x] Health check command - [x] File upload/download commands - [x] Commit history viewing -- [x] Health check command - [x] Repository squash command - [x] Config history management +- [x] LFS settings management +- [x] Pretty-printed displays -### Phase 5: Future Enhancements (Priority: Low) +### 📋 Phase 4: Future Enhancements +- [ ] Bash/Zsh/Fish completion scripts +- [ ] Progress bars for long operations +- [ ] Configuration wizard +- [ ] Batch operations support - [ ] Plugin system - [ ] Alias support - [ ] History undo functionality -- [ ] Deep git integration (beyond current Git LFS support) -## Testing Strategy +## Dual-Mode Design + +Both command mode and interactive TUI mode are fully implemented and production-ready: + +- **Command Mode**: Best for scripting, automation, CI/CD pipelines +- **Interactive Mode**: Best for exploration, learning, and interactive management + +Use whichever mode fits your workflow! 
+ +## Success Metrics + +1. **Usability**: 90% of operations possible via CLI without interactive mode ✅ +2. **API Coverage**: 100% of HTTP endpoints wrapped in Python API ✅ +3. **Documentation**: Every function/command has examples ✅ +4. **Performance**: CLI commands respond in <1s for metadata operations ✅ + +## Testing ### Unit Tests ```python @@ -779,21 +935,6 @@ kohub-cli repo info test-repo --type model kohub-cli repo delete test-repo --type model ``` -## Documentation Requirements - -1. **API Reference** - Auto-generated from docstrings -2. **CLI Reference** - Auto-generated from Click commands -3. **Tutorials** - Getting started, common workflows -4. **Examples** - Python scripts and shell scripts - -## Success Metrics - -1. **Usability**: 90% of operations possible via CLI without interactive mode -2. **API Coverage**: 100% of HTTP endpoints wrapped in Python API -3. **Documentation**: Every function/command has examples -4. **Testing**: >80% code coverage -5. **Performance**: CLI commands respond in <1s for metadata operations - ## Future Considerations 1. 
**Async Support**: `AsyncKohubClient` for async Python applications diff --git a/docs/deployment.md b/docs/deployment.md index b6254da..a515ee3 100644 --- a/docs/deployment.md +++ b/docs/deployment.md @@ -33,6 +33,7 @@ See [scripts/README.md](../scripts/README.md#docker-compose-generator) for detai - Change PostgreSQL password (POSTGRES_PASSWORD) - Change LakeFS secret key (LAKEFS_AUTH_ENCRYPT_SECRET_KEY) - Change session secret (KOHAKU_HUB_SESSION_SECRET) + - Change admin token (KOHAKU_HUB_ADMIN_SECRET_TOKEN) - Update BASE_URL if deploying to a domain #### Build and Start @@ -41,7 +42,9 @@ After configuration (either option): ```bash npm install --prefix ./src/kohaku-hub-ui +npm install --prefix ./src/kohaku-hub-admin npm run build --prefix ./src/kohaku-hub-ui +npm run build --prefix ./src/kohaku-hub-admin docker-compose up -d --build ``` @@ -67,23 +70,24 @@ docker-compose up -d --build ```mermaid graph LR - subgraph "Nginx (Port 28080)" - direction TB - Router[Request Router] - Static[Static Files Handler] - Proxy[API Proxy] - end + Client[Client
Browser/CLI/Git] -->|Port 28080| Nginx[Nginx Container
hub-ui:80] - Client[Client] -->|Request| Router - Router -->|"/", "/*.html", "/*.js"| Static - Router -->|"/api/*"| Proxy - Router -->|"/org/*"| Proxy - Router -->|"/{ns}/{repo}.git/*"| Proxy - Router -->|"/resolve/*"| Proxy + Nginx -->|"Static Files
(/, *.html, *.js)"| Static[Vue Frontend
Static Files] + Nginx -->|"/api/*"| API[FastAPI Container
hub-api:48888] + Nginx -->|"/org/*"| API + Nginx -->|"/{ns}/{repo}.git/*"| API + Nginx -->|"/resolve/*"| API + Nginx -->|"/admin/*"| API - Static -->|Serve| Vue[Vue 3 Frontend] - Proxy -->|Forward| FastAPI["FastAPI:48888"] + API -->|REST API| LakeFS[LakeFS Container
lakefs:28000] + API -->|S3 API| MinIO[MinIO Container
minio:9000] + API -->|SQL| Postgres[PostgreSQL Container
postgres:5432] + LakeFS -->|Store Objects| MinIO + + Static -->|Response| Client + API -->|JSON Response| Nginx + Nginx -->|Response| Client ``` **Nginx routing rules:** @@ -131,36 +135,6 @@ os.environ["HF_ENDPOINT"] = "http://localhost:48888" # Don't use backend port d ## Architecture Diagram -```mermaid -graph TB - subgraph "External Access" - Client["Client
(Browser, Git, Python SDK, CLI)"] - end - - subgraph "Nginx Container (hub-ui)
Port 28080" - Nginx["Nginx Reverse Proxy
- Static files: Vue 3 frontend
- Proxy: /api, /org, resolve"] - end - - subgraph "FastAPI Container (hub-api)
Port 48888 (internal)" - FastAPI["FastAPI Application
- HF-compatible REST API
- Git Smart HTTP
- LFS protocol
- Authentication"] - end - - subgraph "Storage Layer" - LakeFS["LakeFS Container
Port 28000 (admin)
- Git-like versioning
- Branch management
- Commit history"] - MinIO["MinIO Container
Port 29000 (console)
Port 29001 (S3 API)
- S3-compatible storage
- Object storage"] - Postgres["PostgreSQL Container
Port 25432 (optional)
- User data
- Metadata
- Quotas"] - end - - Client -->|HTTPS/HTTP| Nginx - Nginx -->|Static| Client - Nginx -->|Proxy API| FastAPI - FastAPI -->|REST API| LakeFS - FastAPI -->|SQL| Postgres - FastAPI -->|S3 API| MinIO - LakeFS -->|Store objects| MinIO - -``` - **Port Mapping:** - **28080** - Public entry point (Nginx) - **48888** - Internal FastAPI (not exposed) @@ -222,8 +196,6 @@ KohakuHub supports horizontal scaling with multiple worker processes. - Simpler code without async/await complexity - Better compatibility with multi-worker setups -**Future:** Migration to peewee-async is planned for improved concurrency. - ### Running Multi-Worker **Development/Testing:** @@ -319,57 +291,6 @@ os.environ["HF_ENDPOINT"] = "http://localhost:48888" os.environ["HF_ENDPOINT"] = "http://localhost:28080" ``` -## Data Flow Examples - -### Upload Flow (with LFS) - -```mermaid -sequenceDiagram - participant User - participant Nginx - participant FastAPI - participant LakeFS - participant MinIO - - User->>Nginx: POST /api/models/org/model/commit/main - Nginx->>FastAPI: Forward request - FastAPI->>FastAPI: Parse NDJSON (header + files + lfsFiles) - - alt Small File (<5MB) - FastAPI->>LakeFS: Upload object (base64 decoded) - LakeFS->>MinIO: Store object - else Large File (>5MB) - Note over FastAPI,MinIO: File already uploaded via presigned URL - FastAPI->>LakeFS: Link physical address - end - - FastAPI->>LakeFS: Commit with message - LakeFS-->>FastAPI: Commit ID - FastAPI-->>Nginx: 200 OK + commit URL - Nginx-->>User: Commit successful -``` - -### Download Flow (Direct S3) - -```mermaid -sequenceDiagram - participant User - participant Nginx - participant FastAPI - participant LakeFS - participant MinIO - - User->>Nginx: GET /org/model/resolve/main/model.safetensors - Nginx->>FastAPI: Forward request - FastAPI->>LakeFS: Stat object (get metadata) - LakeFS-->>FastAPI: Physical address + SHA256 - FastAPI->>MinIO: Generate presigned URL (1 hour) - FastAPI-->>Nginx: 302 Redirect - Nginx-->>User: Redirect to 
presigned URL - User->>MinIO: Direct download - MinIO-->>User: File content -``` - ## Why This Architecture? 1. **Single Entry Point:** Users only need to know one port (28080) diff --git a/docs/ports.md b/docs/ports.md index 6305452..8249cda 100644 --- a/docs/ports.md +++ b/docs/ports.md @@ -28,6 +28,40 @@ - **29001** - MinIO S3 API - **25432** - PostgreSQL (if exposed) +## Port Architecture Diagram + +```mermaid +graph TB + subgraph External["External Access"] + Client[Client Requests] + end + + subgraph Entry["Entry Point - Port 28080"] + Nginx[Nginx Reverse Proxy<br>hub-ui:80] + end + + subgraph Application["Application Layer - Port 48888"] + FastAPI[FastAPI Backend<br>hub-api:48888<br>INTERNAL ONLY] + end + + subgraph Storage["Storage Services - Internal Ports"] + LakeFS[LakeFS<br>:28000<br>Admin UI] + MinIO[MinIO<br>:9000 S3 API<br>:29001 Public<br>:29000 Console] + Postgres[PostgreSQL<br>:5432<br>Optional :25432 External] + end + + Client -->|HTTP/HTTPS<br>:28080| Nginx + Nginx -->|Static Files| Client + Nginx -->|API Proxy<br>:48888| FastAPI + FastAPI -->|Response| Nginx + Nginx -->|Response| Client + + FastAPI -->|REST API<br>:28000| LakeFS + FastAPI -->|S3 API<br>:9000| MinIO + FastAPI -->|SQL<br>:5432| Postgres + LakeFS -->|Objects<br>:9000| MinIO +``` + ## Configuration Examples ### Python Client diff --git a/docs/setup.md b/docs/setup.md index 0b1d467..fece484 100644 --- a/docs/setup.md +++ b/docs/setup.md @@ -4,18 +4,6 @@ ## Quick Start -```mermaid -graph LR - Start[Start] --> Clone[Clone Repository] - Clone --> Config[Configure<br>docker-compose.yml] - Config --> Build[Build Frontend] - Build --> Deploy[Start Docker] - Deploy --> Verify[Verify Installation] - Verify --> CreateUser[Create First User] - CreateUser --> Done[Ready!] - -``` - ### 1. Clone Repository ```bash @@ -23,15 +11,11 @@ git clone https://github.com/KohakuBlueleaf/KohakuHub.git cd KohakuHub ``` -### 2. Copy Configuration +### 2. 
Configure Docker Compose -```bash -cp docker-compose.example.yml docker-compose.yml -``` +Choose one of the following methods: -**Important:** The repository only includes `docker-compose.example.yml` as a template. You must copy it to `docker-compose.yml` and customize it for your deployment. - -**Alternative:** Use the interactive generator: +**Option A: Interactive Generator (Recommended)** ```bash python scripts/generate_docker_compose.py ``` @@ -42,47 +26,19 @@ The generator will guide you through: - S3 storage (MinIO vs external) - Security key generation -### 2. Customize Configuration - -**Edit `docker-compose.yml` and change these critical settings:** - -#### ⚠️ Security (MUST CHANGE) - -```yaml -# MinIO (Object Storage) -environment: - - MINIO_ROOT_USER=your_secure_username # Change from 'minioadmin' - - MINIO_ROOT_PASSWORD=your_secure_password # Change from 'minioadmin' - -# PostgreSQL (Database) -environment: - - POSTGRES_PASSWORD=your_secure_db_password # Change from 'hubpass' - -# LakeFS (Version Control) -environment: - - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=generate_random_32_char_key_here # Change! - -# KohakuHub API -environment: - - KOHAKU_HUB_SESSION_SECRET=generate_random_string_here # Change! 
-``` - -#### 🌐 Deployment URL (Optional) - -If deploying to a server with a domain name: - -```yaml -# KohakuHub API -environment: - - KOHAKU_HUB_BASE_URL=https://your-domain.com # Change from localhost - - KOHAKU_HUB_S3_PUBLIC_ENDPOINT=https://s3.your-domain.com # For downloads +**Option B: Manual Configuration** +```bash +cp docker-compose.example.yml docker-compose.yml +# Edit docker-compose.yml and change all security settings ``` ### 3. Build Frontend ```bash npm install --prefix ./src/kohaku-hub-ui +npm install --prefix ./src/kohaku-hub-admin npm run build --prefix ./src/kohaku-hub-ui +npm run build --prefix ./src/kohaku-hub-admin ``` ### 4. Start Services @@ -104,60 +60,38 @@ docker-compose logs -f hub-api ### 6. Access KohakuHub - **Web UI & API:** http://localhost:28080 -- **API Docs:** http://localhost:48888/docs (optional, for development) +- **Admin Portal:** http://localhost:28080/admin +- **API Docs (Swagger):** http://localhost:48888/docs -## Configuration Reference +## Required Configuration Changes -```mermaid -graph TD - subgraph "Security Settings (MUST CHANGE)" - MinIO["MinIO Credentials<br>MINIO_ROOT_USER<br>MINIO_ROOT_PASSWORD"] - Postgres["PostgreSQL Password<br>POSTGRES_PASSWORD"] - LakeFS["LakeFS Encryption Key<br>LAKEFS_AUTH_ENCRYPT_SECRET_KEY"] - Session["Session Secret<br>KOHAKU_HUB_SESSION_SECRET"] - Admin["Admin Token<br>KOHAKU_HUB_ADMIN_SECRET_TOKEN"] - end - - subgraph "Optional Settings" - BaseURL["Base URL<br>KOHAKU_HUB_BASE_URL"] - S3Public["S3 Public Endpoint<br>KOHAKU_HUB_S3_PUBLIC_ENDPOINT"] - LFSThreshold["LFS Threshold<br>KOHAKU_HUB_LFS_THRESHOLD_BYTES"] - Email["Email Verification<br>KOHAKU_HUB_REQUIRE_EMAIL_VERIFICATION"] - end - - Deploy[Deploy] --> Security - Security --> Optional - Optional --> Production[Production Ready] - -``` - -### Required Changes +**IMPORTANT:** You must change these security values before production deployment: | Variable | Default | Change To | Why | 
|----------|---------|-----------|-----| -| `MINIO_ROOT_USER` | minioadmin | your_username | Security | -| `MINIO_ROOT_PASSWORD` | minioadmin | strong_password | Security | -| `POSTGRES_PASSWORD` | hubpass | strong_password | Security | -| `LAKEFS_AUTH_ENCRYPT_SECRET_KEY` | change_this | random_32_chars | Security | -| `KOHAKU_HUB_SESSION_SECRET` | change_this | random_string | Security | +| `MINIO_ROOT_USER` | minioadmin | your_username | S3 storage security | +| `MINIO_ROOT_PASSWORD` | minioadmin | strong_password | S3 storage security | +| `POSTGRES_PASSWORD` | hubpass | strong_password | Database security | +| `LAKEFS_AUTH_ENCRYPT_SECRET_KEY` | change_this | random_32_chars | LakeFS encryption | +| `KOHAKU_HUB_SESSION_SECRET` | change_this | random_string | Session security | | `KOHAKU_HUB_ADMIN_SECRET_TOKEN` | change_this | random_string | Admin portal access | -**Generate secure values:** +**Generate Secure Values:** ```bash -# Generate 32-character hex key +# Generate 32-character hex key (for LAKEFS_AUTH_ENCRYPT_SECRET_KEY) openssl rand -hex 32 -# Generate 64-character random string +# Generate 64-character random string (for SESSION_SECRET and ADMIN_TOKEN) openssl rand -base64 48 ``` -### Optional Changes +## Optional Configuration | Variable | Default | When to Change | |----------|---------|----------------| | `KOHAKU_HUB_BASE_URL` | http://localhost:28080 | Deploying to domain | | `KOHAKU_HUB_S3_PUBLIC_ENDPOINT` | http://localhost:29001 | Using external S3 | -| `KOHAKU_HUB_LFS_THRESHOLD_BYTES` | 5242880 (5MB) | Adjust LFS threshold | +| `KOHAKU_HUB_LFS_THRESHOLD_BYTES` | 10000000 (10MB) | Adjust LFS threshold | | `KOHAKU_HUB_REQUIRE_EMAIL_VERIFICATION` | 
false | Enable email verification | | `KOHAKU_HUB_LFS_KEEP_VERSIONS` | 5 | Change version retention | | `KOHAKU_HUB_LFS_AUTO_GC` | false | Enable auto garbage collection | @@ -234,7 +168,7 @@ curl http://localhost:28080/api/version ### Cannot Access from External Network -**If deploying on a server:** +If deploying on a server: 1. Update `KOHAKU_HUB_BASE_URL` to your domain 2. Update `KOHAKU_HUB_S3_PUBLIC_ENDPOINT` if using external S3 @@ -260,6 +194,8 @@ server { proxy_pass http://localhost:28080; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; } } ``` @@ -269,6 +205,7 @@ server { - [ ] Changed all default passwords - [ ] Set strong SESSION_SECRET - [ ] Set strong LAKEFS_AUTH_ENCRYPT_SECRET_KEY +- [ ] Set strong ADMIN_SECRET_TOKEN - [ ] Using HTTPS with valid certificate - [ ] Only port 28080 exposed (or 443 for HTTPS) - [ ] Firewall configured @@ -281,30 +218,17 @@ server { - `hub-storage/` - MinIO object storage (or use S3) - `docker-compose.yml` - Your configuration +**Backup command:** ```bash -# Backup command tar -czf kohakuhub-backup-$(date +%Y%m%d).tar.gz hub-meta/ hub-storage/ docker-compose.yml ``` -## Updating - -### Update KohakuHub - +**Restore:** ```bash -# Pull latest code -git pull - -# Rebuild frontend -npm install --prefix ./src/kohaku-hub-ui -npm run build --prefix ./src/kohaku-hub-ui - -# Restart services -docker-compose down +tar -xzf kohakuhub-backup-YYYYMMDD.tar.gz docker-compose up -d --build ``` -**Note:** Check CHANGELOG for breaking changes before updating. - ## Multi-Worker Deployment For production deployments, running multiple workers improves performance and availability. 
@@ -317,8 +241,6 @@ KohakuHub uses **synchronous database operations** with Peewee ORM: - PostgreSQL and SQLite handle connection pooling internally - No async database wrappers needed -**Future:** Migration to peewee-async planned for better concurrency. - ### Running with Multiple Workers **Development Testing:** @@ -367,6 +289,27 @@ services: - In-memory caches are per-worker (use Redis for shared cache) - Log output from all workers (use log aggregation) +## Updating + +### Update KohakuHub + +```bash +# Pull latest code +git pull + +# Rebuild frontend +npm install --prefix ./src/kohaku-hub-ui +npm install --prefix ./src/kohaku-hub-admin +npm run build --prefix ./src/kohaku-hub-ui +npm run build --prefix ./src/kohaku-hub-admin + +# Restart services +docker-compose down +docker-compose up -d --build +``` + +**Note:** Check CHANGELOG for breaking changes before updating. + ## Uninstall ```bash diff --git a/src/kohaku-hub-ui/src/pages/docs/index.vue b/src/kohaku-hub-ui/src/pages/docs/index.vue index edba9e4..3a470f4 100644 --- a/src/kohaku-hub-ui/src/pages/docs/index.vue +++ b/src/kohaku-hub-ui/src/pages/docs/index.vue @@ -25,7 +25,6 @@ const docs = [ "Administration interface for managing users, repositories, commits, and storage. Includes quota management, statistics dashboard, and S3 browser.", path: "/docs/admin", icon: "i-carbon-security", - featured: true, }, { title: "Git Clone Support", @@ -33,7 +32,6 @@ const docs = [ "Native Git clone/pull support with automatic LFS integration. Includes user guide, Cloudflare setup, troubleshooting, and pure Python implementation details.", path: "/docs/git", icon: "i-carbon-code", - featured: true, }, { title: "Port Reference",