minor fixes

Kohaku-Blueleaf
2025-10-11 13:03:58 +08:00
parent 2732c3e0fe
commit 1a7100a586
7 changed files with 505 additions and 179 deletions

View File

@@ -1,63 +1,62 @@
# Kohaku Hub API Documentation
*Last Updated: October 2025*
*Last Updated: January 2025*
This document explains how Kohaku Hub's API works, the data flow, and key endpoints.
## System Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Client Request │
│ (huggingface_hub Python) │
└────────────────────────────────┬────────────────────────────────┘
|
v
┌─────────────────────────────────────────────────────────────────┐
│ FastAPI Layer │
│ (kohakuhub/api/*) │
│ │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ basic │ │ file │ │ lfs │ │ utils │ │
│ │ .py │ │ .py │ │ .py │ │ .py │ │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────────────┬────────────────────────────────┘
|
┌────────────┼────────────┐
| | |
v v v
┌─────────────┐ ┌──────────┐ ┌─────────────┐
│ LakeFS │ │ SQLite/ │ │ MinIO │
│ │ │ Postgres │ │ (S3) │
│ Versioning │ │ Metadata │ │ Storage │
│ Branches │ │ Dedup │ │ Objects │
└─────────────┘ └──────────┘ └─────────────┘
```mermaid
graph TB
subgraph "Client Layer"
Client["Client<br/>(huggingface_hub, git, browser)"]
end
subgraph "Entry Point"
Nginx["Nginx (Port 28080)<br/>- Serves static files<br/>- Reverse proxy"]
end
subgraph "Application Layer"
FastAPI["FastAPI (Port 48888)<br/>- Auth & Permissions<br/>- HF-compatible API<br/>- Git Smart HTTP"]
end
subgraph "Storage Backend"
LakeFS["LakeFS<br/>- Git-like versioning<br/>- Branch management<br/>- Commit history"]
DB["PostgreSQL/SQLite<br/>- User data<br/>- Metadata<br/>- Deduplication"]
S3["MinIO/S3<br/>- Object storage<br/>- LFS files<br/>- Presigned URLs"]
end
Client -->|HTTP/Git/LFS| Nginx
Nginx -->|Static files| Client
Nginx -->|/api, /org, resolve| FastAPI
FastAPI -->|REST API| LakeFS
FastAPI -->|Queries| DB
FastAPI -->|Async wrappers| S3
LakeFS -->|Stores objects| S3
```
## Core Concepts
### File Size Thresholds
```
File Size Decision Tree:
```mermaid
graph TD
Start[File Upload] --> Check{File size > 5MB?}
Check -->|No| Regular[Regular Mode]
Check -->|Yes| LFS[LFS Mode]
Regular --> Base64[Base64 in commit payload]
LFS --> Presigned[S3 presigned URL]
Base64 --> FastAPI[FastAPI processes]
Presigned --> Direct[Direct S3 upload]
FastAPI --> LakeFS1[LakeFS stores object]
Direct --> Link[FastAPI links S3 object]
Link --> LakeFS2[LakeFS commit with physical address]
Is file > 10MB?
|
┌───────┴───────┐
| |
NO YES
| |
v v
┌─────────┐ ┌─────────┐
│ Regular │ │ LFS │
│ Mode │ │ Mode │
└─────────┘ └─────────┘
| |
v v
Base64 in S3 Direct
Commit Upload
```
**Note:** The LFS threshold is configurable via `KOHAKU_HUB_LFS_THRESHOLD_BYTES` (default: 5MB = 5,242,880 bytes).
### Storage Layout
```
@@ -131,13 +130,37 @@ See [Git.md](./Git.md) for complete Git clone documentation and implementation d
### Overview
```
┌────────┐ ┌──────────┐ ┌─────────┐ ┌────────┐
│ Client │---->│ Preupload│---->│ Upload │---->│ Commit │
└────────┘ └──────────┘ └─────────┘ └────────┘
User Check if Upload Atomic
Request file exists file(s) commit
(dedup) (S3/inline) (LakeFS)
```mermaid
sequenceDiagram
participant Client
participant API as FastAPI
participant LakeFS
participant S3
Note over Client,S3: Phase 1: Preupload Check
Client->>API: POST /preupload (file hashes & sizes)
API->>API: Check DB for existing SHA256
API-->>Client: Upload mode (regular/lfs) & dedup info
alt Small Files (<=5MB)
Note over Client,S3: Phase 2a: Regular Upload
Client->>API: POST /commit (base64 content)
API->>LakeFS: Upload object
LakeFS->>S3: Store object
else Large Files (>5MB)
Note over Client,S3: Phase 2b: LFS Upload
Client->>API: POST /info/lfs/objects/batch
API->>S3: Generate presigned URL
API-->>Client: Presigned URL
Client->>S3: PUT file (direct upload)
Client->>API: POST /commit (lfsFile entry)
API->>LakeFS: Link physical address
end
Note over Client,S3: Phase 3: Commit
API->>LakeFS: Commit with message
LakeFS-->>API: Commit ID
API-->>Client: Commit URL & OID
```
### Step 1: Preupload Check
@@ -186,16 +209,16 @@ See [Git.md](./Git.md) for complete Git clone documentation and implementation d
```
For each file:
1. Check size:
- ≤ 10MB → "regular"
- > 10MB → "lfs"
- ≤ 5MB → "regular"
- > 5MB → "lfs"
2. Check if exists (deduplication):
- Query DB for matching SHA256 + size
- If match found → shouldIgnore: true
- If no match → shouldIgnore: false
```
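Taken together, a minimal Python sketch of this preupload decision, assuming a hypothetical `find_file(sha256, size)` helper for the database lookup; the threshold mirrors `KOHAKU_HUB_LFS_THRESHOLD_BYTES`:

```python
import os

# Mirrors KOHAKU_HUB_LFS_THRESHOLD_BYTES (default 5 MB = 5,242,880 bytes).
LFS_THRESHOLD = int(os.environ.get("KOHAKU_HUB_LFS_THRESHOLD_BYTES", 5 * 1024 * 1024))


def preupload_decision(sha256: str, size: int, find_file) -> dict:
    """Decide upload mode and deduplication for one file.

    find_file(sha256, size) is a hypothetical DB lookup that returns a
    matching File row or None.
    """
    mode = "regular" if size <= LFS_THRESHOLD else "lfs"
    should_ignore = find_file(sha256, size) is not None
    return {"uploadMode": mode, "shouldIgnore": should_ignore}
```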
### Step 2a: Regular Upload (≤10MB)
### Step 2a: Regular Upload (≤5MB)
Files are sent inline in the commit payload as base64.
@@ -207,7 +230,7 @@ Files are sent inline in the commit payload as base64.
**No separate upload step needed** - proceed directly to Step 3.
### Step 2b: LFS Upload (>10MB)
### Step 2b: LFS Upload (>5MB)
#### Phase 1: Request Upload URLs
@@ -293,8 +316,8 @@ Files are sent inline in the commit payload as base64.
| Key | Description | Usage |
|-----|-------------|-------|
| `header` | Commit metadata | Required, must be first line |
| `file` | Small file (inline base64) | For files ≤ 10MB |
| `lfsFile` | Large file (LFS reference) | For files > 10MB, already uploaded to S3 |
| `file` | Small file (inline base64) | For files ≤ 5MB |
| `lfsFile` | Large file (LFS reference) | For files > 5MB, already uploaded to S3 |
| `deletedFile` | Delete a single file | Remove file from repo |
| `deletedFolder` | Delete folder recursively | Remove all files in folder |
| `copyFile` | Copy file within repo | Duplicate file (deduplication-aware) |
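To make the payload format concrete, here is a sketch of how a client could assemble the NDJSON commit body (one JSON object per line); the field names follow the HF-style commit API used by `huggingface_hub`, but treat the exact schema as illustrative:

```python
import base64
import json


def build_commit_payload(summary: str, small_files: dict, lfs_files: list) -> str:
    """Assemble an NDJSON commit payload (one JSON object per line).

    small_files: {path: raw bytes} sent inline as base64 ("file" entries).
    lfs_files:   [{"path", "oid", "size"}] already uploaded to S3 ("lfsFile" entries).
    """
    lines = [{"key": "header", "value": {"summary": summary, "description": ""}}]
    for path, data in small_files.items():
        lines.append({
            "key": "file",
            "value": {"path": path, "encoding": "base64",
                      "content": base64.b64encode(data).decode()},
        })
    for meta in lfs_files:
        lines.append({
            "key": "lfsFile",
            "value": {"path": meta["path"], "algo": "sha256",
                      "oid": meta["oid"], "size": meta["size"]},
        })
    return "\n".join(json.dumps(line) for line in lines)
```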
@@ -343,12 +366,28 @@ Files are sent inline in the commit payload as base64.
## Download Workflow
```
┌────────┐ ┌──────────┐ ┌─────────┐
│ Client │────>│ HEAD │────>│ GET │
└────────┘ └──────────┘ └─────────┘
Request Get metadata Download
(size, hash) (redirect)
```mermaid
sequenceDiagram
participant Client
participant API as FastAPI
participant LakeFS
participant S3
Note over Client,S3: Optional: HEAD request for metadata
Client->>API: HEAD /resolve/{revision}/{filename}
API->>LakeFS: Stat object
LakeFS-->>API: Object metadata (SHA256, size)
API-->>Client: Headers (ETag, Content-Length, X-Repo-Commit)
Note over Client,S3: Download: GET request
Client->>API: GET /resolve/{revision}/{filename}
API->>LakeFS: Get object metadata
API->>S3: Generate presigned URL
API-->>Client: 302 Redirect (presigned URL)
Client->>S3: Direct download
S3-->>Client: File content
Note over Client: No proxy - direct S3 download
```
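Seen from the client, the sequence above is just two plain HTTP calls. A sketch with httpx, assuming a model repo whose files resolve under `/{namespace}/{name}/resolve/{revision}/{filename}` (names illustrative):

```python
import httpx

BASE = "http://localhost:28080"  # Nginx entry point
url = f"{BASE}/myorg/mymodel/resolve/main/config.json"  # illustrative repo and file

with httpx.Client() as client:
    # Optional metadata check: size, ETag and commit id come back as headers.
    head = client.head(url)
    print(head.headers.get("Content-Length"), head.headers.get("ETag"))

    # Download: the API answers 302 with a presigned S3 URL;
    # follow_redirects makes httpx fetch the content directly from S3.
    resp = client.get(url, follow_redirects=True)
    resp.raise_for_status()
    data = resp.content
```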
### Step 1: Get Metadata (HEAD)
@@ -563,72 +602,159 @@ Returns all repositories for a user/organization, grouped by type.
## Database Schema
### Repository Table
```
┌──────────────┬──────────────┬─────────────┐
│ Column │ Type │ Index? │
├──────────────┼──────────────┼─────────────┤
│ id │ INTEGER PK │ Primary │
│ repo_type │ VARCHAR │ Yes │
│ namespace │ VARCHAR │ Yes │
│ name │ VARCHAR │ Yes │
│ full_id │ VARCHAR │ Unique │
│ private │ BOOLEAN │ No │
│ created_at │ TIMESTAMP │ No │
└──────────────┴──────────────┴─────────────┘
```mermaid
erDiagram
USER ||--o{ REPOSITORY : owns
USER ||--o{ SESSION : has
USER ||--o{ TOKEN : has
USER ||--o{ SSHKEY : has
USER }o--o{ ORGANIZATION : member_of
ORGANIZATION ||--o{ REPOSITORY : owns
REPOSITORY ||--o{ FILE : contains
REPOSITORY ||--o{ COMMIT : has
REPOSITORY ||--o{ STAGINGUPLOAD : has
COMMIT ||--o{ LFSOBJECTHISTORY : references
Example:
repo_type: "model"
namespace: "myorg"
name: "mymodel"
full_id: "myorg/mymodel"
USER {
int id PK
string username UK
string email UK
string password_hash
boolean email_verified
boolean is_active
bigint private_quota_bytes
bigint public_quota_bytes
bigint private_used_bytes
bigint public_used_bytes
datetime created_at
}
REPOSITORY {
int id PK
string repo_type
string namespace
string name
string full_id
boolean private
int owner_id FK
datetime created_at
}
FILE {
int id PK
string repo_full_id
string path_in_repo
int size
string sha256
boolean lfs
datetime created_at
datetime updated_at
}
COMMIT {
int id PK
string commit_id
string repo_full_id
string repo_type
string branch
int user_id FK
string username
text message
text description
datetime created_at
}
ORGANIZATION {
int id PK
string name UK
text description
bigint private_quota_bytes
bigint public_quota_bytes
bigint private_used_bytes
bigint public_used_bytes
datetime created_at
}
TOKEN {
int id PK
int user_id FK
string token_hash UK
string name
datetime last_used
datetime created_at
}
SESSION {
int id PK
string session_id UK
int user_id FK
string secret
datetime expires_at
datetime created_at
}
SSHKEY {
int id PK
int user_id FK
string key_type
text public_key
string fingerprint UK
string title
datetime last_used
datetime created_at
}
STAGINGUPLOAD {
int id PK
string repo_full_id
string repo_type
string revision
string path_in_repo
string sha256
int size
string upload_id
string storage_key
boolean lfs
datetime created_at
}
LFSOBJECTHISTORY {
int id PK
string repo_full_id
string path_in_repo
string sha256
int size
string commit_id
datetime created_at
}
```
### File Table (Deduplication)
```
┌──────────────┬──────────────┬─────────────┐
│ Column │ Type │ Index? │
├──────────────┼──────────────┼─────────────┤
│ id │ INTEGER PK │ Primary │
│ repo_full_id │ VARCHAR │ Yes │
│ path_in_repo │ VARCHAR │ Yes │
│ size │ INTEGER │ No │
│ sha256 │ VARCHAR │ Yes │
│ lfs │ BOOLEAN │ No │
│ created_at │ TIMESTAMP │ No │
│ updated_at │ TIMESTAMP │ No │
└──────────────┴──────────────┴─────────────┘
### Key Tables
Unique constraint: (repo_full_id, path_in_repo)
**Repository Table** - Stores repository metadata:
- Unique constraint on `(repo_type, namespace, name)`
- Allows same `full_id` across different `repo_type`
- Example: `model:myorg/mymodel`, `dataset:myorg/mymodel`
Purpose:
- Track file SHA256 hashes for deduplication
- Check if file changed before upload
- Maintain file metadata
```
**File Table** - Deduplication and metadata:
- Unique constraint on `(repo_full_id, path_in_repo)`
- `sha256` indexed for fast deduplication lookups
- `lfs` flag indicates if file uses LFS storage
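A minimal sketch of the deduplication lookup this index supports, using SQLite directly; column names follow the schema above, but the actual table name and query live in the KohakuHub codebase and may differ:

```python
import sqlite3


def is_duplicate(db_path: str, repo_full_id: str, sha256: str, size: int) -> bool:
    """Return True if an identical blob is already recorded for this repo."""
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT 1 FROM file "
            "WHERE repo_full_id = ? AND sha256 = ? AND size = ? LIMIT 1",
            (repo_full_id, sha256, size),
        ).fetchone()
    return row is not None
```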
### StagingUpload Table (Optional)
```
┌──────────────┬──────────────┬─────────────┐
│ Column │ Type │ Index? │
├──────────────┼──────────────┼─────────────┤
│ id │ INTEGER PK │ Primary │
│ repo_full_id │ VARCHAR │ Yes │
│ revision │ VARCHAR │ Yes │
│ path_in_repo │ VARCHAR │ No │
│ sha256 │ VARCHAR │ No │
│ size │ INTEGER │ No │
│ upload_id │ VARCHAR │ No │
│ storage_key │ VARCHAR │ No │
│ lfs │ BOOLEAN │ No │
│ created_at │ TIMESTAMP │ No │
└──────────────┴──────────────┴─────────────┘
**Commit Table** - User commit tracking:
- `commit_id` is LakeFS commit SHA
- Indexed by `(repo_full_id, branch)` for fast queries
- Denormalized `username` for performance
Purpose:
- Track ongoing multipart uploads
- Enable upload resume
- Clean up failed uploads
```
**LFSObjectHistory Table** - LFS garbage collection:
- Tracks which commits reference which LFS objects
- Enables preserving K versions of each file (default: 5)
- Used for auto-cleanup of old LFS objects
**StagingUpload Table** - Multipart upload tracking:
- Tracks ongoing multipart uploads
- Enables upload resume
- Cleans up failed uploads
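For the LFSObjectHistory retention described above, a sketch of the keep-newest-K rule; this only illustrates the selection logic, not the actual garbage-collection implementation:

```python
def stale_lfs_objects(history: list[dict], keep_versions: int = 5) -> list[str]:
    """Given LFSObjectHistory rows for one (repo, path), return the sha256
    digests that fall outside the keep window (newest `keep_versions` kept).
    """
    rows = sorted(history, key=lambda r: r["created_at"], reverse=True)
    keep = {r["sha256"] for r in rows[:keep_versions]}
    return [r["sha256"] for r in rows if r["sha256"] not in keep]
```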
## LakeFS Integration
@@ -647,15 +773,25 @@ Examples:
### Key Operations
| Operation | LakeFS API | Purpose |
|-----------|------------|---------|
| Create Repo | `repositories.create_repository()` | Initialize new repository |
| Upload Small File | `objects.upload_object()` | Direct content upload |
| Link LFS File | `staging.link_physical_address()` | Link S3 object to LakeFS |
| Commit | `commits.commit()` | Create atomic commit |
| List Files | `objects.list_objects()` | Browse repository |
| Get File Info | `objects.stat_object()` | Get file metadata |
| Delete File | `objects.delete_object()` | Remove file |
**All LakeFS operations use the pure async LakeFS REST API via httpx (no thread pools):**
| Operation | LakeFS REST Endpoint | KohakuHub Method | Purpose |
|-----------|---------------------|------------------|---------|
| Create Repo | `POST /repositories` | `create_repository()` | Initialize new repository |
| Upload Small File | `POST /repositories/{repo}/branches/{branch}/objects` | `upload_object()` | Direct content upload |
| Link LFS File | `PUT /repositories/{repo}/branches/{branch}/staging/backing` | `link_physical_address()` | Link S3 object to LakeFS |
| Commit | `POST /repositories/{repo}/branches/{branch}/commits` | `commit()` | Create atomic commit |
| List Files | `GET /repositories/{repo}/refs/{ref}/objects/ls` | `list_objects()` | Browse repository |
| Get File Info | `GET /repositories/{repo}/refs/{ref}/objects/stat` | `stat_object()` | Get file metadata |
| Get File Content | `GET /repositories/{repo}/refs/{ref}/objects` | `get_object()` | Download file |
| Delete File | `DELETE /repositories/{repo}/branches/{branch}/objects` | `delete_object()` | Remove file |
| Create Branch | `POST /repositories/{repo}/branches` | `create_branch()` | Create new branch |
| Delete Branch | `DELETE /repositories/{repo}/branches/{branch}` | `delete_branch()` | Delete branch |
| Create Tag | `POST /repositories/{repo}/tags` | `create_tag()` | Create tag |
| Delete Tag | `DELETE /repositories/{repo}/tags/{tag}` | `delete_tag()` | Delete tag |
| Revert | `POST /repositories/{repo}/branches/{branch}/revert` | `revert_branch()` | Revert commit |
| Merge | `POST /repositories/{repo}/refs/{source}/merge/{dest}` | `merge_into_branch()` | Merge branches |
| Hard Reset | `PUT /repositories/{repo}/branches/{branch}/hard_reset` | `hard_reset_branch()` | Reset branch to commit |
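As an illustration of the async httpx pattern, a stat call could look like the sketch below; LakeFS serves its REST API under `/api/v1`, and the host, credentials, and repository name here are placeholders:

```python
import asyncio

import httpx


async def stat_object(repo: str, ref: str, path: str) -> dict:
    """Fetch object metadata from LakeFS over its REST API (async, no thread pool)."""
    async with httpx.AsyncClient(
        base_url="http://lakefs:8000/api/v1",         # placeholder internal endpoint
        auth=("ACCESS_KEY_ID", "SECRET_ACCESS_KEY"),   # placeholder credentials
    ) as client:
        resp = await client.get(
            f"/repositories/{repo}/refs/{ref}/objects/stat",
            params={"path": path},
        )
        resp.raise_for_status()
        return resp.json()  # physical_address, checksum, size_bytes, ...


# Example: asyncio.run(stat_object("<lakefs-repo>", "main", "config.json"))
```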
### Physical Address Linking
@@ -1210,9 +1346,15 @@ All Downloads:
### Recommended S3 Providers
| Provider | Best For | Pricing Model |
|----------|----------|---------------|
| Cloudflare R2 | High download | Free egress, $0.015/GB storage |
| Wasabi | Archive/backup | $6/TB/month, free egress if download < storage |
| MinIO | Self-hosted | Free (your hardware/bandwidth) |
| AWS S3 | Enterprise | Pay per GB + egress |
| Provider | Best For | Pricing Model | Notes |
|----------|----------|---------------|-------|
| Cloudflare R2 | High download | Free egress, $0.015/GB storage | Best for public datasets |
| Wasabi | Archive/backup | $6/TB/month, free egress* | *if download < storage |
| MinIO | Self-hosted | Free (your hardware/bandwidth) | Full control, privacy |
| AWS S3 | Enterprise | Pay per GB + egress | Most features, expensive egress |
| Backblaze B2 | Budget | $6/TB storage, $0.01/GB egress | Good for mixed workloads |
**Recommendation for KohakuHub:**
- **Development**: MinIO (included in docker-compose)
- **Public Hub**: Cloudflare R2 (free egress saves costs)
- **Private/Enterprise**: Self-hosted MinIO or AWS S3 with VPC endpoints

View File

@@ -2,11 +2,32 @@
*Complete guide to KohakuHub's administration interface*
**Last Updated:** October 2025
**Last Updated:** January 2025
**Access:** http://your-hub.com/admin
---
## Admin Portal Architecture
```mermaid
graph LR
subgraph "Admin Access"
Browser[Browser] -->|X-Admin-Token| Portal[Admin Portal UI]
end
subgraph "Admin API"
Portal -->|REST API| AdminAPI[Admin Endpoints]
end
subgraph "Data Sources"
AdminAPI -->|Queries| DB[PostgreSQL/SQLite]
AdminAPI -->|List Objects| S3[MinIO/S3]
AdminAPI -->|Repository Info| LakeFS[LakeFS]
end
```
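A hedged sketch of calling the admin API from a script with the `X-Admin-Token` header shown above; the endpoint path here is hypothetical, so substitute a real endpoint from this guide:

```python
import httpx

ADMIN_TOKEN = "change_this"  # value of KOHAKU_HUB_ADMIN_SECRET_TOKEN

with httpx.Client(base_url="http://your-hub.com") as client:
    # Hypothetical endpoint path; replace with an actual admin endpoint.
    resp = client.get("/admin/api/users", headers={"X-Admin-Token": ADMIN_TOKEN})
    resp.raise_for_status()
    print(resp.json())
```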
---
## Table of Contents
1. [Overview](#overview)
@@ -860,6 +881,6 @@ A: Yes, all admin operations are logged with `[ADMIN]` prefix.
---
**Last Updated:** October 2025
**Last Updated:** January 2025
**Version:** 1.0
**Status:** ✅ Production Ready

View File

@@ -1,6 +1,6 @@
# KohakuHub CLI Design Document
*Last Updated: October 2025*
*Last Updated: January 2025*
## Quick Reference

View File

@@ -2,7 +2,7 @@
*Complete guide covering Git clone operations, LFS integration, and server implementation*
**Last Updated:** October 2025
**Last Updated:** January 2025
**Status:** ✅ Clone/Pull Production Ready | ⚠️ Push In Development
---
@@ -1612,5 +1612,5 @@ This demonstrates how to build a complete Git server using only Python stdlib +
---
**Last Updated:** January 2025
**Version:** 1.0
**Version:** 1.1
**Authors:** KohakuHub Team

View File

@@ -65,13 +65,35 @@ docker-compose up -d --build
**Configuration:** `docker/nginx/default.conf`
Nginx on port 28080:
```mermaid
graph LR
subgraph "Nginx (Port 28080)"
direction TB
Router[Request Router]
Static[Static Files Handler]
Proxy[API Proxy]
end
Client[Client] -->|Request| Router
Router -->|"/", "/*.html", "/*.js"| Static
Router -->|"/api/*"| Proxy
Router -->|"/org/*"| Proxy
Router -->|"/{ns}/{repo}.git/*"| Proxy
Router -->|"/resolve/*"| Proxy
Static -->|Serve| Vue[Vue 3 Frontend]
Proxy -->|Forward| FastAPI["FastAPI:48888"]
```
**Nginx routing rules:**
1. Serves frontend static files from `/usr/share/nginx/html`
2. Proxies API requests to backend:48888:
- `/api/*` → `http://hub-api:48888/api/*`
- `/org/*` → `http://hub-api:48888/org/*`
- `/{namespace}/{name}.git/*` → `http://hub-api:48888/{namespace}/{name}.git/*` (Git Smart HTTP)
- `/{type}s/{namespace}/{name}/resolve/*` → `http://hub-api:48888/{type}s/{namespace}/{name}/resolve/*`
2. Proxies API requests to `hub-api:48888`:
- `/api/*` → API endpoints
- `/org/*` → Organization endpoints
- `/{namespace}/{name}.git/*` → Git Smart HTTP protocol
- `/{type}s/{namespace}/{name}/resolve/*` → File download endpoints
- `/admin/*` → Admin portal (if enabled)
### Client Configuration
@@ -109,36 +131,43 @@ os.environ["HF_ENDPOINT"] = "http://localhost:48888" # Don't use backend port d
## Architecture Diagram
```mermaid
graph TB
subgraph "External Access"
Client["Client<br/>(Browser, Git, Python SDK, CLI)"]
end
subgraph "Nginx Container (hub-ui)<br/>Port 28080"
Nginx["Nginx Reverse Proxy<br/>- Static files: Vue 3 frontend<br/>- Proxy: /api, /org, resolve"]
end
subgraph "FastAPI Container (hub-api)<br/>Port 48888 (internal)"
FastAPI["FastAPI Application<br/>- HF-compatible REST API<br/>- Git Smart HTTP<br/>- LFS protocol<br/>- Authentication"]
end
subgraph "Storage Layer"
LakeFS["LakeFS Container<br/>Port 28000 (admin)<br/>- Git-like versioning<br/>- Branch management<br/>- Commit history"]
MinIO["MinIO Container<br/>Port 29000 (console)<br/>Port 29001 (S3 API)<br/>- S3-compatible storage<br/>- Object storage"]
Postgres["PostgreSQL Container<br/>Port 25432 (optional)<br/>- User data<br/>- Metadata<br/>- Quotas"]
end
Client -->|HTTPS/HTTP| Nginx
Nginx -->|Static| Client
Nginx -->|Proxy API| FastAPI
FastAPI -->|REST API| LakeFS
FastAPI -->|SQL| Postgres
FastAPI -->|S3 API| MinIO
LakeFS -->|Store objects| MinIO
```
┌─────────────────────────────────────────────────────────┐
│ Client Access │
│ (HuggingFace Hub, kohub-cli, Web) │
└────────────────────┬────────────────────────────────────┘
│ Port 28080
┌───────────────────────┐
│ Nginx (hub-ui) │
│ - Serves frontend │
│ - Reverse proxy API │
└───────┬───────────────┘
┌───────┴───────────────┐
│ │
Static Files /api, /org, resolve
(Vue 3 app) │
│ Internal: hub-api:48888
┌────────────────────────┐
│ FastAPI (hub-api) │
│ - HF-compatible API │
└──┬─────────────┬───────┘
│ │
┌────────┴────┐ ┌────┴────────┐
│ LakeFS │ │ MinIO │
│ (version) │ │ (storage) │
└─────────────┘ └─────────────┘
```
**Port Mapping:**
- **28080** - Public entry point (Nginx)
- **48888** - Internal FastAPI (not exposed)
- **28000** - LakeFS admin UI (optional, for admins)
- **29000** - MinIO console (optional, for admins)
- **29001** - MinIO S3 API (internal + public for downloads)
- **25432** - PostgreSQL (optional, for external access)
## Development vs Production
@@ -223,6 +252,57 @@ os.environ["HF_ENDPOINT"] = "http://localhost:48888"
os.environ["HF_ENDPOINT"] = "http://localhost:28080"
```
## Data Flow Examples
### Upload Flow (with LFS)
```mermaid
sequenceDiagram
participant User
participant Nginx
participant FastAPI
participant LakeFS
participant MinIO
User->>Nginx: POST /api/models/org/model/commit/main
Nginx->>FastAPI: Forward request
FastAPI->>FastAPI: Parse NDJSON (header + files + lfsFiles)
alt Small File (<=5MB)
FastAPI->>LakeFS: Upload object (base64 decoded)
LakeFS->>MinIO: Store object
else Large File (>5MB)
Note over FastAPI,MinIO: File already uploaded via presigned URL
FastAPI->>LakeFS: Link physical address
end
FastAPI->>LakeFS: Commit with message
LakeFS-->>FastAPI: Commit ID
FastAPI-->>Nginx: 200 OK + commit URL
Nginx-->>User: Commit successful
```
### Download Flow (Direct S3)
```mermaid
sequenceDiagram
participant User
participant Nginx
participant FastAPI
participant LakeFS
participant MinIO
User->>Nginx: GET /org/model/resolve/main/model.safetensors
Nginx->>FastAPI: Forward request
FastAPI->>LakeFS: Stat object (get metadata)
LakeFS-->>FastAPI: Physical address + SHA256
FastAPI->>MinIO: Generate presigned URL (1 hour)
FastAPI-->>Nginx: 302 Redirect
Nginx-->>User: Redirect to presigned URL
User->>MinIO: Direct download
MinIO-->>User: File content
```
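From the client's side, the entire flow above is what a standard `huggingface_hub` download triggers once `HF_ENDPOINT` points at the hub; repo and file names below are illustrative:

```python
import os

# Must be set before importing huggingface_hub; use the Nginx port, not the backend.
os.environ["HF_ENDPOINT"] = "http://localhost:28080"

from huggingface_hub import hf_hub_download

# HEAD for metadata, 302 to a presigned URL, then a direct S3 download.
path = hf_hub_download(repo_id="org/model", filename="model.safetensors")
print(path)  # local cache path of the downloaded file
```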
## Why This Architecture?
1. **Single Entry Point:** Users only need to know one port (28080)
@@ -231,6 +311,8 @@ os.environ["HF_ENDPOINT"] = "http://localhost:28080"
4. **Static File Serving:** Nginx serves frontend efficiently
5. **Load Balancing:** Can add multiple backend instances behind nginx
6. **Caching:** Nginx can cache static assets
7. **Direct Downloads:** Files downloaded directly from S3, not proxied
8. **Scalability:** Each component can scale independently
## Troubleshooting

View File

@@ -1,7 +1,21 @@
# KohakuHub Setup Guide
*Last Updated: January 2025*
## Quick Start
```mermaid
graph LR
Start[Start] --> Clone[Clone Repository]
Clone --> Config[Configure<br/>docker-compose.yml]
Config --> Build[Build Frontend]
Build --> Deploy[Start Docker]
Deploy --> Verify[Verify Installation]
Verify --> CreateUser[Create First User]
CreateUser --> Done[Ready!]
```
### 1. Clone Repository
```bash
@@ -17,6 +31,17 @@ cp docker-compose.example.yml docker-compose.yml
**Important:** The repository only includes `docker-compose.example.yml` as a template. You must copy it to `docker-compose.yml` and customize it for your deployment.
**Alternative:** Use the interactive generator:
```bash
python scripts/generate_docker_compose.py
```
The generator will guide you through:
- PostgreSQL setup (built-in vs external)
- LakeFS database backend
- S3 storage (MinIO vs external)
- Security key generation
### 2. Customize Configuration
**Edit `docker-compose.yml` and change these critical settings:**
@@ -83,6 +108,29 @@ docker-compose logs -f hub-api
## Configuration Reference
```mermaid
graph TD
subgraph "Security Settings (MUST CHANGE)"
MinIO["MinIO Credentials<br/>MINIO_ROOT_USER<br/>MINIO_ROOT_PASSWORD"]
Postgres["PostgreSQL Password<br/>POSTGRES_PASSWORD"]
LakeFS["LakeFS Encryption Key<br/>LAKEFS_AUTH_ENCRYPT_SECRET_KEY"]
Session["Session Secret<br/>KOHAKU_HUB_SESSION_SECRET"]
Admin["Admin Token<br/>KOHAKU_HUB_ADMIN_SECRET_TOKEN"]
end
subgraph "Optional Settings"
BaseURL["Base URL<br/>KOHAKU_HUB_BASE_URL"]
S3Public["S3 Public Endpoint<br/>KOHAKU_HUB_S3_PUBLIC_ENDPOINT"]
LFSThreshold["LFS Threshold<br/>KOHAKU_HUB_LFS_THRESHOLD_BYTES"]
Email["Email Verification<br/>KOHAKU_HUB_REQUIRE_EMAIL_VERIFICATION"]
end
Deploy[Deploy] --> Security
Security --> Optional
Optional --> Production[Production Ready]
```
### Required Changes
| Variable | Default | Change To | Why |
@@ -92,6 +140,16 @@ docker-compose logs -f hub-api
| `POSTGRES_PASSWORD` | hubpass | strong_password | Security |
| `LAKEFS_AUTH_ENCRYPT_SECRET_KEY` | change_this | random_32_chars | Security |
| `KOHAKU_HUB_SESSION_SECRET` | change_this | random_string | Security |
| `KOHAKU_HUB_ADMIN_SECRET_TOKEN` | change_this | random_string | Admin portal access |
**Generate secure values:**
```bash
# Generate a 32-byte hex key (64 characters)
openssl rand -hex 32
# Generate a 64-character random string
openssl rand -base64 48
```
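If `openssl` is not available, Python's standard library produces equivalent values:

```python
import secrets

print(secrets.token_hex(32))      # 32 random bytes as 64 hex characters
print(secrets.token_urlsafe(48))  # ~64-character URL-safe random string
```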
### Optional Changes
@@ -99,8 +157,11 @@ docker-compose logs -f hub-api
|----------|---------|----------------|
| `KOHAKU_HUB_BASE_URL` | http://localhost:28080 | Deploying to domain |
| `KOHAKU_HUB_S3_PUBLIC_ENDPOINT` | http://localhost:29001 | Using external S3 |
| `KOHAKU_HUB_LFS_THRESHOLD_BYTES` | 10000000 (10MB) | Adjust LFS threshold |
| `KOHAKU_HUB_LFS_THRESHOLD_BYTES` | 5242880 (5MB) | Adjust LFS threshold |
| `KOHAKU_HUB_REQUIRE_EMAIL_VERIFICATION` | false | Enable email verification |
| `KOHAKU_HUB_LFS_KEEP_VERSIONS` | 5 | Change version retention |
| `KOHAKU_HUB_LFS_AUTO_GC` | false | Enable auto garbage collection |
| `KOHAKU_HUB_ADMIN_ENABLED` | true | Disable admin portal |
## Post-Installation

View File

@@ -10,11 +10,31 @@ declare module 'vue' {
export interface GlobalComponents {
CodeEditor: typeof import('./components/common/CodeEditor.vue')['default']
CodeViewer: typeof import('./components/common/CodeViewer.vue')['default']
ElAlert: typeof import('element-plus/es')['ElAlert']
ElBreadcrumb: typeof import('element-plus/es')['ElBreadcrumb']
ElBreadcrumbItem: typeof import('element-plus/es')['ElBreadcrumbItem']
ElButton: typeof import('element-plus/es')['ElButton']
ElCheckbox: typeof import('element-plus/es')['ElCheckbox']
ElCollapse: typeof import('element-plus/es')['ElCollapse']
ElCollapseItem: typeof import('element-plus/es')['ElCollapseItem']
ElDialog: typeof import('element-plus/es')['ElDialog']
ElDrawer: typeof import('element-plus/es')['ElDrawer']
ElDropdown: typeof import('element-plus/es')['ElDropdown']
ElDropdownItem: typeof import('element-plus/es')['ElDropdownItem']
ElDropdownMenu: typeof import('element-plus/es')['ElDropdownMenu']
ElForm: typeof import('element-plus/es')['ElForm']
ElFormItem: typeof import('element-plus/es')['ElFormItem']
ElIcon: typeof import('element-plus/es')['ElIcon']
ElInput: typeof import('element-plus/es')['ElInput']
ElOption: typeof import('element-plus/es')['ElOption']
ElProgress: typeof import('element-plus/es')['ElProgress']
ElRadio: typeof import('element-plus/es')['ElRadio']
ElRadioButton: typeof import('element-plus/es')['ElRadioButton']
ElRadioGroup: typeof import('element-plus/es')['ElRadioGroup']
ElSelect: typeof import('element-plus/es')['ElSelect']
ElSkeleton: typeof import('element-plus/es')['ElSkeleton']
ElTabPane: typeof import('element-plus/es')['ElTabPane']
ElTabs: typeof import('element-plus/es')['ElTabs']
ElTag: typeof import('element-plus/es')['ElTag']
FileUploader: typeof import('./components/repo/FileUploader.vue')['default']
HelloWorld: typeof import('./components/HelloWorld.vue')['default']