Mirror of https://github.com/KohakuBlueleaf/KohakuHub.git (synced 2026-04-28 18:38:17 -05:00)

Commit: minor fixes
docs/API.md (418 lines changed)
@@ -1,63 +1,62 @@
# Kohaku Hub API Documentation

*Last Updated: January 2025*

This document explains how Kohaku Hub's API works, the data flow, and key endpoints.

## System Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        Client["Client<br/>(huggingface_hub, git, browser)"]
    end

    subgraph "Entry Point"
        Nginx["Nginx (Port 28080)<br/>- Serves static files<br/>- Reverse proxy"]
    end

    subgraph "Application Layer"
        FastAPI["FastAPI (Port 48888)<br/>- Auth & Permissions<br/>- HF-compatible API<br/>- Git Smart HTTP"]
    end

    subgraph "Storage Backend"
        LakeFS["LakeFS<br/>- Git-like versioning<br/>- Branch management<br/>- Commit history"]
        DB["PostgreSQL/SQLite<br/>- User data<br/>- Metadata<br/>- Deduplication"]
        S3["MinIO/S3<br/>- Object storage<br/>- LFS files<br/>- Presigned URLs"]
    end

    Client -->|HTTP/Git/LFS| Nginx
    Nginx -->|Static files| Client
    Nginx -->|/api, /org, resolve| FastAPI
    FastAPI -->|REST API| LakeFS
    FastAPI -->|Queries| DB
    FastAPI -->|Async wrappers| S3
    LakeFS -->|Stores objects| S3
```

## Core Concepts

### File Size Thresholds

```mermaid
graph TD
    Start[File Upload] --> Check{File size > 5MB?}
    Check -->|No| Regular[Regular Mode]
    Check -->|Yes| LFS[LFS Mode]
    Regular --> Base64[Base64 in commit payload]
    LFS --> Presigned[S3 presigned URL]
    Base64 --> FastAPI[FastAPI processes]
    Presigned --> Direct[Direct S3 upload]
    FastAPI --> LakeFS1[LakeFS stores object]
    Direct --> Link[FastAPI links S3 object]
    Link --> LakeFS2[LakeFS commit with physical address]
```

**Note:** The LFS threshold is configurable via `KOHAKU_HUB_LFS_THRESHOLD_BYTES` (default: 5MB = 5,242,880 bytes).

### Storage Layout
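The threshold rule can be sketched in a few lines of Python (the function name `upload_mode` is illustrative, not KohakuHub's actual code; the env variable name is from the note above):

```python
import os

# Default matches the documented 5 MB threshold (5,242,880 bytes).
LFS_THRESHOLD = int(os.environ.get("KOHAKU_HUB_LFS_THRESHOLD_BYTES", 5 * 1024 * 1024))

def upload_mode(size_bytes: int) -> str:
    """Pick the upload path: inline base64 ("regular") vs presigned S3 upload ("lfs")."""
    return "lfs" if size_bytes > LFS_THRESHOLD else "regular"

print(upload_mode(1024))         # small README -> "regular"
print(upload_mode(2 * 1024**3))  # 2 GB checkpoint -> "lfs"
```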
@@ -131,13 +130,37 @@ See [Git.md](./Git.md) for complete Git clone documentation and implementation d

### Overview

```mermaid
sequenceDiagram
    participant Client
    participant API as FastAPI
    participant LakeFS
    participant S3

    Note over Client,S3: Phase 1: Preupload Check
    Client->>API: POST /preupload (file hashes & sizes)
    API->>API: Check DB for existing SHA256
    API-->>Client: Upload mode (regular/lfs) & dedup info

    alt Small Files (<5MB)
        Note over Client,S3: Phase 2a: Regular Upload
        Client->>API: POST /commit (base64 content)
        API->>LakeFS: Upload object
        LakeFS->>S3: Store object
    else Large Files (>=5MB)
        Note over Client,S3: Phase 2b: LFS Upload
        Client->>API: POST /info/lfs/objects/batch
        API->>S3: Generate presigned URL
        API-->>Client: Presigned URL
        Client->>S3: PUT file (direct upload)
        Client->>API: POST /commit (lfsFile entry)
        API->>LakeFS: Link physical address
    end

    Note over Client,S3: Phase 3: Commit
    API->>LakeFS: Commit with message
    LakeFS-->>API: Commit ID
    API-->>Client: Commit URL & OID
```

### Step 1: Preupload Check
@@ -186,16 +209,16 @@ See [Git.md](./Git.md) for complete Git clone documentation and implementation d

```
For each file:
1. Check size:
   - ≤ 5MB → "regular"
   - > 5MB → "lfs"

2. Check if exists (deduplication):
   - Query DB for matching SHA256 + size
   - If match found → shouldIgnore: true
   - If no match → shouldIgnore: false
```

### Step 2a: Regular Upload (≤5MB)

Files are sent inline in the commit payload as base64.
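The two preupload rules combine into a small decision function. This is a toy sketch (the dict stands in for the File table's SHA256 + size index; names are illustrative, not KohakuHub's actual code):

```python
import hashlib

# Toy stand-in for the File table's (sha256, size) deduplication index.
existing_files = set()

def preupload_check(content: bytes, threshold: int = 5 * 1024 * 1024) -> dict:
    """Mirror the documented rules: pick upload mode, then dedup by SHA256 + size."""
    sha256 = hashlib.sha256(content).hexdigest()
    size = len(content)
    return {
        "uploadMode": "lfs" if size > threshold else "regular",
        "shouldIgnore": (sha256, size) in existing_files,
    }

blob = b"hello kohakuhub"
print(preupload_check(blob))  # first upload: shouldIgnore False
existing_files.add((hashlib.sha256(blob).hexdigest(), len(blob)))
print(preupload_check(blob))  # dedup hit: shouldIgnore True
```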
@@ -207,7 +230,7 @@ Files are sent inline in the commit payload as base64.

**No separate upload step needed** - proceed directly to Step 3.

### Step 2b: LFS Upload (>5MB)

#### Phase 1: Request Upload URLs
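The `POST /info/lfs/objects/batch` request body follows the standard Git LFS batch API shape (`operation`, `transfers`, `objects` with `oid`/`size`). A minimal payload builder, as an illustration:

```python
import json

def lfs_batch_request(objects):
    """Build a Git LFS batch API payload asking the server for upload URLs."""
    return {
        "operation": "upload",
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    }

# oid is the file's SHA256 hex digest; size is in bytes.
payload = lfs_batch_request([("a3f5" + "0" * 60, 104857600)])
print(json.dumps(payload, indent=2))
```

The server replies with a presigned S3 URL per object; the client then `PUT`s the file bytes directly to S3.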
@@ -293,8 +316,8 @@ Files are sent inline in the commit payload as base64.

| Key | Description | Usage |
|-----|-------------|-------|
| `header` | Commit metadata | Required, must be first line |
| `file` | Small file (inline base64) | For files ≤ 5MB |
| `lfsFile` | Large file (LFS reference) | For files > 5MB, already uploaded to S3 |
| `deletedFile` | Delete a single file | Remove file from repo |
| `deletedFolder` | Delete folder recursively | Remove all files in folder |
| `copyFile` | Copy file within repo | Duplicate file (deduplication-aware) |
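A commit payload using these keys can be assembled as NDJSON, one JSON object per line with the `header` first. This is a sketch following the HF-style convention (the exact `value` field names are assumptions, not confirmed against KohakuHub's schema):

```python
import base64
import json

def build_commit_ndjson(message, small_files, lfs_files):
    """Assemble an NDJSON commit payload: header line first, then one line per operation."""
    lines = [{"key": "header", "value": {"summary": message}}]
    for path, content in small_files.items():  # inline base64 (≤ 5MB)
        lines.append({"key": "file", "value": {
            "path": path,
            "content": base64.b64encode(content).decode(),
            "encoding": "base64",
        }})
    for path, oid, size in lfs_files:  # already uploaded to S3 via presigned URL
        lines.append({"key": "lfsFile", "value": {
            "path": path, "algo": "sha256", "oid": oid, "size": size,
        }})
    return "\n".join(json.dumps(line) for line in lines)

ndjson = build_commit_ndjson(
    "Add README",
    {"README.md": b"# My Model"},
    [("model.safetensors", "ab" * 32, 2_000_000_000)],
)
print(ndjson)
```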
@@ -343,12 +366,28 @@ Files are sent inline in the commit payload as base64.

## Download Workflow

```mermaid
sequenceDiagram
    participant Client
    participant API as FastAPI
    participant LakeFS
    participant S3

    Note over Client,S3: Optional: HEAD request for metadata
    Client->>API: HEAD /resolve/{revision}/{filename}
    API->>LakeFS: Stat object
    LakeFS-->>API: Object metadata (SHA256, size)
    API-->>Client: Headers (ETag, Content-Length, X-Repo-Commit)

    Note over Client,S3: Download: GET request
    Client->>API: GET /resolve/{revision}/{filename}
    API->>LakeFS: Get object metadata
    API->>S3: Generate presigned URL
    API-->>Client: 302 Redirect (presigned URL)
    Client->>S3: Direct download
    S3-->>Client: File content

    Note over Client: No proxy - direct S3 download
```

### Step 1: Get Metadata (HEAD)
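On the client side, the download boils down to building the resolve URL and following one redirect. A sketch, assuming the HF-style `/{type}s/{namespace}/{name}/resolve/{revision}/{filename}` route shown elsewhere in this document:

```python
def resolve_url(base, repo_type, repo_id, revision, filename):
    """Build the HF-style resolve URL; the server answers with a 302 to a presigned S3 URL."""
    return f"{base}/{repo_type}s/{repo_id}/resolve/{revision}/{filename}"

url = resolve_url("http://localhost:28080", "model", "myorg/mymodel",
                  "main", "model.safetensors")
print(url)
# An HTTP client then just follows the redirect, e.g.:
#   requests.get(url, allow_redirects=True)  # body streams directly from S3
```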
@@ -563,72 +602,159 @@ Returns all repositories for a user/organization, grouped by type.

## Database Schema

```mermaid
erDiagram
    USER ||--o{ REPOSITORY : owns
    USER ||--o{ SESSION : has
    USER ||--o{ TOKEN : has
    USER ||--o{ SSHKEY : has
    USER }o--o{ ORGANIZATION : member_of
    ORGANIZATION ||--o{ REPOSITORY : owns
    REPOSITORY ||--o{ FILE : contains
    REPOSITORY ||--o{ COMMIT : has
    REPOSITORY ||--o{ STAGINGUPLOAD : has
    COMMIT ||--o{ LFSOBJECTHISTORY : references

    USER {
        int id PK
        string username UK
        string email UK
        string password_hash
        boolean email_verified
        boolean is_active
        bigint private_quota_bytes
        bigint public_quota_bytes
        bigint private_used_bytes
        bigint public_used_bytes
        datetime created_at
    }

    REPOSITORY {
        int id PK
        string repo_type
        string namespace
        string name
        string full_id
        boolean private
        int owner_id FK
        datetime created_at
    }

    FILE {
        int id PK
        string repo_full_id
        string path_in_repo
        int size
        string sha256
        boolean lfs
        datetime created_at
        datetime updated_at
    }

    COMMIT {
        int id PK
        string commit_id
        string repo_full_id
        string repo_type
        string branch
        int user_id FK
        string username
        text message
        text description
        datetime created_at
    }

    ORGANIZATION {
        int id PK
        string name UK
        text description
        bigint private_quota_bytes
        bigint public_quota_bytes
        bigint private_used_bytes
        bigint public_used_bytes
        datetime created_at
    }

    TOKEN {
        int id PK
        int user_id FK
        string token_hash UK
        string name
        datetime last_used
        datetime created_at
    }

    SESSION {
        int id PK
        string session_id UK
        int user_id FK
        string secret
        datetime expires_at
        datetime created_at
    }

    SSHKEY {
        int id PK
        int user_id FK
        string key_type
        text public_key
        string fingerprint UK
        string title
        datetime last_used
        datetime created_at
    }

    STAGINGUPLOAD {
        int id PK
        string repo_full_id
        string repo_type
        string revision
        string path_in_repo
        string sha256
        int size
        string upload_id
        string storage_key
        boolean lfs
        datetime created_at
    }

    LFSOBJECTHISTORY {
        int id PK
        string repo_full_id
        string path_in_repo
        string sha256
        int size
        string commit_id
        datetime created_at
    }
```

### Key Tables

**Repository Table** - Stores repository metadata:
- Unique constraint on `(repo_type, namespace, name)`
- Allows the same `full_id` across different `repo_type` values
- Example: `model:myorg/mymodel`, `dataset:myorg/mymodel`

**File Table** - Deduplication and metadata:
- Unique constraint on `(repo_full_id, path_in_repo)`
- `sha256` indexed for fast deduplication lookups
- `lfs` flag indicates if file uses LFS storage

**Commit Table** - User commit tracking:
- `commit_id` is the LakeFS commit SHA
- Indexed by `(repo_full_id, branch)` for fast queries
- Denormalized `username` for performance

**LFSObjectHistory Table** - LFS garbage collection:
- Tracks which commits reference which LFS objects
- Enables preserving K versions of each file (default: 5)
- Used for auto-cleanup of old LFS objects

**StagingUpload Table** - Multipart upload tracking:
- Tracks ongoing multipart uploads
- Enables upload resume
- Cleans up failed uploads

## LakeFS Integration
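The File table's deduplication behavior can be demonstrated with an in-memory SQLite sketch (column names from the schema above; the DDL is illustrative, not KohakuHub's actual migrations):

```python
import sqlite3

# In-memory sketch of the File table's dedup-relevant columns and constraints.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE file (
        id INTEGER PRIMARY KEY,
        repo_full_id TEXT,
        path_in_repo TEXT,
        size INTEGER,
        sha256 TEXT,
        lfs BOOLEAN,
        UNIQUE (repo_full_id, path_in_repo)
    )
""")
db.execute("CREATE INDEX idx_file_sha256 ON file (sha256)")

db.execute(
    "INSERT INTO file (repo_full_id, path_in_repo, size, sha256, lfs) VALUES (?, ?, ?, ?, ?)",
    ("myorg/mymodel", "model.safetensors", 2_000_000_000, "ab" * 32, True),
)

# Preupload dedup lookup: a matching SHA256 + size means shouldIgnore = true.
row = db.execute(
    "SELECT 1 FROM file WHERE sha256 = ? AND size = ?",
    ("ab" * 32, 2_000_000_000),
).fetchone()
print("shouldIgnore:", row is not None)
```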
@@ -647,15 +773,25 @@ Examples:

### Key Operations

**All LakeFS operations use the pure async REST API via httpx (no thread pools!):**

| Operation | LakeFS REST Endpoint | KohakuHub Method | Purpose |
|-----------|---------------------|------------------|---------|
| Create Repo | `POST /repositories` | `create_repository()` | Initialize new repository |
| Upload Small File | `POST /repositories/{repo}/branches/{branch}/objects` | `upload_object()` | Direct content upload |
| Link LFS File | `PUT /repositories/{repo}/branches/{branch}/staging/backing` | `link_physical_address()` | Link S3 object to LakeFS |
| Commit | `POST /repositories/{repo}/branches/{branch}/commits` | `commit()` | Create atomic commit |
| List Files | `GET /repositories/{repo}/refs/{ref}/objects/ls` | `list_objects()` | Browse repository |
| Get File Info | `GET /repositories/{repo}/refs/{ref}/objects/stat` | `stat_object()` | Get file metadata |
| Get File Content | `GET /repositories/{repo}/refs/{ref}/objects` | `get_object()` | Download file |
| Delete File | `DELETE /repositories/{repo}/branches/{branch}/objects` | `delete_object()` | Remove file |
| Create Branch | `POST /repositories/{repo}/branches` | `create_branch()` | Create new branch |
| Delete Branch | `DELETE /repositories/{repo}/branches/{branch}` | `delete_branch()` | Delete branch |
| Create Tag | `POST /repositories/{repo}/tags` | `create_tag()` | Create tag |
| Delete Tag | `DELETE /repositories/{repo}/tags/{tag}` | `delete_tag()` | Delete tag |
| Revert | `POST /repositories/{repo}/branches/{branch}/revert` | `revert_branch()` | Revert commit |
| Merge | `POST /repositories/{repo}/refs/{source}/merge/{dest}` | `merge_into_branch()` | Merge branches |
| Hard Reset | `PUT /repositories/{repo}/branches/{branch}/hard_reset` | `hard_reset_branch()` | Reset branch to commit |

### Physical Address Linking
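As a sketch of one row of the table: the commit endpoint, with URL construction separated out so it can be shown concretely. The `/api/v1` prefix is an assumption based on LakeFS's standard REST API, and the repository name is hypothetical:

```python
from urllib.parse import quote

def lakefs_commit_url(base, repo, branch):
    """Endpoint for creating an atomic commit (assumes LakeFS's /api/v1 prefix)."""
    return (f"{base}/api/v1/repositories/{quote(repo, safe='')}"
            f"/branches/{quote(branch, safe='')}/commits")

url = lakefs_commit_url("http://lakefs:8000", "hf-model-myorg-mymodel", "main")
print(url)
# With httpx, the async call is roughly:
#   async with httpx.AsyncClient(auth=(access_key, secret_key)) as client:
#       await client.post(url, json={"message": "Upload model"})
```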
@@ -1210,9 +1346,15 @@ All Downloads:

### Recommended S3 Providers

| Provider | Best For | Pricing Model | Notes |
|----------|----------|---------------|-------|
| Cloudflare R2 | High download | Free egress, $0.015/GB storage | Best for public datasets |
| Wasabi | Archive/backup | $6/TB/month, free egress* | *if download < storage |
| MinIO | Self-hosted | Free (your hardware/bandwidth) | Full control, privacy |
| AWS S3 | Enterprise | Pay per GB + egress | Most features, expensive egress |
| Backblaze B2 | Budget | $6/TB storage, $0.01/GB egress | Good for mixed workloads |

**Recommendation for KohakuHub:**
- **Development**: MinIO (included in docker-compose)
- **Public Hub**: Cloudflare R2 (free egress saves costs)
- **Private/Enterprise**: Self-hosted MinIO or AWS S3 with VPC endpoints
@@ -2,11 +2,32 @@

*Complete guide to KohakuHub's administration interface*

**Last Updated:** January 2025
**Access:** http://your-hub.com/admin

---

## Admin Portal Architecture

```mermaid
graph LR
    subgraph "Admin Access"
        Browser[Browser] -->|X-Admin-Token| Portal[Admin Portal UI]
    end

    subgraph "Admin API"
        Portal -->|REST API| AdminAPI[Admin Endpoints]
    end

    subgraph "Data Sources"
        AdminAPI -->|Queries| DB[PostgreSQL/SQLite]
        AdminAPI -->|List Objects| S3[MinIO/S3]
        AdminAPI -->|Repository Info| LakeFS[LakeFS]
    end
```

---

## Table of Contents

1. [Overview](#overview)
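Per the diagram, admin calls authenticate with an `X-Admin-Token` header. A minimal client-side sketch using only the stdlib; the `/admin/api/users` path is a hypothetical example, not a documented endpoint:

```python
import urllib.request

ADMIN_TOKEN = "change_this"  # value of KOHAKU_HUB_ADMIN_SECRET_TOKEN

def admin_request(base, path):
    """Build a request carrying the X-Admin-Token header the admin API expects."""
    return urllib.request.Request(f"{base}{path}",
                                  headers={"X-Admin-Token": ADMIN_TOKEN})

req = admin_request("http://localhost:28080", "/admin/api/users")  # hypothetical path
print(req.full_url)
# urllib.request.urlopen(req) would perform the call against a running hub.
```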
@@ -860,6 +881,6 @@ A: Yes, all admin operations are logged with `[ADMIN]` prefix.

---

**Last Updated:** January 2025
**Version:** 1.0
**Status:** ✅ Production Ready
@@ -1,6 +1,6 @@
# KohakuHub CLI Design Document

*Last Updated: January 2025*

## Quick Reference
@@ -2,7 +2,7 @@

*Complete guide covering Git clone operations, LFS integration, and server implementation*

**Last Updated:** January 2025
**Status:** ✅ Clone/Pull Production Ready | ⚠️ Push In Development

---

@@ -1612,5 +1612,5 @@ This demonstrates how to build a complete Git server using only Python stdlib +

---

**Last Updated:** January 2025
**Version:** 1.1
**Authors:** KohakuHub Team
@@ -65,13 +65,35 @@ docker-compose up -d --build

**Configuration:** `docker/nginx/default.conf`

```mermaid
graph LR
    subgraph "Nginx (Port 28080)"
        direction TB
        Router[Request Router]
        Static[Static Files Handler]
        Proxy[API Proxy]
    end

    Client[Client] -->|Request| Router
    Router -->|"/", "/*.html", "/*.js"| Static
    Router -->|"/api/*"| Proxy
    Router -->|"/org/*"| Proxy
    Router -->|"/{ns}/{repo}.git/*"| Proxy
    Router -->|"/resolve/*"| Proxy

    Static -->|Serve| Vue[Vue 3 Frontend]
    Proxy -->|Forward| FastAPI["FastAPI:48888"]
```

**Nginx routing rules:**
1. Serves frontend static files from `/usr/share/nginx/html`
2. Proxies API requests to `hub-api:48888`:
   - `/api/*` → API endpoints
   - `/org/*` → Organization endpoints
   - `/{namespace}/{name}.git/*` → Git Smart HTTP protocol
   - `/{type}s/{namespace}/{name}/resolve/*` → File download endpoints
   - `/admin/*` → Admin portal (if enabled)

### Client Configuration
@@ -109,36 +131,43 @@ os.environ["HF_ENDPOINT"] = "http://localhost:48888" # Don't use backend port d

## Architecture Diagram

```mermaid
graph TB
    subgraph "External Access"
        Client["Client<br/>(Browser, Git, Python SDK, CLI)"]
    end

    subgraph "Nginx Container (hub-ui)<br/>Port 28080"
        Nginx["Nginx Reverse Proxy<br/>- Static files: Vue 3 frontend<br/>- Proxy: /api, /org, resolve"]
    end

    subgraph "FastAPI Container (hub-api)<br/>Port 48888 (internal)"
        FastAPI["FastAPI Application<br/>- HF-compatible REST API<br/>- Git Smart HTTP<br/>- LFS protocol<br/>- Authentication"]
    end

    subgraph "Storage Layer"
        LakeFS["LakeFS Container<br/>Port 28000 (admin)<br/>- Git-like versioning<br/>- Branch management<br/>- Commit history"]
        MinIO["MinIO Container<br/>Port 29000 (console)<br/>Port 29001 (S3 API)<br/>- S3-compatible storage<br/>- Object storage"]
        Postgres["PostgreSQL Container<br/>Port 25432 (optional)<br/>- User data<br/>- Metadata<br/>- Quotas"]
    end

    Client -->|HTTPS/HTTP| Nginx
    Nginx -->|Static| Client
    Nginx -->|Proxy API| FastAPI
    FastAPI -->|REST API| LakeFS
    FastAPI -->|SQL| Postgres
    FastAPI -->|S3 API| MinIO
    LakeFS -->|Store objects| MinIO
```

**Port Mapping:**
- **28080** - Public entry point (Nginx)
- **48888** - Internal FastAPI (not exposed)
- **28000** - LakeFS admin UI (optional, for admins)
- **29000** - MinIO console (optional, for admins)
- **29001** - MinIO S3 API (internal + public for downloads)
- **25432** - PostgreSQL (optional, for external access)
## Development vs Production

@@ -223,6 +252,57 @@ os.environ["HF_ENDPOINT"] = "http://localhost:48888"

```python
os.environ["HF_ENDPOINT"] = "http://localhost:28080"
```
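In a full client script, the environment variable is set before importing the HF client. The `huggingface_hub` calls are shown as comments since they require a running hub (repo id is illustrative):

```python
import os

# Point HF clients at KohakuHub's public entry point (Nginx), not the backend port.
os.environ["HF_ENDPOINT"] = "http://localhost:28080"

# With the variable set before import, huggingface_hub talks to KohakuHub transparently:
#   from huggingface_hub import HfApi
#   api = HfApi()  # honors HF_ENDPOINT
#   api.create_repo("myorg/mymodel", repo_type="model")
#   api.upload_file(path_or_fileobj="README.md", path_in_repo="README.md",
#                   repo_id="myorg/mymodel")

print(os.environ["HF_ENDPOINT"])
```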
## Data Flow Examples

### Upload Flow (with LFS)

```mermaid
sequenceDiagram
    participant User
    participant Nginx
    participant FastAPI
    participant LakeFS
    participant MinIO

    User->>Nginx: POST /api/models/org/model/commit/main
    Nginx->>FastAPI: Forward request
    FastAPI->>FastAPI: Parse NDJSON (header + files + lfsFiles)

    alt Small File (<5MB)
        FastAPI->>LakeFS: Upload object (base64 decoded)
        LakeFS->>MinIO: Store object
    else Large File (>5MB)
        Note over FastAPI,MinIO: File already uploaded via presigned URL
        FastAPI->>LakeFS: Link physical address
    end

    FastAPI->>LakeFS: Commit with message
    LakeFS-->>FastAPI: Commit ID
    FastAPI-->>Nginx: 200 OK + commit URL
    Nginx-->>User: Commit successful
```

### Download Flow (Direct S3)

```mermaid
sequenceDiagram
    participant User
    participant Nginx
    participant FastAPI
    participant LakeFS
    participant MinIO

    User->>Nginx: GET /org/model/resolve/main/model.safetensors
    Nginx->>FastAPI: Forward request
    FastAPI->>LakeFS: Stat object (get metadata)
    LakeFS-->>FastAPI: Physical address + SHA256
    FastAPI->>MinIO: Generate presigned URL (1 hour)
    FastAPI-->>Nginx: 302 Redirect
    Nginx-->>User: Redirect to presigned URL
    User->>MinIO: Direct download
    MinIO-->>User: File content
```

## Why This Architecture?

1. **Single Entry Point:** Users only need to know one port (28080)

@@ -231,6 +311,8 @@ os.environ["HF_ENDPOINT"] = "http://localhost:28080"

4. **Static File Serving:** Nginx serves frontend efficiently
5. **Load Balancing:** Can add multiple backend instances behind nginx
6. **Caching:** Nginx can cache static assets
7. **Direct Downloads:** Files downloaded directly from S3, not proxied
8. **Scalability:** Each component can scale independently

## Troubleshooting
@@ -1,7 +1,21 @@
# KohakuHub Setup Guide

*Last Updated: January 2025*

## Quick Start

```mermaid
graph LR
    Start[Start] --> Clone[Clone Repository]
    Clone --> Config[Configure<br/>docker-compose.yml]
    Config --> Build[Build Frontend]
    Build --> Deploy[Start Docker]
    Deploy --> Verify[Verify Installation]
    Verify --> CreateUser[Create First User]
    CreateUser --> Done[Ready!]
```

### 1. Clone Repository

@@ -17,6 +31,17 @@ cp docker-compose.example.yml docker-compose.yml

**Important:** The repository only includes `docker-compose.example.yml` as a template. You must copy it to `docker-compose.yml` and customize it for your deployment.

**Alternative:** Use the interactive generator:
```bash
python scripts/generate_docker_compose.py
```

The generator will guide you through:
- PostgreSQL setup (built-in vs external)
- LakeFS database backend
- S3 storage (MinIO vs external)
- Security key generation

### 2. Customize Configuration

**Edit `docker-compose.yml` and change these critical settings:**
@@ -83,6 +108,29 @@ docker-compose logs -f hub-api

## Configuration Reference

```mermaid
graph TD
    subgraph Security["Security Settings (MUST CHANGE)"]
        MinIO["MinIO Credentials<br/>MINIO_ROOT_USER<br/>MINIO_ROOT_PASSWORD"]
        Postgres["PostgreSQL Password<br/>POSTGRES_PASSWORD"]
        LakeFS["LakeFS Encryption Key<br/>LAKEFS_AUTH_ENCRYPT_SECRET_KEY"]
        Session["Session Secret<br/>KOHAKU_HUB_SESSION_SECRET"]
        Admin["Admin Token<br/>KOHAKU_HUB_ADMIN_SECRET_TOKEN"]
    end

    subgraph Optional["Optional Settings"]
        BaseURL["Base URL<br/>KOHAKU_HUB_BASE_URL"]
        S3Public["S3 Public Endpoint<br/>KOHAKU_HUB_S3_PUBLIC_ENDPOINT"]
        LFSThreshold["LFS Threshold<br/>KOHAKU_HUB_LFS_THRESHOLD_BYTES"]
        Email["Email Verification<br/>KOHAKU_HUB_REQUIRE_EMAIL_VERIFICATION"]
    end

    Deploy[Deploy] --> Security
    Security --> Optional
    Optional --> Production[Production Ready]
```

### Required Changes

@@ -92,6 +140,16 @@ docker-compose logs -f hub-api

| Variable | Default | Change To | Why |
|----------|---------|-----------|-----|
| `POSTGRES_PASSWORD` | hubpass | strong_password | Security |
| `LAKEFS_AUTH_ENCRYPT_SECRET_KEY` | change_this | random_32_chars | Security |
| `KOHAKU_HUB_SESSION_SECRET` | change_this | random_string | Security |
| `KOHAKU_HUB_ADMIN_SECRET_TOKEN` | change_this | random_string | Admin portal access |
**Generate secure values:**
```bash
# Generate a 32-byte hex key (64 characters)
openssl rand -hex 32

# Generate a 64-character random string
openssl rand -base64 48
```
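If `openssl` is unavailable, Python's `secrets` module produces equivalent values:

```python
import secrets

# 32-byte hex key (64 hex characters), e.g. for LAKEFS_AUTH_ENCRYPT_SECRET_KEY
print(secrets.token_hex(32))

# URL-safe random string, e.g. for KOHAKU_HUB_SESSION_SECRET or the admin token
print(secrets.token_urlsafe(48))
```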
### Optional Changes

@@ -99,8 +157,11 @@ docker-compose logs -f hub-api

| Variable | Default | When to Change |
|----------|---------|----------------|
| `KOHAKU_HUB_BASE_URL` | http://localhost:28080 | Deploying to domain |
| `KOHAKU_HUB_S3_PUBLIC_ENDPOINT` | http://localhost:29001 | Using external S3 |
| `KOHAKU_HUB_LFS_THRESHOLD_BYTES` | 5242880 (5MB) | Adjust LFS threshold |
| `KOHAKU_HUB_REQUIRE_EMAIL_VERIFICATION` | false | Enable email verification |
| `KOHAKU_HUB_LFS_KEEP_VERSIONS` | 5 | Change version retention |
| `KOHAKU_HUB_LFS_AUTO_GC` | false | Enable auto garbage collection |
| `KOHAKU_HUB_ADMIN_ENABLED` | true | Disable admin portal |

## Post-Installation
src/kohaku-hub-ui/src/components.d.ts (vendored, 20 lines changed)
@@ -10,11 +10,31 @@ declare module 'vue' {

  export interface GlobalComponents {
    CodeEditor: typeof import('./components/common/CodeEditor.vue')['default']
    CodeViewer: typeof import('./components/common/CodeViewer.vue')['default']
    ElAlert: typeof import('element-plus/es')['ElAlert']
    ElBreadcrumb: typeof import('element-plus/es')['ElBreadcrumb']
    ElBreadcrumbItem: typeof import('element-plus/es')['ElBreadcrumbItem']
    ElButton: typeof import('element-plus/es')['ElButton']
    ElCheckbox: typeof import('element-plus/es')['ElCheckbox']
    ElCollapse: typeof import('element-plus/es')['ElCollapse']
    ElCollapseItem: typeof import('element-plus/es')['ElCollapseItem']
    ElDialog: typeof import('element-plus/es')['ElDialog']
    ElDrawer: typeof import('element-plus/es')['ElDrawer']
    ElDropdown: typeof import('element-plus/es')['ElDropdown']
    ElDropdownItem: typeof import('element-plus/es')['ElDropdownItem']
    ElDropdownMenu: typeof import('element-plus/es')['ElDropdownMenu']
    ElForm: typeof import('element-plus/es')['ElForm']
    ElFormItem: typeof import('element-plus/es')['ElFormItem']
    ElIcon: typeof import('element-plus/es')['ElIcon']
    ElInput: typeof import('element-plus/es')['ElInput']
    ElOption: typeof import('element-plus/es')['ElOption']
    ElProgress: typeof import('element-plus/es')['ElProgress']
    ElRadio: typeof import('element-plus/es')['ElRadio']
    ElRadioButton: typeof import('element-plus/es')['ElRadioButton']
    ElRadioGroup: typeof import('element-plus/es')['ElRadioGroup']
    ElSelect: typeof import('element-plus/es')['ElSelect']
    ElSkeleton: typeof import('element-plus/es')['ElSkeleton']
    ElTabPane: typeof import('element-plus/es')['ElTabPane']
    ElTabs: typeof import('element-plus/es')['ElTabs']
    ElTag: typeof import('element-plus/es')['ElTag']
    FileUploader: typeof import('./components/repo/FileUploader.vue')['default']
    HelloWorld: typeof import('./components/HelloWorld.vue')['default']