mirror of
https://github.com/KohakuBlueleaf/KohakuHub.git
synced 2026-03-11 17:34:08 -05:00
update doc and Docker related utils
@@ -9,12 +9,14 @@ RUN pip install --no-cache-dir uv
WORKDIR /app

COPY ./pyproject.toml .
RUN mkdir -p /app/src/kohakuhub
RUN echo "" > /app/src/kohakuhub/__init__.py
RUN uv pip install --system -e .

COPY ./src/kohakuhub ./src/kohakuhub
COPY ./scripts ./scripts
COPY ./docker/startup.py /app/startup.py
RUN chmod +x /app/startup.py

RUN uv pip install --system -e .

EXPOSE 48888
CMD ["/app/startup.py"]

30
README.md
@@ -27,6 +27,7 @@ Self-hosted HuggingFace alternative with Git-like versioning for AI models and d

- **HuggingFace Compatible** - Drop-in replacement for `huggingface_hub`, `hfutils`, `transformers`, `diffusers`
- **External Source Fallback** - Browse HuggingFace (or other KohakuHub instances) when repos not found locally
- **User External Tokens** - Configure your own tokens for external sources (HuggingFace, etc.) with encrypted storage
- **Native Git Clone** - Standard Git operations (clone) with Git LFS support
- **Git-Like Versioning** - Branches, commits, tags via LakeFS
- **S3 Storage** - Works with MinIO, AWS S3, Cloudflare R2, etc.
@@ -199,10 +200,39 @@ KOHAKU_HUB_REQUIRE_EMAIL_VERIFICATION=false
# Admin Portal
KOHAKU_HUB_ADMIN_ENABLED=true
KOHAKU_HUB_ADMIN_SECRET_TOKEN=change-me-in-production

# External Tokens (for user-specific fallback tokens)
KOHAKU_HUB_DATABASE_KEY=$(openssl rand -hex 32) # Required for encryption
```

See [config-example.toml](./config-example.toml) for all options.

### External Fallback Tokens

Users can provide their own tokens for external sources (e.g., HuggingFace) to access private repositories:

**Via Web UI:**
1. Go to Settings → External Tokens
2. Add your HuggingFace token
3. Tokens are encrypted and stored securely

**Via CLI:**
```bash
kohub-cli settings user external-tokens add --url https://huggingface.co --token hf_abc123
```

**Via Authorization Header (API/programmatic):**
```bash
curl -H "Authorization: Bearer my_token|https://huggingface.co,hf_abc123" \
  http://localhost:28080/api/models/org/model
```

**How it works:**
- User tokens override admin-configured tokens
- Tokens encrypted at rest using AES-256
- Works with session auth, API tokens, and anonymous requests
- Automatically used when repos not found locally

## Development

**Backend:**

167
docs/API.md
@@ -513,6 +513,16 @@ erDiagram
| `/api/auth/tokens/create` | POST | ✓ | Create new API token |
| `/api/auth/tokens/{token_id}` | DELETE | ✓ | Revoke API token |

### External Token Operations (Fallback System)

| Endpoint | Method | Auth | Description |
|----------|--------|------|-------------|
| `/api/fallback-sources/available` | GET | ✗ | List available fallback sources |
| `/api/users/{username}/external-tokens` | GET | ✓ | List user's external tokens (masked) |
| `/api/users/{username}/external-tokens` | POST | ✓ | Add/update external token |
| `/api/users/{username}/external-tokens/{url}` | DELETE | ✓ | Delete external token |
| `/api/users/{username}/external-tokens/bulk` | PUT | ✓ | Bulk update external tokens |

### Organization Operations

| Endpoint | Method | Auth | Description |
@@ -1011,3 +1021,160 @@ KohakuHub implements smart download tracking:
- **Development**: MinIO (included in docker-compose)
- **Public Hub**: Cloudflare R2 (free egress saves costs)
- **Private/Enterprise**: Self-hosted MinIO or AWS S3 with VPC endpoints

---

## External Token API (User Fallback Tokens)

Users can configure their own tokens for external fallback sources to access private repositories.

### List Available Sources

**Public endpoint - no authentication required**

```bash
GET /api/fallback-sources/available
```

**Response:**
```json
[
  {
    "url": "https://huggingface.co",
    "name": "HuggingFace",
    "source_type": "huggingface",
    "priority": 1
  }
]
```

### List User's External Tokens

```bash
GET /api/users/{username}/external-tokens
Authorization: Bearer YOUR_TOKEN
```

**Response (tokens are masked):**
```json
[
  {
    "url": "https://huggingface.co",
    "token_preview": "hf_a***",
    "created_at": "2025-01-22T10:30:00Z",
    "updated_at": "2025-01-22T10:30:00Z"
  }
]
```
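
The masked `token_preview` value in the response above exposes only a short prefix of the stored token. A minimal sketch of how such masking could work is given here; `mask_token` and its four-character prefix are assumptions made for illustration, since the server-side rule is not shown in this diff.

```python
def mask_token(token: str, visible: int = 4) -> str:
    """Return a preview that exposes at most `visible` leading characters.

    Hypothetical helper: the real masking rule may differ.
    """
    if len(token) <= visible:
        # Too short to reveal any prefix without leaking the whole token
        return "***"
    return token[:visible] + "***"


print(mask_token("hf_abc123xyz"))  # hf_a***
```

Refusing to show any prefix for very short tokens avoids leaking the whole secret through the preview.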

### Add/Update External Token

```bash
POST /api/users/{username}/external-tokens
Authorization: Bearer YOUR_TOKEN
Content-Type: application/json

{
  "url": "https://huggingface.co",
  "token": "hf_abc123xyz"
}
```

**Response:**
```json
{
  "success": true,
  "message": "External token saved"
}
```

**Notes:**
- If token exists for this URL, it will be updated
- Token is encrypted before storage (AES-256)
- User can only manage their own tokens

### Delete External Token

```bash
DELETE /api/users/{username}/external-tokens/https%3A%2F%2Fhuggingface.co
Authorization: Bearer YOUR_TOKEN
```

**Response:**
```json
{
  "success": true,
  "message": "External token deleted"
}
```

**Note:** URL must be URL-encoded in path
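
The encoded form in the DELETE path above can be produced with the standard library. This is a small sketch; the `username` value is a placeholder, and only `urllib.parse.quote` from the Python standard library is used.

```python
from urllib.parse import quote, unquote

source_url = "https://huggingface.co"

# safe="" makes quote() percent-encode ":" and "/" too, so the whole URL
# fits into a single path segment.
encoded = quote(source_url, safe="")
print(encoded)  # https%3A%2F%2Fhuggingface.co

username = "alice"  # placeholder user name, not from the docs
delete_path = f"/api/users/{username}/external-tokens/{encoded}"
print(delete_path)

# Decoding recovers the original URL
print(unquote(encoded) == source_url)
```

Without `safe=""`, `quote` leaves `/` unescaped, and the URL would be split across several path segments.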

### Bulk Update External Tokens

Replace all external tokens at once:

```bash
PUT /api/users/{username}/external-tokens/bulk
Authorization: Bearer YOUR_TOKEN
Content-Type: application/json

{
  "tokens": [
    {"url": "https://huggingface.co", "token": "hf_abc123"},
    {"url": "https://other-hub.com", "token": "token456"}
  ]
}
```

**Response:**
```json
{
  "success": true,
  "message": "Updated 2 external tokens"
}
```

**Notes:**
- Deletes tokens not in the new list
- Atomic operation (all or nothing)
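
The replace-all semantics in the notes above can be pictured as a set difference between what is stored and what is submitted. The sketch below is only an illustration: `plan_bulk_update` is a hypothetical helper, and it models the stored tokens as a plain dict rather than the real database.

```python
def plan_bulk_update(
    existing: dict[str, str], incoming: list[dict[str, str]]
) -> tuple[dict[str, str], list[str]]:
    """Compute the state after a bulk update and the URLs that get removed.

    Hypothetical helper; `existing` maps url -> token for one user.
    """
    new_state = {item["url"]: item["token"] for item in incoming}
    # Anything stored before but absent from the request is deleted
    to_delete = sorted(url for url in existing if url not in new_state)
    return new_state, to_delete


existing = {"https://huggingface.co": "hf_old", "https://gone.example": "tok"}
incoming = [
    {"url": "https://huggingface.co", "token": "hf_abc123"},
    {"url": "https://other-hub.com", "token": "token456"},
]
new_state, to_delete = plan_bulk_update(existing, incoming)
print(to_delete)  # ['https://gone.example']
```

Computing the full plan before writing anything is one way to keep the operation all-or-nothing: the plan can then be applied inside a single transaction.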

### Using External Tokens in Requests

**Authorization Header Format:**
```
Bearer <auth_token>|<url1>,<token1>|<url2>,<token2>...
```

**Examples:**

1. **API token + external token:**
   ```bash
   curl -H "Authorization: Bearer my_api_token|https://huggingface.co,hf_abc123" \
     http://localhost:28080/api/models/org/model
   ```

2. **Session auth + external token:**
   ```bash
   # Frontend automatically sends: "Bearer |https://huggingface.co,hf_abc123"
   ```

3. **Anonymous + external token:**
   ```bash
   curl -H "Authorization: Bearer |https://huggingface.co,hf_abc123" \
     http://localhost:28080/api/models/facebook/gpt2
   ```

**Token Priority:**
1. Authorization header tokens (highest - per-request override)
2. Database tokens (medium - user preferences)
3. Admin tokens (lowest - server defaults)

**Configuration:**
```bash
# Required: Encryption key
export KOHAKU_HUB_DATABASE_KEY="$(openssl rand -hex 32)"

# Optional: Require auth for fallback
export KOHAKU_HUB_FALLBACK_REQUIRE_AUTH=false # Default: false
```
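
Where `openssl` is not available, a key of the same shape (32 random bytes as 64 hex characters) can be generated with Python's standard `secrets` module. This is offered as an equivalent sketch; whether KohakuHub accepts other key formats is not stated here.

```python
import secrets

# 32 random bytes rendered as 64 hexadecimal characters,
# the same shape as the output of `openssl rand -hex 32`
database_key = secrets.token_hex(32)

# Print a line that can be pasted into an env file
print(f"KOHAKU_HUB_DATABASE_KEY={database_key}")
```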
272
scripts/sync_lakefs_credentials.py
Normal file
@@ -0,0 +1,272 @@
#!/usr/bin/env python3
"""
Sync LakeFS Credentials to config.toml

This script reads LakeFS credentials from credentials.env (auto-generated by Docker)
and updates config.toml with the correct values.

Usage:
    python scripts/sync_lakefs_credentials.py
    python scripts/sync_lakefs_credentials.py --credentials-path ./custom/path/credentials.env
    python scripts/sync_lakefs_credentials.py --config ./custom-config.toml
"""

import argparse
import re
import sys
import tomllib
from pathlib import Path


def find_credentials_path_from_docker_compose(docker_compose_path: Path) -> Path | None:
    """Find credentials.env path by parsing docker-compose.yml.

    Args:
        docker_compose_path: Path to docker-compose.yml

    Returns:
        Path to credentials.env or None if not found
    """
    if not docker_compose_path.exists():
        return None

    try:
        with open(docker_compose_path, "r", encoding="utf-8") as f:
            content = f.read()

        # Look for volume mount pattern: ./path/to/dir:/hub-api-creds
        # Example: - ./hub-meta/hub-api:/hub-api-creds
        match = re.search(r"- (\.[\w\-/\\]+):/hub-api-creds", content)
        if match:
            host_path = match.group(1)
            # Resolve relative path
            base_dir = docker_compose_path.parent
            full_path = (base_dir / host_path / "credentials.env").resolve()
            return full_path

        return None
    except Exception as e:
        print(f"⚠ Failed to parse docker-compose.yml: {e}")
        return None


def read_credentials_env(filepath: Path) -> dict[str, str]:
    """Read credentials from credentials.env file.

    Args:
        filepath: Path to credentials.env

    Returns:
        Dict of {key: value}
    """
    if not filepath.exists():
        raise FileNotFoundError(f"Credentials file not found: {filepath}")

    credentials = {}

    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue

            # Parse KEY=value (keys may contain digits after the first character)
            match = re.match(r"^([A-Z_][A-Z0-9_]*)=(.+)$", line)
            if match:
                key, value = match.groups()
                credentials[key] = value.strip()

    return credentials


def _toml_string(value: str) -> str:
    """Quote a string as a TOML basic string, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def update_config_toml(
    config_path: Path, lakefs_access_key: str, lakefs_secret_key: str
):
    """Update config.toml with LakeFS credentials.

    Args:
        config_path: Path to config.toml
        lakefs_access_key: LakeFS access key
        lakefs_secret_key: LakeFS secret key
    """
    if not config_path.exists():
        raise FileNotFoundError(f"Config file not found: {config_path}")

    # Read existing config
    try:
        with open(config_path, "rb") as f:
            config = tomllib.load(f)
    except Exception as e:
        raise ValueError(f"Failed to parse config.toml: {e}") from e

    # Update lakefs section
    if "lakefs" not in config:
        config["lakefs"] = {}

    config["lakefs"]["access_key"] = lakefs_access_key
    config["lakefs"]["secret_key"] = lakefs_secret_key

    # Write back
    # NOTE: only the sections listed below are written; any other section,
    # nested table, or comment in the original file is not preserved.
    lines = []

    for section in [
        "s3",
        "lakefs",
        "smtp",
        "auth",
        "admin",
        "app",
        "quota",
        "fallback",
    ]:
        if section not in config:
            continue

        lines.append(f"[{section}]")

        for key, val in config[section].items():
            # bool must be checked before int because bool is a subclass of int
            if isinstance(val, bool):
                lines.append(f"{key} = {str(val).lower()}")
            elif isinstance(val, int):
                # Check if it's a large number with underscores
                if val >= 1000000:
                    # Format with underscores for readability
                    val_str = f"{val:_}"
                    lines.append(f"{key} = {val_str}")
                else:
                    lines.append(f"{key} = {val}")
            elif isinstance(val, float):
                lines.append(f"{key} = {val}")
            elif isinstance(val, str):
                lines.append(f"{key} = {_toml_string(val)}")
            elif isinstance(val, list):
                # Format list
                items = ", ".join(
                    _toml_string(item) if isinstance(item, str) else str(item)
                    for item in val
                )
                lines.append(f"{key} = [{items}]")
            else:
                lines.append(f"{key} = {_toml_string(str(val))}")

        lines.append("")  # Blank line after section

    with open(config_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))

    print(f"✓ Updated {config_path}")


def main():
    parser = argparse.ArgumentParser(
        description="Sync LakeFS credentials from credentials.env to config.toml"
    )
    parser.add_argument(
        "--credentials-path",
        type=Path,
        help="Path to credentials.env (default: auto-detect from docker-compose.yml)",
    )
    parser.add_argument(
        "--config",
        type=Path,
        default=Path("config.toml"),
        help="Path to config.toml (default: config.toml)",
    )
    parser.add_argument(
        "--docker-compose",
        type=Path,
        default=Path("docker-compose.yml"),
        help="Path to docker-compose.yml (default: docker-compose.yml)",
    )

    args = parser.parse_args()

    print("=" * 60)
    print("LakeFS Credentials Sync Tool")
    print("=" * 60)
    print()

    # Determine credentials path
    credentials_path = args.credentials_path

    if not credentials_path:
        # Auto-detect from docker-compose.yml
        print(f"Auto-detecting credentials path from {args.docker_compose}...")
        credentials_path = find_credentials_path_from_docker_compose(
            args.docker_compose
        )

        if not credentials_path:
            print("\n✗ Could not auto-detect credentials path from docker-compose.yml")
            print("\n💡 Trying default path: ./hub-meta/hub-api/credentials.env")
            credentials_path = Path("hub-meta/hub-api/credentials.env")

    print(f"Credentials file: {credentials_path}")
    print(f"Config file: {args.config}")
    print()

    # Check if files exist
    if not credentials_path.exists():
        print(f"✗ Credentials file not found: {credentials_path}")
        print("\n💡 Make sure docker-compose is running and LakeFS has initialized:")
        print(" docker-compose up -d")
        print(" # Wait for LakeFS to start and create credentials.env")
        sys.exit(1)

    if not args.config.exists():
        print(f"✗ Config file not found: {args.config}")
        print("\n💡 Generate config.toml first:")
        print(" python scripts/generate_docker_compose.py")
        sys.exit(1)

    # Read credentials
    print("Reading LakeFS credentials...")
    try:
        credentials = read_credentials_env(credentials_path)

        lakefs_access_key = credentials.get("KOHAKU_HUB_LAKEFS_ACCESS_KEY")
        lakefs_secret_key = credentials.get("KOHAKU_HUB_LAKEFS_SECRET_KEY")

        if not lakefs_access_key or not lakefs_secret_key:
            print("✗ Missing LakeFS credentials in credentials.env")
            print(f" Found keys: {list(credentials.keys())}")
            sys.exit(1)

        print(f" ✓ Access Key: {lakefs_access_key}")
        print(f" ✓ Secret Key: {lakefs_secret_key[:8]}..." + "*" * 20)
        print()

    except FileNotFoundError as e:
        print(f"✗ {e}")
        sys.exit(1)
    except Exception as e:
        print(f"✗ Failed to read credentials: {e}")
        sys.exit(1)

    # Update config.toml
    print(f"Updating {args.config}...")
    try:
        update_config_toml(args.config, lakefs_access_key, lakefs_secret_key)
        print()
        print("=" * 60)
        print("✓ Sync Complete!")
        print("=" * 60)
        print()
        print("📋 Updated fields:")
        print(f" • lakefs.access_key = {lakefs_access_key}")
        print(f" • lakefs.secret_key = {lakefs_secret_key[:8]}***")
        print()
        print("💡 Next steps:")
        print(" 1. Restart dev server if running")
        print(" 2. Test LakeFS connection: curl http://localhost:28000/_health")
        print()

    except Exception as e:
        print(f"✗ Failed to update config.toml: {e}")
        import traceback

        traceback.print_exc()
        sys.exit(1)


if __name__ == "__main__":
    main()