Kohaku-Blueleaf be03f2c64b update readme
2025-10-02 23:44:42 +08:00

KohakuHub: Self-Hosted HuggingFace Hub Alternative

⚠️ Work In Progress - Not Ready for Production

Join our community on Discord: https://discord.gg/xWYrkyvJ2s

KohakuHub is a minimal, self-hosted alternative to HuggingFace Hub that lets you host and version your own models, datasets, and other AI artifacts with full HuggingFace client compatibility.

What is KohakuHub?

KohakuHub provides a simple but functional solution for teams and individuals who want to:

  • Host their own AI models and datasets without relying on external services
  • Maintain version control with Git-like branching and commits via LakeFS
  • Scale storage independently using S3-compatible object storage
  • Keep existing workflows with full huggingface_hub Python client compatibility

Key Features

  • HuggingFace Compatible: Works seamlessly with existing huggingface_hub client code
  • S3-Compatible Storage: Use any S3-compatible backend (MinIO, Cloudflare R2, Wasabi, AWS S3, etc.)
  • Repository Management: Create, list, and delete model/dataset/space repositories
  • File Operations: Upload, download, copy, and delete files with automatic deduplication
  • Large File Support: Handles files of any size via the Git LFS protocol
  • Version Control: Git-like branching and commit history via LakeFS
  • Authentication & Authorization: Secure user registration, session management, and API tokens
  • Organization Management: Create organizations and manage member roles
  • CLI Tool: kohub-cli for easy user and organization management
  • 🚧 Web UI: Coming soon (contributions welcome!)

Architecture

KohakuHub combines four components:

  • LakeFS: Provides Git-like versioning for your data (branches, commits, diffs)
  • MinIO/S3: Object storage backend for actual file storage
  • PostgreSQL/SQLite: Lightweight metadata database for deduplication and indexing
  • FastAPI: HuggingFace-compatible API layer

See API.md for detailed API documentation and workflow diagrams.
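
As an illustration of the deduplication role played by the metadata database, here is a minimal sketch of content-addressed storage. It assumes deduplication is keyed on a SHA-256 digest of the file content; the `DedupIndex` class and the `objects/` key layout are hypothetical and are not taken from the KohakuHub source.

```python
import hashlib


class DedupIndex:
    """Toy in-memory stand-in for the metadata database (hypothetical)."""

    def __init__(self):
        self._objects = {}  # sha256 hex digest -> storage key
        self.uploads = 0    # how many times bytes were actually written

    def store(self, data: bytes) -> str:
        """Return the storage key for `data`, writing it only if it is new."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._objects:
            # Only previously unseen content reaches object storage.
            self._objects[digest] = f"objects/{digest}"
            self.uploads += 1
        return self._objects[digest]


index = DedupIndex()
key_a = index.store(b"same weights")
key_b = index.store(b"same weights")       # duplicate content, no new write
key_c = index.store(b"different weights")
```

In this sketch, identical content uploaded twice maps to the same key and is written once, which is the general idea behind hash-based deduplication.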

Quick Start

Prerequisites

  • Docker and Docker Compose
  • Python 3.10+ (for testing with huggingface_hub client)
  • Optional:
    • S3 storage (MinIO is the default and runs with Docker Compose)
    • SMTP service (for optional email verification)

1. Clone the Repository

git clone https://github.com/KohakuBlueleaf/Kohaku-Hub.git
cd Kohaku-Hub

2. Configure Docker Compose

Before starting, review and customize docker/docker-compose.yml:

Important: Security Configuration

⚠️ Change Default Passwords! The default configuration uses weak credentials. For any serious deployment, replace the following values:

# MinIO credentials - CHANGE THESE!
environment:
  - MINIO_ROOT_USER=your_secure_username
  - MINIO_ROOT_PASSWORD=your_very_secure_password_here

# PostgreSQL credentials - CHANGE THESE!
environment:
  - POSTGRES_USER=hub
  - POSTGRES_PASSWORD=your_secure_db_password
  - POSTGRES_DB=hubdb

# LakeFS encryption key - CHANGE THIS!
environment:
  - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=generate_a_long_random_key_here
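
One way to produce the random values above is Python's standard `secrets` module. This is only a sketch: the helper name `generate_secret` and the 32-byte length are illustrative choices, not requirements stated by KohakuHub, MinIO, PostgreSQL, or LakeFS.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from `num_bytes` random bytes."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print one fresh value per setting that needs a strong secret.
    for name in (
        "MINIO_ROOT_PASSWORD",
        "POSTGRES_PASSWORD",
        "LAKEFS_AUTH_ENCRYPT_SECRET_KEY",
    ):
        print(f"{name}={generate_secret()}")
```

Paste the printed values into docker/docker-compose.yml and keep them out of version control.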

Port Configuration

The default setup exposes these ports:

  • 48888 - KohakuHub API (main interface)
  • 28000 - LakeFS Web UI + API
  • 29000 - MinIO Web Console
  • 29001 - MinIO S3 API
  • 25432 - PostgreSQL (optional, for external access)

For production deployment, you should:

  1. Only expose port 48888 (KohakuHub API) to users
  2. Keep other ports internal or behind a firewall
  3. Use a reverse proxy (nginx/traefik) with HTTPS

Public Endpoint Configuration

If deploying on a server, update these environment variables:

environment:
  # Replace with your actual domain/IP
  - KOHAKU_HUB_BASE_URL=https://your-domain.com
  - KOHAKU_HUB_S3_PUBLIC_ENDPOINT=https://s3.your-domain.com

The S3 public endpoint is used for generating download URLs. It should point to wherever your MinIO S3 API is accessible (port 29001 by default).

3. Start the Services

# Set user/group ID for proper permissions
# (bash defines UID as a read-only variable; if the assignment fails, run `export UID` instead)
export UID=$(id -u)
export GID=$(id -g)

# Start all services
docker compose up -d --build

Services will start in this order:

  1. MinIO (S3 storage)
  2. PostgreSQL (metadata database)
  3. LakeFS (version control)
  4. KohakuHub API (main application)

4. Verify Installation

Check that all services are running:

docker compose ps
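
Beyond `docker compose ps`, a quick TCP check can confirm that the published ports accept connections. The sketch below assumes the default ports listed under Port Configuration and that the services run on localhost. The helper names are hypothetical, and a successful connection only shows that something is listening, not that the service is healthy.

```python
import socket

# Default ports from the Port Configuration section (adjust if you changed them).
DEFAULT_PORTS = {
    "KohakuHub API": 48888,
    "LakeFS": 28000,
    "MinIO console": 29000,
    "MinIO S3 API": 29001,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost", ports=DEFAULT_PORTS) -> dict:
    """Map each service name to whether its port is reachable."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```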

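Access the web interfaces (default ports, assuming a local deployment):

  • KohakuHub API: http://localhost:48888
  • LakeFS Web UI: http://localhost:28000
  • MinIO Web Console: http://localhost:29000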

5. Test with Python Client

# Install the official HuggingFace client
pip install huggingface_hub

# Run the test script
python scripts/test.py

The test script will:

  • Create a test repository
  • Upload files (both small and large)
  • Download them back
  • Verify content integrity
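
The steps above can be sketched with the huggingface_hub client as follows. This is not the content of scripts/test.py: it is an illustrative round trip, the function names `sha256_of` and `roundtrip` are hypothetical, and it assumes the server accepts the `create_repo`, `upload_file`, and `hf_hub_download` calls used elsewhere in this README.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def roundtrip(repo_id: str, local_path: str, endpoint: str, token: str) -> bool:
    """Upload a file, download it again, and compare checksums."""
    # Imported lazily so the hashing helper works without the client installed.
    from huggingface_hub import HfApi

    api = HfApi(endpoint=endpoint, token=token)
    api.create_repo(repo_id, repo_type="model", exist_ok=True)
    name = os.path.basename(local_path)
    api.upload_file(path_or_fileobj=local_path, path_in_repo=name, repo_id=repo_id)
    with tempfile.TemporaryDirectory() as tmp:
        downloaded = api.hf_hub_download(repo_id=repo_id, filename=name, local_dir=tmp)
        return sha256_of(downloaded) == sha256_of(local_path)
```

A call such as `roundtrip("my-org/my-model", "model.safetensors", "http://localhost:48888", "your_api_token_here")` should return True when the downloaded bytes match the uploaded ones.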

kohub-cli Usage

KohakuHub includes a command-line tool, kohub-cli, to simplify user and organization management.

Installation

The CLI is included in the source code. To run it, first install the dependencies:

pip install -r requirements.txt
pip install -e .

Then run the CLI:

kohub-cli

User Management

The CLI provides an interactive menu for user management.

1. Register a New User

  • Run the CLI and select User Management -> Register.
  • Follow the prompts to enter a username, email, and password.

2. Login

  • Select User Management -> Login.
  • Enter your username and password to create a session.

3. Generate an API Token

  • After logging in, select User Management -> Create Token.
  • Give the token a name (e.g., "my-laptop").
  • The CLI will print a new API token. Save this token securely!

Organization Management

You can also manage organizations and members.

  • Create Organization: Organization Management -> Create Organization
  • Manage Members: Add, remove, or update member roles within an organization.

Using KohakuHub

With Python Client

To interact with your private repositories, you need to provide your API token.

import os
from huggingface_hub import HfApi

# Point to your KohakuHub instance
os.environ["HF_ENDPOINT"] = "http://localhost:48888"
# Provide your API token
os.environ["HF_TOKEN"] = "your_api_token_here"

api = HfApi(
    endpoint=os.environ["HF_ENDPOINT"],
    token=os.environ["HF_TOKEN"]
)

# Create a repository
api.create_repo("my-org/my-model", repo_type="model")

# Upload files
api.upload_file(
    path_or_fileobj="model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="my-org/my-model",
)

# Download files
file = api.hf_hub_download(
    repo_id="my-org/my-model",
    filename="model.safetensors",
)
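
To avoid hard-coding the endpoint and token, the client can be built from the environment, and a whole folder can be uploaded in one call. The sketch below is a suggestion rather than part of KohakuHub: `client_settings` and `upload_folder_and_list` are hypothetical helpers, and whether every endpoint behind `upload_folder` and `list_repo_files` is implemented by your KohakuHub version is something to verify.

```python
import os


def client_settings(env=os.environ) -> dict:
    """Read endpoint and token from the environment, with a local default."""
    endpoint = env.get("HF_ENDPOINT", "http://localhost:48888").rstrip("/")
    token = env.get("HF_TOKEN")  # None means anonymous access
    return {"endpoint": endpoint, "token": token}


def upload_folder_and_list(folder: str, repo_id: str, env=os.environ) -> list:
    """Upload every file under `folder`, then return the repository file list."""
    # Imported lazily so client_settings works without the client installed.
    from huggingface_hub import HfApi

    api = HfApi(**client_settings(env))
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")
    return api.list_repo_files(repo_id, repo_type="model")
```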

With hfutils

hfutils: https://github.com/deepghs/hfutils

With hfutils, you can upload an entire folder easily and use KohakuHub with any HuggingFace model loader, such as transformers or diffusers.

# Download a model from the official HuggingFace Hub
export HF_ENDPOINT="https://huggingface.co/"
hfutils download -t model -r KBlueLeaf/EQ-SDXL-VAE -d . -o ./eq-sdxl
# Point the client at KohakuHub and upload the same folder
export HF_ENDPOINT="http://127.0.0.1:48888/"
export HF_TOKEN="your_api_token_here"
hfutils upload -t model -r KBlueLeaf/EQ-SDXL-VAE -d . -i ./eq-sdxl

Use KohakuHub in transformers and diffusers

You can load a model hosted on KohakuHub directly in transformers and diffusers:

import os

# Set these before importing diffusers/transformers so the endpoint is picked up
os.environ["HF_ENDPOINT"] = "http://127.0.0.1:48888/"
os.environ["HF_TOKEN"] = "your_api_token_here"
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("KBlueLeaf/EQ-SDXL-VAE")

Accessing LakeFS Web UI

LakeFS credentials are automatically generated on first startup and stored in:

docker/hub-meta/hub-api/credentials.env
# or another path if you have modified docker-compose.yml

Use these credentials to log into the LakeFS web interface at http://localhost:28000 and browse your repositories.
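
If you want to read the generated credentials from a script, a small KEY=VALUE parser is enough. The sketch below assumes credentials.env uses plain `KEY=VALUE` lines. The variable names inside the file are not documented here, so the parser makes no assumption about them, and both helper names are hypothetical.

```python
def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_credentials(path: str = "docker/hub-meta/hub-api/credentials.env") -> dict:
    """Read and parse the credentials file at `path`."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_env_file(f.read())
```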

Configuration Options

For advanced configuration, you can create a config.toml file or use environment variables. See config-example.toml for all available options.

Key settings include:

  • LFS threshold: Files larger than this use Git LFS protocol (default: 10MB)
  • Database backend: Choose either SQLite (default) or PostgreSQL
  • Authentication: Enable email verification, set session expiry, etc.
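
To make the LFS threshold concrete, here is a minimal sketch of the size check. It assumes that "10MB" means 10 × 1024 × 1024 bytes and that the comparison is strictly greater-than. Both are assumptions, and `uses_lfs` is a hypothetical helper, not a KohakuHub function.

```python
# Assumed interpretation of the 10MB default; check config-example.toml for the real value.
LFS_THRESHOLD_BYTES = 10 * 1024 * 1024


def uses_lfs(size_bytes: int, threshold: int = LFS_THRESHOLD_BYTES) -> bool:
    """Return True if a file of this size would go through the LFS path."""
    return size_bytes > threshold
```

Under these assumptions, a 4 KiB config file is uploaded as a regular file, while a multi-gigabyte checkpoint goes through the Git LFS protocol.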

Project Status & Roadmap

See TODO.md for detailed development status.

Current Status:

  • Core API (upload, download, version control)
  • HuggingFace client compatibility
  • Large file support (Git LFS)
  • Docker deployment
  • Authentication & Authorization
  • Organization Management
  • CLI for administration
  • 🚧 Web user interface

Contributing

We welcome contributions, especially in the following areas:

🎨 Web Interface (High Priority!)

We're looking for frontend developers to help build a modern web UI. Preferred stack:

  • Vue 3 + Vite for the framework
  • Tailwind CSS for styling
  • Similar UX to HuggingFace Hub

If you're interested in leading the web UI development, please reach out on Discord!

Other Contributions

  • Bug reports and feature requests via GitHub Issues
  • Code improvements and bug fixes via Pull Requests
  • Documentation improvements
  • Testing and feedback

Join our community on Discord: https://discord.gg/xWYrkyvJ2s

License

Currently licensed under AGPL-3.0. The license may be updated to a more permissive option after initial development is complete.

Acknowledgments

  • HuggingFace for the amazing Hub platform and client library
  • LakeFS for Git-like data versioning
  • MinIO for S3-compatible object storage

Support & Community


Note: This project is in active development. APIs may change, and features may be incomplete. Not recommended for production use yet.