# Database Migrations This directory contains database migration scripts for KohakuHub. ## How Migrations Work 1. **Auto-detection**: Each migration checks if it needs to run by verifying if columns/tables exist 2. **Sequential execution**: Migrations run in numerical order (001, 002, 003, etc.) 3. **Idempotent**: Safe to run multiple times - already-applied migrations are skipped 4. **Auto-run**: Migrations automatically run on container startup via `docker/startup.py` 5. **Self-healing**: Each migration automatically checks if ANY future migration has been applied - If migration 005 finds that migration 008 is applied, it skips (changes already included) - Works automatically for any future migrations (009, 010, etc.) - No hardcoding - migrations discover and check future migrations dynamically ## Migration Order | # | Name | Description | Notes | |---|------|-------------|-------| | 001 | repository_schema | Remove unique constraint from Repository.full_id | Skipped if post-008 | | 002 | user_org_quotas | Add private/public quota fields to User/Organization | Skipped if post-008 | | 003 | commit_tracking | Add Commit table for tracking user commits | Skipped if post-008 | | 004 | repo_quotas | Add quota/used_bytes fields to Repository | Skipped if post-008 | | 005 | profiles_and_invitations | Add profile fields and invitation system | Skipped if post-008 | | 006 | invitation_multi_use | Add multi-use support to invitations | Skipped if post-008 | | 007 | avatar_support | Add avatar fields to User/Organization | Skipped if post-008 | | 008 | foreignkey_refactoring | **BREAKING** Merge User/Organization tables + Add ForeignKeys | Major schema change | ## Migration 008 Schema Refactoring **Migration 008 is a major schema refactoring that:** - Merges the Organization table into User table (adds `is_org` flag) - Converts all integer ID references to proper ForeignKey constraints - Adds denormalized owner fields for performance **If you have an existing database:** - Migrations 001-007 will automatically skip (changes already included in 008) - Migration 008 requires confirmation (or set `KOHAKU_HUB_AUTO_MIGRATE=true` in Docker) - **BACKUP YOUR DATABASE BEFORE RUNNING 008** **For fresh/new databases:** - Recreate the database from scratch instead of running migrations: ```bash # Stop services docker-compose down # Remove old database data rm -rf hub-meta/postgres-data/* # Restart (will auto-create schema) docker-compose up -d ``` - Fresh databases get the latest schema automatically via `init_db()` - All migrations will skip (nothing to migrate) ## Creating New Migrations 1. Create a new file: `scripts/db_migrations/00X_name.py` 2. Implement these functions: - `MIGRATION_NUMBER` - Constant with migration number (e.g., 9) - `is_applied(db, cfg)` - Check if THIS migration has been applied (for future migrations to detect) - `check_migration_needed()` - Returns True if migration should run - `migrate_sqlite()` - SQLite migration logic - `migrate_postgres()` - PostgreSQL migration logic - `run()` - Main entry point that uses `should_skip_due_to_future_migrations()` 3. Template: ```python #!/usr/bin/env python3 """Migration 00X: Description""" import sys import os # Add src to path sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "src")) # Add db_migrations to path (for _migration_utils) sys.path.insert(0, os.path.dirname(__file__)) from kohakuhub.db import db from kohakuhub.config import cfg from _migration_utils import should_skip_due_to_future_migrations, check_column_exists, check_table_exists # IMPORTANT: Do NOT import Peewee models (User, Repository, etc.) # Models may be renamed/deleted in future versions, breaking old migrations. # Use raw SQL queries instead. MIGRATION_NUMBER = X # Replace X with actual number def is_applied(db, cfg): """Check if THIS migration has been applied. This function is called by older migrations to detect if this migration has already applied their changes. Choose a unique signature column/table. Returns True if this migration is applied, False otherwise. Errors should return True (treat as applied, skip older migrations). """ # Example: Check if a signature column/table exists return check_column_exists(db, cfg, "mytable", "mycolumn") def check_migration_needed(): """Check if this migration needs to run.""" cursor = db.cursor() if cfg.app.db_backend == "postgres": cursor.execute(""" SELECT column_name FROM information_schema.columns WHERE table_name='mytable' AND column_name='mycolumn' """) return cursor.fetchone() is None else: cursor.execute("PRAGMA table_info(mytable)") columns = [row[1] for row in cursor.fetchall()] return 'mycolumn' not in columns def migrate_sqlite(): cursor = db.cursor() cursor.execute("ALTER TABLE mytable ADD COLUMN mycolumn INTEGER") db.commit() def migrate_postgres(): cursor = db.cursor() cursor.execute("ALTER TABLE mytable ADD COLUMN mycolumn BIGINT") db.commit() def run(): db.connect(reuse_if_open=True) try: # Check if any future migration has been applied (auto-skip if superseded) if should_skip_due_to_future_migrations(MIGRATION_NUMBER, db, cfg): print("Migration 00X: Skipped (superseded by future migration)") return True if not check_migration_needed(): print("Migration 00X: Already applied") return True print("Migration 00X: Running...") if cfg.app.db_backend == "postgres": migrate_postgres() else: migrate_sqlite() print("Migration 00X: ✓ Completed") return True except Exception as e: print(f"Migration 00X: ✗ Failed - {e}") import traceback traceback.print_exc() return False finally: db.close() if __name__ == "__main__": success = run() sys.exit(0 if success else 1) ``` ## Running Migrations **Automatic (in Docker):** - Migrations run automatically on container startup **Manual:** ```bash # Run all migrations python scripts/run_migrations.py # Run specific migration python scripts/db_migrations/001_repository_schema.py ``` ## Best Practices ### For Fresh Databases **Recommended:** Delete database data and let `init_db()` create the latest schema: ```bash docker-compose down rm -rf hub-meta/postgres-data/* docker-compose up -d ``` This is faster and cleaner than running all migrations sequentially. ### For Existing Databases 1. **Backup first!** Always backup before running migrations 2. Run migrations via the automatic startup process 3. Set `KOHAKU_HUB_AUTO_MIGRATE=true` to skip confirmation prompts in Docker 4. Monitor logs for migration status ### For Development ```bash # Run all pending migrations python scripts/run_migrations.py # Run specific migration python scripts/db_migrations/001_repository_schema.py ``` ## Migration System Design ### Self-Healing Future-Migration Detection Each migration automatically checks if any **future** migration has been applied before running: **How it works:** 1. Migration 003 is about to run 2. Checks migrations 008, 007, 006, 005, 004 (newest to oldest) 3. If migration 008's `is_applied()` returns True → skip migration 003 4. If all future migrations return False → run migration 003 normally **Benefits:** - No hardcoding of specific migration numbers - Automatically works when you add migration 009, 010, etc. - Errors/exceptions in `is_applied()` are treated as "not applied" (safe fallback) - Makes migrations resilient to major schema refactorings **Each migration must implement:** ```python def is_applied(db, cfg): """Check if THIS migration has been applied. Choose a unique signature (table or column) that this migration creates. Errors should return True (treat as applied to be safe). """ return check_column_exists(db, cfg, "mytable", "my_signature_column") ``` ### Important Guidelines #### DO NOT Import Peewee Models **Never import models like `User`, `Repository`, `Organization`, etc. in migrations!** ```python # ❌ BAD - Will break if model is renamed/deleted from kohakuhub.db import db, User, Organization db.create_tables([User], safe=True) # ✅ GOOD - Use raw SQL instead from kohakuhub.db import db cursor = db.cursor() cursor.execute("CREATE TABLE IF NOT EXISTS user (...)") ``` **Why?** - Migrations are permanent historical records - Models may be renamed, deleted, or refactored in future versions - Importing models creates tight coupling that breaks old migrations - Example: Migration 002 imported `Organization`, which no longer exists after migration 008 **Use raw SQL for all schema changes:** - Table creation: `CREATE TABLE IF NOT EXISTS` - Column addition: `ALTER TABLE ... ADD COLUMN` - Index creation: `CREATE INDEX IF NOT EXISTS` ## Notes - Migrations are idempotent - safe to re-run - Failed migrations will prevent server startup - Each migration auto-skips if ANY future migration has been applied - Use raw SQL queries instead of importing Peewee models - Errors in `is_applied()` are treated as "applied" (safe fallback) - Old migration scripts in `scripts/migrate_*.py` are kept for reference ## Utilities (`_migration_utils.py`) Common helper functions available to all migrations: ```python from _migration_utils import ( should_skip_due_to_future_migrations, # Check if future migrations applied check_table_exists, # Check if table exists check_column_exists, # Check if column exists ) # Usage in migration if should_skip_due_to_future_migrations(MIGRATION_NUMBER, db, cfg): print("Migration skipped - superseded by future migration") return True ``` ## Troubleshooting ### Error: "column repository_id does not exist" This error indicates the database is in an inconsistent state (mix of old and new schema). **Cause:** The database has some tables created with the new schema (post-migration 008), but migrations never ran to update the data properly. **Solution 1 (Recommended):** Drop database and start fresh: ```bash docker-compose down rm -rf hub-meta/postgres-data/* docker-compose up -d ``` **Solution 2:** Manually fix the inconsistency: 1. Connect to database: `docker exec -it postgres psql -U hub -d hubdb` 2. Check schema: `\d file` and `\d repository` 3. If File table has `repository_id` but no data, drop and recreate: ```sql DROP TABLE IF EXISTS file CASCADE; DROP TABLE IF EXISTS repository CASCADE; -- Restart container to recreate tables ``` **Prevention:** Always run migrations before application starts (handled automatically in Docker)