# MLSysBook Development Guide This guide covers the development workflow, automated cleanup system, and best practices for contributing to the Machine Learning Systems book. ## ๐ŸŽฏ Essential Commands (Daily Use) ```bash ./binder clean # Clean build artifacts ./binder build # Build HTML book ./binder doctor # Health check & diagnostics ./binder preview # Live preview with hot reload ./binder build pdf # Build PDF ``` ## ๐Ÿš€ Quick Start ```bash # First time setup ./binder setup # Configure environment and tools # Daily workflow (most common commands) ./binder clean # Clean build artifacts ./binder build # Build HTML (complete book) ./binder doctor # Health check # Preview & development ./binder preview intro # Preview a chapter with live reload ./binder build intro # Build specific chapter ``` ## ๐Ÿงน Automated Cleanup System This project includes an **automated cleanup system** that runs before every commit to ensure a clean repository. ### What Gets Cleaned Automatically The cleanup system removes: - **Build artifacts**: `*.html`, `*.pdf`, `*.tex`, `*.aux`, `*.log`, `*.toc` - **Cache directories**: `.quarto/`, `site_libs/`, `index_files/` (legacy) - **Python artifacts**: `__pycache__/`, `*.pyc`, `*.pyo` - **System files**: `.DS_Store`, `Thumbs.db`, `*.swp` - **Editor files**: `*~`, `.#*` - **Debug files**: `debug.log`, `error.log` ### Manual Cleanup Commands ```bash # Regular cleanup (recommended before commits) ./binder clean # See what files will be cleaned (safe preview) git status git clean -xdn # Deep clean (removes all build artifacts) ./binder clean git clean -xdf ``` ### Pre-Commit Hook The git pre-commit hook automatically: 1. ๐Ÿ” **Scans for build artifacts** in staged files 2. ๐Ÿงน **Runs cleanup** if artifacts are detected 3. โš ๏ธ **Warns about large files** (>1MB) 4. ๐Ÿšจ **Blocks commits** with potential secrets 5. โœ… **Allows clean commits** to proceed #### Bypassing the Hook (Emergency) ```bash # Only if absolutely necessary git commit --no-verify -m "Emergency commit" ``` ## ๐Ÿ”จ Building the Book ### Build Commands ```bash # Using binder (recommended) ./binder build html # Build HTML version ./binder build pdf # Build PDF version ./binder publish # Build and publish # Using binder (recommended) ./binder build # HTML version ./binder build pdf # PDF version ./binder build epub # EPUB version ``` ### Development Workflow ```bash # Preview a chapter (fastest) ./binder preview intro # Build complete book ./binder build html # Publish to the world ./binder publish ``` ### Environment Setup The `./binder setup` command provides a complete environment configuration: **What it does:** 1. **Checks environment** - Verifies all required tools and versions 2. **Installs dependencies** - Auto-installs missing tools (Quarto, GitHub CLI, Ollama) 3. **Configures Git** - Sets up user name, email, and GitHub username 4. **Sets preferences** - Configures build format and browser behavior 5. **Tests setup** - Builds a test chapter to verify everything works **Features:** - ๐Ÿ› ๏ธ **Automatic tool installation** (Homebrew, apt, pip) - ๐Ÿ‘ค **Interactive Git configuration** - โš™๏ธ **User preference setup** - ๐Ÿงช **Built-in testing** to verify setup ```bash # Run setup ./binder setup # Get welcome and overview ./binder hello ``` ### Development Server ```bash # Start live preview server ./binder preview # The server will automatically reload when you save changes ``` ### Build Outputs - **HTML**: `build/html/index.html` (main output directory) - **PDF**: `build/pdf/` (PDF output directory) - **PDF**: `book/index.pdf` (in book directory) - **Artifacts**: Automatically cleaned by git hooks ## ๐Ÿš€ Publishing The `./binder publish` command provides a complete publishing workflow: **Step-by-step process:** 1. **Environment validation** - Checks Git status, tools, and dependencies 2. **Branch management** - Merges `dev` to `main` with confirmation 3. **Release planning** - Suggests version bump based on changes 4. **Build process** - PDF first, then HTML (ensures PDF availability) 5. **Release creation** - Git tag, AI-generated release notes, GitHub release 6. **Deployment** - Copies PDF to assets, commits, pushes to production **Features:** - ๐Ÿค– **AI-powered release notes** (requires Ollama) - ๐Ÿ“Š **Smart version suggestions** (patch/minor/major) - ๐Ÿ›ก๏ธ **Safety checks** and confirmations - ๐ŸŽฏ **Step-by-step wizard** with clear progress ```bash # One-command publishing ./binder publish ``` ### Manual Publishing Steps If you prefer to do it step by step: ```bash # 1. Ensure you're on main branch git checkout main git merge dev # 2. Build both formats ./binder build html ./binder build pdf # 3. Copy PDF to assets cp build/pdf/Machine-Learning-Systems.pdf assets/ # 4. Commit and push git add assets/downloads/Machine-Learning-Systems.pdf git commit -m "Add PDF to assets" git push origin main ``` ### Publishing Requirements - โœ… Must be on `main` branch - โœ… No uncommitted changes - โœ… All builds successful - โœ… Git repository properly configured ### After Publishing The GitHub Actions workflow will: - ๐Ÿ”„ Run quality checks - ๐Ÿ—๏ธ Build all formats (Linux + Windows) - ๐Ÿš€ Deploy to GitHub Pages - ๐Ÿ“ฆ Create release assets **Monitor progress**: https://github.com/harvard-edge/cs249r_book/actions ## ๐Ÿ” Project Health Checks ### Quick Status Check ```bash ./binder doctor # Overall project health ./binder status # Detailed project status git status # Git repository status ``` ### Comprehensive Testing ```bash ./binder doctor # Run comprehensive health check quarto check # Validate Quarto configuration ``` ### Example Health Check Output ``` ๐Ÿ” Checking project health... ๐Ÿ“Š Project Structure: QMD files: 45 Bibliography files: 20 Quiz files: 18 ๐Ÿ—‚๏ธ Git Status: Repository is clean ๐Ÿ“ฆ Dependencies: โœ… Quarto: 1.4.x โœ… Python: 3.x ``` ## ๐Ÿ“ Content Development ### Chapter Structure ``` book/contents/ โ”œโ”€โ”€ core/ # Main content chapters โ”‚ โ”œโ”€โ”€ introduction/ โ”‚ โ”‚ โ”œโ”€โ”€ introduction.qmd โ”‚ โ”‚ โ”œโ”€โ”€ introduction.bib โ”‚ โ”‚ โ””โ”€โ”€ introduction_quizzes.json โ”‚ โ””โ”€โ”€ ... โ”œโ”€โ”€ frontmatter/ # Preface, about, etc. โ”œโ”€โ”€ backmatter/ # References, appendices โ””โ”€โ”€ labs/ # Hands-on exercises ``` ### Working with Minimal Configuration For faster development, you can work with a minimal set of chapters: 1. **Edit `book/_quarto-html.yml`**: Comment out chapters you're not working on 2. **Edit bibliography section**: Comment out unused `.bib` files 3. **Build faster**: Only active chapters will be processed ```yaml chapters: - index.qmd - contents/core/introduction/introduction.qmd # - contents/core/ml_systems/ml_systems.qmd # Commented out # - contents/core/nn_computation/nn_computation.qmd # Commented out ``` ### Restoring Full Configuration Simply uncomment the chapters and bibliography entries you want to restore. ## ๐Ÿ”ง Troubleshooting ### Common Issues 1. **Build fails with missing files** ```bash make clean # Clean artifacts make check # Verify structure ``` 2. **Git hook blocks commit** ```bash make clean # Remove artifacts git status # Check what's staged ``` 3. **Slow builds** ```bash make clean-deep # Full cleanup # Use minimal configuration ``` 4. **Permission denied on scripts** ```bash make setup-hooks # Fix permissions ``` ### Getting Help ```bash ./binder help # Show all commands ./binder --help # Detailed help ``` ## ๐ŸŽฏ Best Practices ### Before Starting Work ```bash git pull # Get latest changes ./binder clean # Clean workspace ./binder doctor # Verify health ``` ### Daily Development Workflow ```bash # 1. Clean and build ./binder clean ./binder build # 2. Start development server ./binder preview # 3. Make changes to .qmd files # 4. Preview updates automatically # 5. When ready to commit git add . # Pre-commit hook runs automatically git commit -m "Your message" ``` ### Before Major Changes ```bash ./binder clean # Full cleanup ./binder build # Clean build ./binder doctor # Run all checks ``` ### Release Preparation ```bash ./binder doctor # Comprehensive validation ./binder build # Build HTML ./binder build pdf # Build PDF ./binder build epub # Build EPUB ``` ## โš™๏ธ Configuration Files - **`quarto/config/_quarto-html.yml`**: HTML website configuration - **`quarto/config/_quarto-pdf.yml`**: PDF book configuration - **`binder`**: Book Binder CLI (build and development tool) - **`.git/hooks/pre-commit`**: Automated cleanup hook - **`.gitignore`**: Ignored file patterns ## ๐Ÿ—‚๏ธ Scripts Organization The `tools/scripts/` directory is organized into logical categories: ``` tools/scripts/ โ”œโ”€โ”€ build/ # Build and development scripts (clean.sh, etc.) โ”œโ”€โ”€ content/ # Content management tools โ”œโ”€โ”€ maintenance/ # System maintenance scripts โ”œโ”€โ”€ testing/ # Test and validation scripts โ”œโ”€โ”€ utilities/ # General utility scripts โ”œโ”€โ”€ docs/ # Script documentation โ”œโ”€โ”€ genai/ # AI and generation tools โ”œโ”€โ”€ cross_refs/ # Cross-reference management โ”œโ”€โ”€ quarto_publish/ # Publishing workflows โ””โ”€โ”€ ai_menu/ # AI menu tools ``` Each directory has its own README.md with specific usage instructions. ## ๐Ÿค Contributing 1. **Fork and clone** the repository 2. **Run setup**: `make setup-hooks && make install` 3. **Make changes** with the development workflow above 4. **Test thoroughly**: `make test && make build-all` 5. **Submit pull request** with clean commits The automated cleanup system ensures that your commits will be clean and won't include build artifacts, making code reviews easier and keeping the repository tidy. ## ๐Ÿ“ž Support If you encounter issues with the development workflow: 1. Check this guide first 2. Run `make check` for diagnostics 3. Review the cleanup script output with `make clean-dry` 4. Ask for help in project discussions