cs249r_book/book/docs/DEVELOPMENT.md
Vijay Janapa Reddi 2390c3ab31 Refactor: consolidate Quarto config layers and content reorganization.
Unifies Quarto metadata into shared base/format/volume fragments while carrying through chapter path, asset, and tooling updates to keep the repository consistent and easier to maintain.
2026-02-12 15:38:55 -05:00

MLSysBook Development Guide

This guide covers the development workflow, automated cleanup system, and best practices for contributing to the Machine Learning Systems book.

🎯 Essential Commands (Daily Use)

./binder clean      # Clean build artifacts
./binder build      # Build HTML book
./binder doctor     # Health check & diagnostics
./binder preview    # Live preview with hot reload
./binder build pdf  # Build PDF

🚀 Quick Start

# First time setup
./binder setup              # Configure environment and tools

# Daily workflow (most common commands)
./binder clean              # Clean build artifacts
./binder build              # Build HTML (complete book)
./binder doctor             # Health check

# Preview & development
./binder preview intro      # Preview a chapter with live reload
./binder build intro        # Build specific chapter

🧹 Automated Cleanup System

This project includes an automated cleanup system that runs before every commit to ensure a clean repository.

What Gets Cleaned Automatically

The cleanup system removes:

  • Build artifacts: *.html, *.pdf, *.tex, *.aux, *.log, *.toc
  • Cache directories: .quarto/, site_libs/, index_files/ (legacy)
  • Python artifacts: __pycache__/, *.pyc, *.pyo
  • System files: .DS_Store, Thumbs.db, *.swp
  • Editor files: *~, .#*
  • Debug files: debug.log, error.log
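
To see which files the patterns above would match before anything is deleted, a small helper along these lines can be useful. This is only a sketch: `list_artifacts` is a hypothetical name, not part of `./binder`, and it covers the file patterns only (not cache directories such as `.quarto/` or `__pycache__/`). It prints matches and removes nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ./binder): list files whose names match
# the artifact patterns from the list above. Read-only; nothing is deleted.
# Note: *.log already covers debug.log and error.log.
list_artifacts() {
  local root="${1:-.}"
  find "$root" -path "$root/.git" -prune -o -type f \( \
      -name '*.html' -o -name '*.pdf' -o -name '*.tex' -o -name '*.aux' \
      -o -name '*.log' -o -name '*.toc' \
      -o -name '*.pyc' -o -name '*.pyo' \
      -o -name '.DS_Store' -o -name 'Thumbs.db' -o -name '*.swp' \
      -o -name '*~' -o -name '.#*' \
    \) -print | LC_ALL=C sort
}
```

Calling `list_artifacts book` would print one matching path per line. Because the match is by name only, any intentionally tracked `*.html` or `*.pdf` file would show up too, so the output is worth reviewing before cleaning.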

Manual Cleanup Commands

# Regular cleanup (recommended before commits)
./binder clean

# See what files will be cleaned (safe preview)
git status
git clean -xdn

# Deep clean (git clean -xdf deletes ALL untracked and ignored files, not only build artifacts)
./binder clean
git clean -xdf

Pre-Commit Hook

The git pre-commit hook automatically:

  1. 🔍 Scans for build artifacts in staged files
  2. 🧹 Runs cleanup if artifacts are detected
  3. ⚠️ Warns about large files (>1MB)
  4. 🚨 Blocks commits with potential secrets
  5. Allows clean commits to proceed
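
Steps 1 and 3 above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual hook: the function names, the pattern list, and the choice to report artifacts (rather than run cleanup, as the real hook does in step 2) are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the artifact scan and size warning a pre-commit hook might do.
# Function names and patterns are illustrative assumptions.

MAX_BYTES=$((1024 * 1024))   # the 1MB threshold mentioned above

# Return 0 if the path looks like a build artifact.
is_build_artifact() {
  case "$1" in
    *.html|*.pdf|*.tex|*.aux|*.log|*.toc|*.pyc|*.pyo) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 if the file exists and is larger than MAX_BYTES.
is_large_file() {
  [ -f "$1" ] && [ "$(wc -c < "$1")" -gt "$MAX_BYTES" ]
}

# Read staged paths from stdin. Returns 1 if any artifact was found;
# large files only produce a warning.
check_staged() {
  local rc=0 path
  while IFS= read -r path; do
    if is_build_artifact "$path"; then
      echo "artifact staged: $path" >&2
      rc=1
    fi
    if is_large_file "$path"; then
      echo "warning: large file (>1MB): $path" >&2
    fi
  done
  return "$rc"
}

# In a hook, the staged paths would typically come from:
#   git diff --cached --name-only --diff-filter=ACM | check_staged
```

Keeping the checks as small functions that read paths from stdin makes them easy to try out by hand without staging anything.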

Bypassing the Hook (Emergency)

# Only if absolutely necessary
git commit --no-verify -m "Emergency commit"

🔨 Building the Book

Build Commands

# Using binder (recommended)
./binder build                 # Build HTML version (default format)
./binder build html            # Build HTML version (explicit)
./binder build pdf             # Build PDF version
./binder build epub            # Build EPUB version
./binder publish               # Build and publish

Development Workflow

# Preview a chapter (fastest)
./binder preview intro

# Build complete book
./binder build html

# Publish to the world
./binder publish

Environment Setup

The ./binder setup command provides a complete environment configuration:

What it does:

  1. Checks environment - Verifies all required tools and versions
  2. Installs dependencies - Auto-installs missing tools (Quarto, GitHub CLI, Ollama)
  3. Configures Git - Sets up user name, email, and GitHub username
  4. Sets preferences - Configures build format and browser behavior
  5. Tests setup - Builds a test chapter to verify everything works

Features:

  • 🛠️ Automatic tool installation (Homebrew, apt, pip)
  • 👤 Interactive Git configuration
  • ⚙️ User preference setup
  • 🧪 Built-in testing to verify setup

# Run setup
./binder setup

# Get welcome and overview
./binder hello
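
The "checks environment" step can be approximated by hand when debugging a setup problem. The sketch below is an assumption about what such a check might look like, not the code `./binder setup` runs; `missing_tools` is a hypothetical helper that only reports and installs nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named tool that is not found on PATH.
# The list of tools to check is supplied by the caller.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools quarto gh ollama git python3` prints nothing when all five commands are available and otherwise lists the missing ones, one per line.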

Development Server

# Start live preview server
./binder preview

# The server will automatically reload when you save changes

Build Outputs

  • HTML: build/html/index.html (main output directory)
  • PDF: build/pdf/ (PDF output directory)
  • PDF: book/index.pdf (in book directory)
  • Artifacts: Automatically cleaned by git hooks

🚀 Publishing

The ./binder publish command provides a complete publishing workflow:

Step-by-step process:

  1. Environment validation - Checks Git status, tools, and dependencies
  2. Branch management - Merges dev to main with confirmation
  3. Release planning - Suggests version bump based on changes
  4. Build process - PDF first, then HTML (ensures PDF availability)
  5. Release creation - Git tag, AI-generated release notes, GitHub release
  6. Deployment - Copies PDF to assets, commits, pushes to production

Features:

  • 🤖 AI-powered release notes (requires Ollama)
  • 📊 Smart version suggestions (patch/minor/major)
  • 🛡️ Safety checks and confirmations
  • 🎯 Step-by-step wizard with clear progress

# One-command publishing
./binder publish
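
The patch/minor/major choice in the release-planning step maps onto a simple version calculation. The sketch below shows only that arithmetic, assuming versions of the form MAJOR.MINOR.PATCH with an optional leading "v"; `bump_version` is a hypothetical name, and how `./binder publish` decides which kind of bump to suggest is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the next version for a given bump kind.
# Assumes MAJOR.MINOR.PATCH input, optionally prefixed with "v".
# The output is printed without the "v" prefix.
bump_version() {
  local version="${1#v}" kind="$2" major minor patch rest
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$kind" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown bump kind: $kind" >&2; return 1 ;;
  esac
  printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}
```

With this definition, `bump_version 1.2.3 minor` prints `1.3.0` and `bump_version v0.9.9 major` prints `1.0.0`.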

Manual Publishing Steps

If you prefer to do it step by step:

# 1. Ensure you're on main branch
git checkout main
git merge dev

# 2. Build both formats
./binder build html
./binder build pdf

# 3. Copy PDF to assets
cp build/pdf/Machine-Learning-Systems.pdf assets/downloads/

# 4. Commit and push
git add assets/downloads/Machine-Learning-Systems.pdf
git commit -m "Add PDF to assets"
git push origin main

Publishing Requirements

  • Must be on main branch
  • No uncommitted changes
  • All builds successful
  • Git repository properly configured
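
The first two requirements can be verified manually before publishing. The following is a minimal sketch using standard git commands; `publish_preflight` is a hypothetical name and is not how `./binder publish` performs its own validation.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: succeed only when the current branch is "main"
# and the working tree has no uncommitted or untracked changes.
publish_preflight() {
  local branch
  branch=$(git symbolic-ref --short HEAD 2>/dev/null) || {
    echo "not on a branch (or not inside a git repository)" >&2
    return 1
  }
  if [ "$branch" != "main" ]; then
    echo "on '$branch', expected 'main'" >&2
    return 1
  fi
  if [ -n "$(git status --porcelain)" ]; then
    echo "uncommitted changes present" >&2
    return 1
  fi
}
```

Run from the repository root, `publish_preflight && ./binder publish` would stop early with a short message instead of starting a publish from the wrong state.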

After Publishing

The GitHub Actions workflow will:

  • 🔄 Run quality checks
  • 🏗️ Build all formats (Linux + Windows)
  • 🚀 Deploy to GitHub Pages
  • 📦 Create release assets

Monitor progress: https://github.com/harvard-edge/cs249r_book/actions

🔍 Project Health Checks

Quick Status Check

./binder doctor     # Overall project health
./binder status     # Detailed project status
git status          # Git repository status

Comprehensive Testing

./binder doctor     # Run comprehensive health check
quarto check        # Validate Quarto configuration

Example Health Check Output

🔍 Checking project health...

📊 Project Structure:
  QMD files: 45
  Bibliography files: 20
  Quiz files: 18

🗂️ Git Status:
  Repository is clean

📦 Dependencies:
  ✅ Quarto: 1.4.x
  ✅ Python: 3.x

📝 Content Development

Chapter Structure

book/contents/
├── core/                    # Main content chapters
│   ├── introduction/
│   │   ├── introduction.qmd
│   │   ├── introduction.bib
│   │   └── introduction_quizzes.json
│   └── ...
├── frontmatter/            # Preface, about, etc.
├── backmatter/             # References, appendices
└── labs/                   # Hands-on exercises

Working with Minimal Configuration

For faster development, you can work with a minimal set of chapters:

  1. Edit book/_quarto-html.yml: Comment out chapters you're not working on
  2. Edit bibliography section: Comment out unused .bib files
  3. Build faster: Only active chapters will be processed

chapters:
  - index.qmd
  - contents/core/introduction/introduction.qmd
  # - contents/core/ml_systems/ml_systems.qmd    # Commented out
  # - contents/core/nn_computation/nn_computation.qmd      # Commented out
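
When several chapters are commented out, it is easy to lose track of what will actually be built. The sketch below lists the `.qmd` entries that are still active in a config file; `active_chapters` is a hypothetical helper that works on plain text lines, so it assumes one list entry per line as in the example above and does not parse YAML properly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the uncommented "- path.qmd" list entries
# from a Quarto config file. Lines starting with "#" are skipped, and
# trailing comments are stripped.
active_chapters() {
  grep -E '^[[:space:]]*-[[:space:]]+[^#]*\.qmd' "$1" |
    sed -E 's/^[[:space:]]*-[[:space:]]+//; s/[[:space:]]*(#.*)?$//'
}
```

Applied to the example above, it would print `index.qmd` and `contents/core/introduction/introduction.qmd` and omit the two commented-out chapters.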

Restoring Full Configuration

Simply uncomment the chapters and bibliography entries you want to restore.

🔧 Troubleshooting

Common Issues

  1. Build fails with missing files

    ./binder clean      # Clean artifacts
    ./binder doctor     # Verify project structure and dependencies

  2. Git hook blocks commit

    ./binder clean      # Remove artifacts
    git status          # Check what's staged

  3. Slow builds

    ./binder clean      # Clean artifacts
    git clean -xdf      # Full cleanup (deletes all untracked and ignored files)
    # Use minimal configuration

  4. Permission denied on scripts

    chmod +x binder .git/hooks/pre-commit   # Restore execute permissions

Getting Help

./binder help       # Show all commands
./binder --help     # Detailed help

🎯 Best Practices

Before Starting Work

git pull            # Get latest changes
./binder clean      # Clean workspace
./binder doctor     # Verify health

Daily Development Workflow

# 1. Clean and build
./binder clean
./binder build

# 2. Start development server
./binder preview

# 3. Make changes to .qmd files
# 4. Preview updates automatically

# 5. When ready to commit
git add .
git commit -m "Your message"   # Pre-commit hook runs automatically

Before Major Changes

./binder clean      # Full cleanup
./binder build      # Clean build
./binder doctor     # Run all checks

Release Preparation

./binder doctor     # Comprehensive validation
./binder build      # Build HTML
./binder build pdf  # Build PDF
./binder build epub # Build EPUB

⚙️ Configuration Files

  • quarto/config/_quarto-html.yml: HTML website configuration
  • quarto/config/_quarto-pdf.yml: PDF book configuration
  • binder: Book Binder CLI (build and development tool)
  • .git/hooks/pre-commit: Automated cleanup hook
  • .gitignore: Ignored file patterns

🗂️ Scripts Organization

The tools/scripts/ directory is organized into logical categories:

tools/scripts/
├── build/           # Build and development scripts (clean.sh, etc.)
├── content/         # Content management tools
├── maintenance/     # System maintenance scripts
├── testing/         # Test and validation scripts
├── utilities/       # General utility scripts
├── docs/            # Script documentation
├── genai/           # AI and generation tools
├── cross_refs/      # Cross-reference management
├── quarto_publish/  # Publishing workflows
└── ai_menu/         # AI menu tools

Each directory has its own README.md with specific usage instructions.

🤝 Contributing

  1. Fork and clone the repository
  2. Run setup: ./binder setup
  3. Make changes with the development workflow above
  4. Test thoroughly: ./binder doctor, then ./binder build, ./binder build pdf, and ./binder build epub
  5. Submit pull request with clean commits

The automated cleanup system ensures that your commits will be clean and won't include build artifacts, making code reviews easier and keeping the repository tidy.

📞 Support

If you encounter issues with the development workflow:

  1. Check this guide first
  2. Run ./binder doctor for diagnostics
  3. Preview what a cleanup would remove with git clean -xdn
  4. Ask for help in project discussions