MLSysBook Development Guide
This guide covers the development workflow, automated cleanup system, and best practices for contributing to the Machine Learning Systems book.
🎯 Essential Commands (Daily Use)
./binder clean # Clean build artifacts
./binder build # Build HTML book
./binder doctor # Health check & diagnostics
./binder preview # Live preview with hot reload
./binder build pdf # Build PDF
🚀 Quick Start
# First time setup
./binder setup # Configure environment and tools
# Daily workflow (most common commands)
./binder clean # Clean build artifacts
./binder build # Build HTML (complete book)
./binder doctor # Health check
# Preview & development
./binder preview intro # Preview a chapter with live reload
./binder build intro # Build specific chapter
🧹 Automated Cleanup System
This project includes an automated cleanup system that runs before every commit to ensure a clean repository.
What Gets Cleaned Automatically
The cleanup system removes:
- Build artifacts: *.html, *.pdf, *.tex, *.aux, *.log, *.toc
- Cache directories: .quarto/, site_libs/, index_files/ (legacy)
- Python artifacts: __pycache__/, *.pyc, *.pyo
- System files: .DS_Store, Thumbs.db, *.swp
- Editor files: *~, .#*
- Debug files: debug.log, error.log
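To see which files these patterns would catch without deleting anything, a minimal sketch along the following lines may help. It is not the project's actual cleanup script: the function name list_artifacts is hypothetical, it only covers the file patterns (not the cache directories), and it assumes the .git directory should be skipped.

```shell
# Hypothetical helper (not part of binder): print files under a directory
# that match the artifact file patterns from the list above.
# Nothing is deleted; the .git directory is skipped.
list_artifacts() {
    dir="${1:-.}"
    find "$dir" -path "$dir/.git" -prune -o -type f \( \
        -name '*.html' -o -name '*.pdf' -o -name '*.tex' \
        -o -name '*.aux' -o -name '*.log' -o -name '*.toc' \
        -o -name '*.pyc' -o -name '*.pyo' \
        -o -name '.DS_Store' -o -name 'Thumbs.db' -o -name '*.swp' \
        -o -name '*~' -o -name '.#*' \
    \) -print | sort
}
```

Running `list_artifacts book` would print one matching path per line, which can be reviewed before running ./binder clean.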
Manual Cleanup Commands
# Regular cleanup (recommended before commits)
./binder clean
# See what files will be cleaned (safe preview)
git status
git clean -xdn
# Deep clean (git clean -xdf deletes ALL untracked and ignored files, not only build artifacts)
./binder clean
git clean -xdf
Pre-Commit Hook
The git pre-commit hook automatically:
- 🔍 Scans for build artifacts in staged files
- 🧹 Runs cleanup if artifacts are detected
- ⚠️ Warns about large files (>1MB)
- 🚨 Blocks commits with potential secrets
- ✅ Allows clean commits to proceed
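The first and third checks above can be sketched roughly as follows. This is an illustration only, not the contents of the real .git/hooks/pre-commit: the function names are hypothetical, the 1 MB threshold is assumed to mean 1048576 bytes, and the sketch only reports artifacts rather than running the cleanup.

```shell
# Hypothetical sketch of two pre-commit checks; the real hook may differ.

# Succeed (exit 0) if a path looks like a build artifact.
is_artifact() {
    case "$1" in
        *.html|*.pdf|*.tex|*.aux|*.log|*.toc) return 0 ;;
        *.pyc|*.pyo|*.swp|*~|.#*|*/.#*) return 0 ;;
        __pycache__/*|*/__pycache__/*|.quarto/*|*/.quarto/*) return 0 ;;
        .DS_Store|*/.DS_Store|Thumbs.db|*/Thumbs.db) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if a file is larger than 1 MB (assumed to be 1048576 bytes).
is_large() {
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -gt 1048576 ]
}

# Read staged paths on stdin, print findings, fail if any artifact is staged.
check_staged() {
    status=0
    while IFS= read -r path; do
        if is_artifact "$path"; then
            echo "artifact: $path"
            status=1
        elif [ -f "$path" ] && is_large "$path"; then
            echo "warning: large file: $path"
        fi
    done
    return "$status"
}
```

Inside a repository, the staged paths could be fed in with `git diff --cached --name-only | check_staged`.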
Bypassing the Hook (Emergency)
# Only if absolutely necessary
git commit --no-verify -m "Emergency commit"
🔨 Building the Book
Build Commands
# Using binder (recommended)
./binder build # Build HTML version (default)
./binder build html # Build HTML version (explicit)
./binder build pdf # Build PDF version
./binder build epub # Build EPUB version
./binder publish # Build and publish
Development Workflow
# Preview a chapter (fastest)
./binder preview intro
# Build complete book
./binder build html
# Publish to the world
./binder publish
Environment Setup
The ./binder setup command provides a complete environment configuration:
What it does:
- Checks environment - Verifies all required tools and versions
- Installs dependencies - Auto-installs missing tools (Quarto, GitHub CLI, Ollama)
- Configures Git - Sets up user name, email, and GitHub username
- Sets preferences - Configures build format and browser behavior
- Tests setup - Builds a test chapter to verify everything works
Features:
- 🛠️ Automatic tool installation (Homebrew, apt, pip)
- 👤 Interactive Git configuration
- ⚙️ User preference setup
- 🧪 Built-in testing to verify setup
# Run setup
./binder setup
# Get welcome and overview
./binder hello
Development Server
# Start live preview server
./binder preview
# The server will automatically reload when you save changes
Build Outputs
- HTML: build/html/index.html (main output directory)
- PDF: build/pdf/ (PDF output directory)
- PDF: book/index.pdf (in book directory)
- Artifacts: Automatically cleaned by git hooks
🚀 Publishing
The ./binder publish command provides a complete publishing workflow:
Step-by-step process:
- Environment validation - Checks Git status, tools, and dependencies
- Branch management - Merges dev to main with confirmation
- Release planning - Suggests version bump based on changes
- Build process - PDF first, then HTML (ensures PDF availability)
- Release creation - Git tag, AI-generated release notes, GitHub release
- Deployment - Copies PDF to assets, commits, pushes to production
Features:
- 🤖 AI-powered release notes (requires Ollama)
- 📊 Smart version suggestions (patch/minor/major)
- 🛡️ Safety checks and confirmations
- 🎯 Step-by-step wizard with clear progress
# One-command publishing
./binder publish
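For readers unfamiliar with patch/minor/major bumps, the arithmetic behind a version suggestion can be sketched as below. How ./binder publish actually picks the level is not described here; bump_version is a hypothetical helper that assumes a plain MAJOR.MINOR.PATCH string with an optional leading "v", and it prints the result without the prefix.

```shell
# Hypothetical helper: compute the next semantic version for a bump level.
# Assumes MAJOR.MINOR.PATCH with an optional leading "v" and no leading zeros.
bump_version() {
    version="${1#v}"          # drop an optional "v" prefix
    level="$2"
    major="${version%%.*}"    # text before the first dot
    rest="${version#*.}"      # text after the first dot
    minor="${rest%%.*}"
    patch="${rest#*.}"
    case "$level" in
        major) major=$((major + 1)); minor=0; patch=0 ;;
        minor) minor=$((minor + 1)); patch=0 ;;
        patch) patch=$((patch + 1)) ;;
        *) echo "unknown bump level: $level" >&2; return 1 ;;
    esac
    echo "$major.$minor.$patch"
}
```

For example, `bump_version v1.2.3 minor` prints 1.3.0, and `bump_version 0.9.9 patch` prints 0.9.10.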
Manual Publishing Steps
If you prefer to do it step by step:
# 1. Ensure you're on main branch
git checkout main
git merge dev
# 2. Build both formats
./binder build html
./binder build pdf
# 3. Copy PDF to assets (same path that is staged in step 4)
cp build/pdf/Machine-Learning-Systems.pdf assets/downloads/
# 4. Commit and push
git add assets/downloads/Machine-Learning-Systems.pdf
git commit -m "Add PDF to assets"
git push origin main
Publishing Requirements
- ✅ Must be on main branch
- ✅ No uncommitted changes
- ✅ All builds successful
- ✅ Git repository properly configured
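The first two requirements can be checked by hand before publishing. The sketch below is one possible way to do it, not the validation that ./binder publish performs; the function names are hypothetical, and untracked files are treated as uncommitted changes because `git status --porcelain` lists them.

```shell
# Hypothetical pre-flight check for the branch and working-tree requirements.
# Takes the branch name and `git status --porcelain` output as arguments so
# the logic can be exercised without a repository.
publish_ready() {
    branch="$1"
    porcelain="$2"
    if [ "$branch" != "main" ]; then
        echo "not on main (current branch: $branch)"
        return 1
    fi
    if [ -n "$porcelain" ]; then
        echo "uncommitted changes present"
        return 1
    fi
    echo "ready to publish"
}

# Wrapper that gathers both inputs from the current repository.
publish_ready_here() {
    publish_ready "$(git symbolic-ref --short HEAD 2>/dev/null)" \
                  "$(git status --porcelain)"
}
```

Calling `publish_ready_here` from the repository root prints a one-line verdict and returns a non-zero status when either requirement is not met.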
After Publishing
The GitHub Actions workflow will:
- 🔄 Run quality checks
- 🏗️ Build all formats (Linux + Windows)
- 🚀 Deploy to GitHub Pages
- 📦 Create release assets
Monitor progress: https://github.com/harvard-edge/cs249r_book/actions
🔍 Project Health Checks
Quick Status Check
./binder doctor # Overall project health
./binder status # Detailed project status
git status # Git repository status
Comprehensive Testing
./binder doctor # Run comprehensive health check
quarto check # Validate Quarto configuration
Example Health Check Output
🔍 Checking project health...
📊 Project Structure:
QMD files: 45
Bibliography files: 20
Quiz files: 18
🗂️ Git Status:
Repository is clean
📦 Dependencies:
✅ Quarto: 1.4.x
✅ Python: 3.x
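The "Project Structure" counts in this sample can be approximated by hand with find. The sketch below is an assumption about how such counts might be produced, not the code behind ./binder doctor; count_files and structure_summary are hypothetical names, and the *_quizzes.json pattern is taken from the chapter layout shown in this guide.

```shell
# Hypothetical helper: count files under a directory whose names match a glob.
count_files() {
    find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

# Print a summary in the style of the sample output.
# Defaults to book/contents when no directory is given.
structure_summary() {
    root="${1:-book/contents}"
    echo "QMD files: $(count_files "$root" '*.qmd')"
    echo "Bibliography files: $(count_files "$root" '*.bib')"
    echo "Quiz files: $(count_files "$root" '*_quizzes.json')"
}
```

Running `structure_summary book/contents` from the repository root prints the three counts for comparison with the doctor output.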
📝 Content Development
Chapter Structure
book/contents/
├── core/ # Main content chapters
│ ├── introduction/
│ │ ├── introduction.qmd
│ │ ├── introduction.bib
│ │ └── introduction_quizzes.json
│ └── ...
├── frontmatter/ # Preface, about, etc.
├── backmatter/ # References, appendices
└── labs/ # Hands-on exercises
Working with Minimal Configuration
For faster development, you can work with a minimal set of chapters:
- Edit book/_quarto-html.yml: Comment out chapters you're not working on
- Edit bibliography section: Comment out unused .bib files
- Build faster: Only active chapters will be processed
chapters:
- index.qmd
- contents/core/introduction/introduction.qmd
# - contents/core/ml_systems/ml_systems.qmd # Commented out
# - contents/core/nn_computation/nn_computation.qmd # Commented out
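After commenting chapters out, it can be handy to confirm which ones remain active. The following sketch assumes one "- path.qmd" entry per line, as in the excerpt above; active_chapters is a hypothetical helper, not a binder command, and it does not parse YAML in general.

```shell
# Hypothetical helper: list chapter entries that are not commented out.
# Matches lines of the form "- some/path.qmd" and prints only the path;
# lines whose first non-blank character is "#" are ignored.
active_chapters() {
    sed -n 's/^[[:space:]]*-[[:space:]]*\([^#[:space:]][^[:space:]]*\.qmd\).*$/\1/p' "$1"
}
```

For the excerpt above, `active_chapters book/_quarto-html.yml` would print index.qmd and contents/core/introduction/introduction.qmd, one per line.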
Restoring Full Configuration
Simply uncomment the chapters and bibliography entries you want to restore.
🔧 Troubleshooting
Common Issues
- Build fails with missing files
make clean # Clean artifacts
make check # Verify structure
- Git hook blocks commit
make clean # Remove artifacts
git status # Check what's staged
- Slow builds
make clean-deep # Full cleanup
# Use minimal configuration
- Permission denied on scripts
make setup-hooks # Fix permissions
Getting Help
./binder help # Show all commands
./binder --help # Detailed help
🎯 Best Practices
Before Starting Work
git pull # Get latest changes
./binder clean # Clean workspace
./binder doctor # Verify health
Daily Development Workflow
# 1. Clean and build
./binder clean
./binder build
# 2. Start development server
./binder preview
# 3. Make changes to .qmd files
# 4. Preview updates automatically
# 5. When ready to commit
git add .
git commit -m "Your message" # Pre-commit hook runs automatically
Before Major Changes
./binder clean # Full cleanup
./binder build # Clean build
./binder doctor # Run all checks
Release Preparation
./binder doctor # Comprehensive validation
./binder build # Build HTML
./binder build pdf # Build PDF
./binder build epub # Build EPUB
⚙️ Configuration Files
- quarto/config/_quarto-html.yml: HTML website configuration
- quarto/config/_quarto-pdf.yml: PDF book configuration
- binder: Book Binder CLI (build and development tool)
- .git/hooks/pre-commit: Automated cleanup hook
- .gitignore: Ignored file patterns
🗂️ Scripts Organization
The tools/scripts/ directory is organized into logical categories:
tools/scripts/
├── build/ # Build and development scripts (clean.sh, etc.)
├── content/ # Content management tools
├── maintenance/ # System maintenance scripts
├── testing/ # Test and validation scripts
├── utilities/ # General utility scripts
├── docs/ # Script documentation
├── genai/ # AI and generation tools
├── cross_refs/ # Cross-reference management
├── quarto_publish/ # Publishing workflows
└── ai_menu/ # AI menu tools
Each directory has its own README.md with specific usage instructions.
🤝 Contributing
- Fork and clone the repository
- Run setup: make setup-hooks && make install
- Make changes with the development workflow above
- Test thoroughly: make test && make build-all
- Submit pull request with clean commits
The automated cleanup system ensures that your commits will be clean and won't include build artifacts, making code reviews easier and keeping the repository tidy.
📞 Support
If you encounter issues with the development workflow:
- Check this guide first
- Run make check for diagnostics
- Review the cleanup script output with make clean-dry
- Ask for help in project discussions