MLSysBook Development Guide
This guide covers the development workflow, automated cleanup system, and best practices for contributing to the Machine Learning Systems book.
🚀 Quick Start
# First time setup
./binder setup # Configure environment and tools
./binder hello # Welcome and overview
# Daily development
./binder preview intro # Preview a chapter
./binder build - html # Build complete book
./binder publish # Publish to the world
🧹 Automated Cleanup System
This project includes an automated cleanup system that runs before every commit to ensure a clean repository.
What Gets Cleaned Automatically
The cleanup system removes:
- Build artifacts: *.html, *.pdf, *.tex, *.aux, *.log, *.toc
- Cache directories: .quarto/, site_libs/, index_files/ (legacy)
- Python artifacts: __pycache__/, *.pyc, *.pyo
- System files: .DS_Store, Thumbs.db, *.swp
- Editor files: *~, .#*
- Debug files: debug.log, error.log
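To see which files in a working tree match these patterns, a plain find command is enough. The following is a minimal sketch, not the contents of tools/scripts/build/clean.sh: the function name list_artifacts is hypothetical, only a subset of the patterns is covered, and nothing is deleted.

```shell
# Hypothetical helper (not part of the repository): print files and
# directories that match a subset of the cleanup patterns listed above.
# It only lists candidates, similar in spirit to a dry run.
list_artifacts() {
  root=${1:-.}
  find "$root" \
    -path "$root/.git" -prune -o \
    \( -type f \( -name '*.aux' -o -name '*.log' -o -name '*.toc' \
                  -o -name '*.pyc' -o -name '*.pyo' \
                  -o -name '.DS_Store' -o -name 'Thumbs.db' \
                  -o -name '*.swp' -o -name '*~' \) -print \) -o \
    \( -type d \( -name '__pycache__' -o -name '.quarto' \) -print -prune \)
}
```

Calling list_artifacts with no argument scans the current directory. Matching cache directories are printed once and not descended into, and .git is skipped entirely.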
Manual Cleanup Commands
# Regular cleanup (recommended before commits)
make clean
./tools/scripts/build/clean.sh
# See what would be cleaned (safe preview)
make clean-dry
./tools/scripts/build/clean.sh --dry-run
# Deep clean (removes caches, virtual environments)
make clean-deep
./tools/scripts/build/clean.sh --deep
# Quiet cleanup (minimal output)
./tools/scripts/build/clean.sh --quiet
Pre-Commit Hook
The git pre-commit hook automatically:
- 🔍 Scans for build artifacts in staged files
- 🧹 Runs cleanup if artifacts are detected
- ⚠️ Warns about large files (>1MB)
- 🚨 Blocks commits with potential secrets
- ✅ Allows clean commits to proceed
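The artifact and large-file checks above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual .git/hooks/pre-commit: the function names are hypothetical, the pattern list is abbreviated, 1 MB is taken as 1048576 bytes, and secret scanning is not covered.

```shell
# Sketch of pre-commit style checks (hypothetical, abbreviated).

# Return 0 if the path looks like a build artifact.
is_artifact() {
  case "$1" in
    *.aux|*.log|*.toc|*.pyc|*.pyo|*.swp|*~|.DS_Store|*/.DS_Store) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 if the file exists and is larger than 1 MB (1048576 bytes here).
is_large() {
  [ -f "$1" ] || return 1
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -gt 1048576 ]
}

# Inspect every staged (added, copied, or modified) file.
# Returns non-zero if any staged file looks like an artifact;
# large files only produce a warning.
check_staged() {
  git diff --cached --name-only --diff-filter=ACM | (
    status=0
    while IFS= read -r f; do
      if is_artifact "$f"; then
        echo "artifact staged: $f" >&2
        status=1
      fi
      if is_large "$f"; then
        echo "warning: large file: $f" >&2
      fi
    done
    exit "$status"
  )
}
```

A hook built this way would call check_staged and exit non-zero on failure, which is what makes git abort the commit.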
Bypassing the Hook (Emergency)
# Only if absolutely necessary
git commit --no-verify -m "Emergency commit"
🔨 Building the Book
Build Commands
# Using binder (recommended)
./binder build - html # Build HTML version
./binder build - pdf # Build PDF version
./binder publish # Build and publish
# Using make (legacy)
make build # HTML version
make build-pdf # PDF version
make build-all # All formats
Development Workflow
# Preview a chapter (fastest)
./binder preview intro
# Build complete book
./binder build - html
# Publish to the world
./binder publish
Environment Setup
The ./binder setup command provides a complete environment configuration:
What it does:
- Checks environment - Verifies all required tools and versions
- Installs dependencies - Auto-installs missing tools (Quarto, GitHub CLI, Ollama)
- Configures Git - Sets up user name, email, and GitHub username
- Sets preferences - Configures build format and browser behavior
- Tests setup - Builds a test chapter to verify everything works
Features:
- 🛠️ Automatic tool installation (Homebrew, apt, pip)
- 👤 Interactive Git configuration
- ⚙️ User preference setup
- 🧪 Built-in testing to verify setup
# Run setup
./binder setup
# Get welcome and overview
./binder hello
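For a rough idea of what the "Checks environment" step involves, the sketch below tests whether a list of tools is on the PATH. It is an assumption about the general approach, not the code behind ./binder setup; check_tools is a hypothetical name, and version checks are left out.

```shell
# Hypothetical environment check: print one line per tool and
# return non-zero if any tool is missing from the PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_tools git quarto gh ollama covers the tools named above (gh is the GitHub CLI binary).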
Development Server
# Start live preview server (use one of the following)
make preview
cd book && quarto preview
# The server will automatically reload when you save changes
Build Outputs
- HTML: build/html/index.html (main output directory)
- PDF: build/pdf/ (PDF output directory)
- PDF: book/index.pdf (in book directory)
- Artifacts: Automatically cleaned by git hooks
🚀 Publishing
The ./binder publish command provides a complete publishing workflow:
Step-by-step process:
- Environment validation - Checks Git status, tools, and dependencies
- Branch management - Merges dev to main with confirmation
- Release planning - Suggests version bump based on changes
- Build process - PDF first, then HTML (ensures PDF availability)
- Release creation - Git tag, AI-generated release notes, GitHub release
- Deployment - Copies PDF to assets, commits, pushes to production
Features:
- 🤖 AI-powered release notes (requires Ollama)
- 📊 Smart version suggestions (patch/minor/major)
- 🛡️ Safety checks and confirmations
- 🎯 Step-by-step wizard with clear progress
# One-command publishing
./binder publish
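The patch/minor/major suggestion comes down to incrementing one component of the version and resetting the ones after it. The sketch below shows that arithmetic only. It assumes versions of the form MAJOR.MINOR.PATCH with an optional leading "v", which this guide does not confirm, and bump_version is a hypothetical name rather than a binder command.

```shell
# Hypothetical helper: compute the next version for a given bump kind.
# Assumes MAJOR.MINOR.PATCH, optionally prefixed with "v" (prefix is dropped).
bump_version() {
  version=${1#v}
  kind=$2
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$kind" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown bump: $kind" >&2; return 1 ;;
  esac
  echo "$major.$minor.$patch"
}
```

With these assumptions, bump_version v1.2.3 minor prints 1.3.0.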
Manual Publishing Steps
If you prefer to do it step by step:
# 1. Ensure you're on main branch
git checkout main
git merge dev
# 2. Build both formats
./binder build - html
./binder build - pdf
# 3. Copy PDF to assets/downloads
cp build/pdf/Machine-Learning-Systems.pdf assets/downloads/
# 4. Commit and push
git add assets/downloads/Machine-Learning-Systems.pdf
git commit -m "Add PDF to assets"
git push origin main
Publishing Requirements
- ✅ Must be on main branch
- ✅ No uncommitted changes
- ✅ All builds successful
- ✅ Git repository properly configured
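The first two requirements can be verified by hand before publishing. The sketch below is one way to do it with standard git commands. It is not the validation code inside ./binder publish; ready_to_publish is a hypothetical name, and untracked files are treated as uncommitted changes, which may be stricter than the real check.

```shell
# Hypothetical pre-publish check: succeed only on a clean 'main' branch.
ready_to_publish() {
  branch=$(git symbolic-ref --short HEAD 2>/dev/null) || {
    echo "not on a branch" >&2
    return 1
  }
  if [ "$branch" != "main" ]; then
    echo "on '$branch', expected 'main'" >&2
    return 1
  fi
  # --porcelain prints one line per changed or untracked path; empty means clean.
  if [ -n "$(git status --porcelain)" ]; then
    echo "uncommitted changes present" >&2
    return 1
  fi
  return 0
}
```

Running ready_to_publish && ./binder publish from the repository root would then skip publishing whenever either condition fails.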
After Publishing
The GitHub Actions workflow will:
- 🔄 Run quality checks
- 🏗️ Build all formats (Linux + Windows)
- 🚀 Deploy to GitHub Pages
- 📦 Create release assets
Monitor progress: https://github.com/harvard-edge/cs249r_book/actions
🔍 Project Health Checks
Quick Status Check
make check # Overall project health
make status # Detailed project status
git status # Git repository status
Comprehensive Testing
make test # Run validation tests
make lint # Check for common issues
quarto check # Validate Quarto configuration
Example Health Check Output
🔍 Checking project health...
📊 Project Structure:
QMD files: 45
Bibliography files: 20
Quiz files: 18
🗂️ Git Status:
Repository is clean
📦 Dependencies:
✅ Quarto: 1.4.x
✅ Python: 3.x
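The project structure counts in this output can be reproduced with find. The sketch below assumes the counts are taken under book/contents and that quiz files follow the *_quizzes.json naming shown in the chapter structure. How make check actually computes them is not documented here, and both function names are hypothetical.

```shell
# Hypothetical helpers: count files by name pattern under a directory.
count_files() {
  # usage: count_files <directory> <name-pattern>
  find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

project_summary() {
  root=${1:-book/contents}
  echo "QMD files: $(count_files "$root" '*.qmd')"
  echo "Bibliography files: $(count_files "$root" '*.bib')"
  echo "Quiz files: $(count_files "$root" '*_quizzes.json')"
}
```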
📝 Content Development
Chapter Structure
book/contents/
├── core/ # Main content chapters
│ ├── introduction/
│ │ ├── introduction.qmd
│ │ ├── introduction.bib
│ │ └── introduction_quizzes.json
│ └── ...
├── frontmatter/ # Preface, about, etc.
├── backmatter/ # References, appendices
└── labs/ # Hands-on exercises
Working with Minimal Configuration
For faster development, you can work with a minimal set of chapters:
- Edit book/_quarto-html.yml: Comment out chapters you're not working on
- Edit bibliography section: Comment out unused .bib files
- Build faster: Only active chapters will be processed
chapters:
- index.qmd
- contents/core/introduction/introduction.qmd
# - contents/core/ml_systems/ml_systems.qmd # Commented out
# - contents/core/dl_primer/dl_primer.qmd # Commented out
Restoring Full Configuration
Simply uncomment the chapters and bibliography entries you want to restore.
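If you switch between minimal and full configurations often, commenting and uncommenting can be scripted. The helper below is a sketch only: toggle_chapter is a hypothetical name, it assumes each chapter sits on its own "- path" line as in the example above, and it matches the path as a plain substring, so pass the full chapter path to avoid touching more than one line.

```shell
# Hypothetical helper: comment out ("off") or restore ("on") a chapter entry.
# usage: toggle_chapter off|on <chapter-path> <config-file>
toggle_chapter() {
  mode=$1
  chapter=$2
  file=$3
  tmp=$(mktemp) || return 1
  awk -v mode="$mode" -v ch="$chapter" '
    # Active entry: turn the list marker "-" into "# -".
    index($0, ch) && mode == "off" && $0 !~ /^[ \t]*#/ { sub(/-/, "# -") }
    # Commented entry: strip the leading "#" to restore the list marker.
    index($0, ch) && mode == "on" && $0 ~ /^[ \t]*#/ { sub(/#[ \t]*-/, "-") }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, toggle_chapter off contents/core/introduction/introduction.qmd book/_quarto-html.yml comments out that entry, and the same call with on restores it. Review the result with git diff before building.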
🔧 Troubleshooting
Common Issues
- Build fails with missing files
make clean # Clean artifacts
make check # Verify structure
- Git hook blocks commit
make clean # Remove artifacts
git status # Check what's staged
- Slow builds
make clean-deep # Full cleanup
# Use minimal configuration
- Permission denied on scripts
make setup-hooks # Fix permissions
Getting Help
make help # Show all commands
make help-clean # Detailed cleanup help
make help-build # Detailed build help
🎯 Best Practices
Before Starting Work
git pull # Get latest changes
make clean # Clean workspace
make check # Verify health
Daily Development Workflow
# 1. Clean and build
make clean build
# 2. Start development server
make preview
# 3. Make changes to .qmd files
# 4. Preview updates automatically
# 5. When ready to commit
git add .
git commit -m "Your message" # Pre-commit hook runs automatically
Before Major Changes
make clean-deep # Full cleanup
make full-clean-build # Clean build from scratch
make test # Run all tests
Release Preparation
make release-check # Comprehensive validation
make build-all # Build all formats
make check # Final health check
⚙️ Configuration Files
- book/_quarto-html.yml: HTML website configuration
- book/_quarto-pdf.yml: PDF book configuration
- Makefile: Development commands
- tools/scripts/build/clean.sh: Cleanup script
- .git/hooks/pre-commit: Automated cleanup hook
- .gitignore: Ignored file patterns
🗂️ Scripts Organization
The tools/scripts/ directory is organized into logical categories:
tools/scripts/
├── build/ # Build and development scripts (clean.sh, etc.)
├── content/ # Content management tools
├── maintenance/ # System maintenance scripts
├── testing/ # Test and validation scripts
├── utilities/ # General utility scripts
├── docs/ # Script documentation
├── genai/ # AI and generation tools
├── cross_refs/ # Cross-reference management
├── quarto_publish/ # Publishing workflows
└── ai_menu/ # AI menu tools
Each directory has its own README.md with specific usage instructions.
🤝 Contributing
- Fork and clone the repository
- Run setup: make setup-hooks && make install
- Make changes with the development workflow above
- Test thoroughly: make test && make build-all
- Submit pull request with clean commits
The automated cleanup system ensures that your commits will be clean and won't include build artifacts, making code reviews easier and keeping the repository tidy.
📞 Support
If you encounter issues with the development workflow:
- Check this guide first
- Run make check for diagnostics
- Review the cleanup script output with make clean-dry
- Ask for help in project discussions