Files
cs249r_book/book/docs/DEVELOPMENT.md
Vijay Janapa Reddi 2390c3ab31 Refactor: consolidate Quarto config layers and content reorganization.
Unifies Quarto metadata into shared base/format/volume fragments while carrying through chapter path, asset, and tooling updates to keep the repository consistent and easier to maintain.
2026-02-12 15:38:55 -05:00

404 lines
10 KiB
Markdown

# MLSysBook Development Guide
This guide covers the development workflow, automated cleanup system, and best practices for contributing to the Machine Learning Systems book.
## 🎯 Essential Commands (Daily Use)
```bash
./binder clean # Clean build artifacts
./binder build # Build HTML book
./binder doctor # Health check & diagnostics
./binder preview # Live preview with hot reload
./binder build pdf # Build PDF
```
## 🚀 Quick Start
```bash
# First time setup
./binder setup # Configure environment and tools
# Daily workflow (most common commands)
./binder clean # Clean build artifacts
./binder build # Build HTML (complete book)
./binder doctor # Health check
# Preview & development
./binder preview intro # Preview a chapter with live reload
./binder build intro # Build specific chapter
```
## 🧹 Automated Cleanup System
This project includes an **automated cleanup system** that runs before every commit to ensure a clean repository.
### What Gets Cleaned Automatically
The cleanup system removes:
- **Build artifacts**: `*.html`, `*.pdf`, `*.tex`, `*.aux`, `*.log`, `*.toc`
- **Cache directories**: `.quarto/`, `site_libs/`, `index_files/` (legacy)
- **Python artifacts**: `__pycache__/`, `*.pyc`, `*.pyo`
- **System files**: `.DS_Store`, `Thumbs.db`, `*.swp`
- **Editor files**: `*~`, `.#*`
- **Debug files**: `debug.log`, `error.log`
### Manual Cleanup Commands
```bash
# Regular cleanup (recommended before commits)
./binder clean
# See what files will be cleaned (safe preview)
git status
git clean -xdn
# Deep clean (removes all build artifacts)
./binder clean
git clean -xdf
```
### Pre-Commit Hook
The git pre-commit hook automatically:
1. 🔍 **Scans for build artifacts** in staged files
2. 🧹 **Runs cleanup** if artifacts are detected
3. ⚠️ **Warns about large files** (>1MB)
4. 🚨 **Blocks commits** with potential secrets
5.**Allows clean commits** to proceed
#### Bypassing the Hook (Emergency)
```bash
# Only if absolutely necessary
git commit --no-verify -m "Emergency commit"
```
## 🔨 Building the Book
### Build Commands
```bash
# Using binder (recommended)
./binder build html # Build HTML version
./binder build pdf # Build PDF version
./binder publish # Build and publish
# Using binder (recommended)
./binder build # HTML version
./binder build pdf # PDF version
./binder build epub # EPUB version
```
### Development Workflow
```bash
# Preview a chapter (fastest)
./binder preview intro
# Build complete book
./binder build html
# Publish to the world
./binder publish
```
### Environment Setup
The `./binder setup` command provides a complete environment configuration:
**What it does:**
1. **Checks environment** - Verifies all required tools and versions
2. **Installs dependencies** - Auto-installs missing tools (Quarto, GitHub CLI, Ollama)
3. **Configures Git** - Sets up user name, email, and GitHub username
4. **Sets preferences** - Configures build format and browser behavior
5. **Tests setup** - Builds a test chapter to verify everything works
**Features:**
- 🛠️ **Automatic tool installation** (Homebrew, apt, pip)
- 👤 **Interactive Git configuration**
- ⚙️ **User preference setup**
- 🧪 **Built-in testing** to verify setup
```bash
# Run setup
./binder setup
# Get welcome and overview
./binder hello
```
### Development Server
```bash
# Start live preview server
./binder preview
# The server will automatically reload when you save changes
```
### Build Outputs
- **HTML**: `build/html/index.html` (main output directory)
- **PDF**: `build/pdf/` (PDF output directory)
- **PDF**: `book/index.pdf` (in book directory)
- **Artifacts**: Automatically cleaned by git hooks
## 🚀 Publishing
The `./binder publish` command provides a complete publishing workflow:
**Step-by-step process:**
1. **Environment validation** - Checks Git status, tools, and dependencies
2. **Branch management** - Merges `dev` to `main` with confirmation
3. **Release planning** - Suggests version bump based on changes
4. **Build process** - PDF first, then HTML (ensures PDF availability)
5. **Release creation** - Git tag, AI-generated release notes, GitHub release
6. **Deployment** - Copies PDF to assets, commits, pushes to production
**Features:**
- 🤖 **AI-powered release notes** (requires Ollama)
- 📊 **Smart version suggestions** (patch/minor/major)
- 🛡️ **Safety checks** and confirmations
- 🎯 **Step-by-step wizard** with clear progress
```bash
# One-command publishing
./binder publish
```
### Manual Publishing Steps
If you prefer to do it step by step:
```bash
# 1. Ensure you're on main branch
git checkout main
git merge dev
# 2. Build both formats
./binder build html
./binder build pdf
# 3. Copy PDF to assets
cp build/pdf/Machine-Learning-Systems.pdf assets/
# 4. Commit and push
git add assets/downloads/Machine-Learning-Systems.pdf
git commit -m "Add PDF to assets"
git push origin main
```
### Publishing Requirements
- ✅ Must be on `main` branch
- ✅ No uncommitted changes
- ✅ All builds successful
- ✅ Git repository properly configured
### After Publishing
The GitHub Actions workflow will:
- 🔄 Run quality checks
- 🏗️ Build all formats (Linux + Windows)
- 🚀 Deploy to GitHub Pages
- 📦 Create release assets
**Monitor progress**: https://github.com/harvard-edge/cs249r_book/actions
## 🔍 Project Health Checks
### Quick Status Check
```bash
./binder doctor # Overall project health
./binder status # Detailed project status
git status # Git repository status
```
### Comprehensive Testing
```bash
./binder doctor # Run comprehensive health check
quarto check # Validate Quarto configuration
```
### Example Health Check Output
```
🔍 Checking project health...
📊 Project Structure:
QMD files: 45
Bibliography files: 20
Quiz files: 18
🗂️ Git Status:
Repository is clean
📦 Dependencies:
✅ Quarto: 1.4.x
✅ Python: 3.x
```
## 📝 Content Development
### Chapter Structure
```
book/contents/
├── core/ # Main content chapters
│ ├── introduction/
│ │ ├── introduction.qmd
│ │ ├── introduction.bib
│ │ └── introduction_quizzes.json
│ └── ...
├── frontmatter/ # Preface, about, etc.
├── backmatter/ # References, appendices
└── labs/ # Hands-on exercises
```
### Working with Minimal Configuration
For faster development, you can work with a minimal set of chapters:
1. **Edit `book/_quarto-html.yml`**: Comment out chapters you're not working on
2. **Edit bibliography section**: Comment out unused `.bib` files
3. **Build faster**: Only active chapters will be processed
```yaml
chapters:
- index.qmd
- contents/core/introduction/introduction.qmd
# - contents/core/ml_systems/ml_systems.qmd # Commented out
# - contents/core/nn_computation/nn_computation.qmd # Commented out
```
### Restoring Full Configuration
Simply uncomment the chapters and bibliography entries you want to restore.
## 🔧 Troubleshooting
### Common Issues
1. **Build fails with missing files**
```bash
make clean # Clean artifacts
make check # Verify structure
```
2. **Git hook blocks commit**
```bash
make clean # Remove artifacts
git status # Check what's staged
```
3. **Slow builds**
```bash
make clean-deep # Full cleanup
# Use minimal configuration
```
4. **Permission denied on scripts**
```bash
make setup-hooks # Fix permissions
```
### Getting Help
```bash
./binder help # Show all commands
./binder --help # Detailed help
```
## 🎯 Best Practices
### Before Starting Work
```bash
git pull # Get latest changes
./binder clean # Clean workspace
./binder doctor # Verify health
```
### Daily Development Workflow
```bash
# 1. Clean and build
./binder clean
./binder build
# 2. Start development server
./binder preview
# 3. Make changes to .qmd files
# 4. Preview updates automatically
# 5. When ready to commit
git add . # Pre-commit hook runs automatically
git commit -m "Your message"
```
### Before Major Changes
```bash
./binder clean # Full cleanup
./binder build # Clean build
./binder doctor # Run all checks
```
### Release Preparation
```bash
./binder doctor # Comprehensive validation
./binder build # Build HTML
./binder build pdf # Build PDF
./binder build epub # Build EPUB
```
## ⚙️ Configuration Files
- **`quarto/config/_quarto-html.yml`**: HTML website configuration
- **`quarto/config/_quarto-pdf.yml`**: PDF book configuration
- **`binder`**: Book Binder CLI (build and development tool)
- **`.git/hooks/pre-commit`**: Automated cleanup hook
- **`.gitignore`**: Ignored file patterns
## 🗂️ Scripts Organization
The `tools/scripts/` directory is organized into logical categories:
```
tools/scripts/
├── build/ # Build and development scripts (clean.sh, etc.)
├── content/ # Content management tools
├── maintenance/ # System maintenance scripts
├── testing/ # Test and validation scripts
├── utilities/ # General utility scripts
├── docs/ # Script documentation
├── genai/ # AI and generation tools
├── cross_refs/ # Cross-reference management
├── quarto_publish/ # Publishing workflows
└── ai_menu/ # AI menu tools
```
Each directory has its own README.md with specific usage instructions.
## 🤝 Contributing
1. **Fork and clone** the repository
2. **Run setup**: `make setup-hooks && make install`
3. **Make changes** with the development workflow above
4. **Test thoroughly**: `make test && make build-all`
5. **Submit pull request** with clean commits
The automated cleanup system ensures that your commits will be clean and won't include build artifacts, making code reviews easier and keeping the repository tidy.
## 📞 Support
If you encounter issues with the development workflow:
1. Check this guide first
2. Run `make check` for diagnostics
3. Review the cleanup script output with `make clean-dry`
4. Ask for help in project discussions