Module 20: Capstone - Complete ML System Integration

Overview

Combine everything you've learned to build a complete, optimized ML system from scratch. This is your masterpiece - demonstrating mastery of both ML algorithms and systems engineering.

Project Options

Option 1: Optimized CIFAR-10 Trainer

Goal: 75% accuracy with minimal resources

  • Start with your Module 10 trainer
  • Apply all optimizations: acceleration, quantization, and pruning (see the pruning sketch after this list)
  • Achieve the same accuracy with 10x less compute and memory
  • Deploy on a resource-constrained device
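
To give a flavor of what one of those optimizations might look like, here is a minimal magnitude-pruning sketch. The helper name `prune_by_magnitude` is hypothetical, not TinyTorch's actual API:

# Hypothetical magnitude-pruning sketch (not TinyTorch's actual API)
import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights until `sparsity` of them are zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

w = np.random.randn(128, 128).astype(np.float32)
pruned, mask = prune_by_magnitude(w)
print(f"sparsity achieved: {1 - mask.mean():.2%}")  # ~90% zeros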

Option 2: Efficient GPT Inference Engine

Goal: Real-time text generation on CPU

  • Implement KV caching for transformers (see the sketch after this list)
  • Quantize the model to INT8
  • Optimize the attention computation
  • Generate 100 tokens per second on a laptop CPU
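
To make the KV-caching idea concrete, here is a minimal single-head sketch; the shapes and names are illustrative, not a prescribed design:

# Minimal single-head KV-cache sketch; shapes and names are illustrative
import numpy as np

class KVCache:
    """Stores past keys/values so new tokens attend without recomputing them."""
    def __init__(self):
        self.keys = None      # grows to (seq_len, d_head)
        self.values = None

    def append(self, k, v):   # k, v: (1, d_head) for the newest token
        self.keys = k if self.keys is None else np.vstack([self.keys, k])
        self.values = v if self.values is None else np.vstack([self.values, v])

def cached_attention(q, k_new, v_new, cache):
    """Attend from the newest query over all cached keys and values."""
    cache.append(k_new, v_new)
    scores = (q @ cache.keys.T) / np.sqrt(q.shape[-1])   # (1, seq_len)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache.values                        # (1, d_head)

d = 64
cache = KVCache()
for step in range(5):  # five decoding steps
    q, k, v = (np.random.randn(1, d) for _ in range(3))
    out = cached_attention(q, k, v, cache)

The point of the cache is that each generation step computes keys and values only for the newest token instead of reprocessing the entire prefix.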

Option 3: Custom Challenge

Goal: Define your own optimization challenge

  • Pick a problem you care about
  • Set performance targets
  • Apply systematic optimization
  • Document the journey

What You'll Demonstrate

1. Full Stack Understanding

  • Build complete training pipeline
  • Implement model architecture
  • Add optimization layers
  • Deploy to production

2. Systems Engineering

  • Profile and identify bottlenecks
  • Apply appropriate optimizations
  • Measure and validate improvements
  • Handle resource constraints

3. Scientific Approach

  • Baseline measurements
  • Systematic optimization
  • Ablation studies (see the example after this list)
  • Reproducible results
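
An ablation study here means removing one optimization at a time and re-measuring, so each technique's contribution is attributable. A sketch, assuming a hypothetical build_optimized_model helper and an evaluate function like the one used in the weekly plan below:

# Hypothetical ablation loop: rebuild the system with one optimization
# removed at a time; build_optimized_model is an assumed helper.
for skipped in ["acceleration", "quantization", "pruning", "caching"]:
    model = build_optimized_model(exclude=skipped)
    metrics = evaluate(model)
    print(f"without {skipped}: {metrics['accuracy']:.1f}% accuracy")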

Capstone Structure

Week 1: Planning & Baseline

# 1. Choose project and define success metrics
metrics = {
    'accuracy_target': 75.0,
    'inference_time': '<10ms',
    'memory_usage': '<100MB',
    'model_size': '<10MB'
}

# 2. Build baseline system
baseline = build_baseline_model()
baseline_metrics = evaluate(baseline)

# 3. Profile and identify opportunities
bottlenecks = profile_system(baseline)
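
One way profile_system might be structured, assuming the model exposes an ordered mapping of named layers, each with a forward() method; that interface is an assumption for illustration, not a requirement:

# Sketch of layer-wise profiling; assumes model.layers is an ordered
# mapping of name -> layer objects exposing forward()
import time

def profile_system(model, batch, n_runs=10):
    """Time each layer's forward pass to see where the compute budget goes."""
    timings = {}
    x = batch
    for name, layer in model.layers.items():
        start = time.perf_counter()
        for _ in range(n_runs):
            out = layer.forward(x)
        timings[name] = (time.perf_counter() - start) / n_runs
        x = out  # feed this layer's output to the next one
    return sorted(timings.items(), key=lambda kv: -kv[1])  # slowest first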

Week 2: Optimization Sprint

# 4. Apply optimizations one at a time so each gain is attributable
optimizations = [apply_acceleration, apply_quantization,
                 apply_pruning, apply_caching]

optimized = baseline
for optimization in optimizations:
    optimized = optimization(optimized)

    # 5. Re-measure after each step and compare against the baseline
    metrics = evaluate(optimized)
    speedup = baseline_metrics['inference_time'] / metrics['inference_time']
    print(f"{optimization.__name__}: {speedup:.1f}x faster")

Week 3: Polish & Deploy

# 6. Final optimization pass
final_model = fine_tune_optimizations(optimized)

# 7. Create deployment package
deployment = package_for_production(final_model)

# 8. Document results
write_technical_report(baseline, final_model, metrics)
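
A toy version of package_for_production, assuming the model can serialize its weights; the state() accessor is made up for illustration:

# Toy deployment packaging; model.state() is a made-up accessor
import pickle

def package_for_production(model, path="model_package.pkl"):
    """Bundle weights and metadata into a single deployable artifact."""
    payload = {
        "weights": model.state(),  # use whatever serialization your model exposes
        "metadata": {"weights_format": "int8", "framework": "TinyTorch"},
    }
    with open(path, "wb") as f:
        pickle.dump(payload, f)
    return path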

Deliverables

1. Working System

  • Complete codebase on GitHub
  • README with setup instructions
  • Demonstration video/notebook

2. Technical Report

  • Problem statement and approach
  • Baseline vs optimized metrics
  • Optimization journey and decisions
  • Lessons learned

3. Performance Analysis

  • Comprehensive benchmarks
  • Ablation study results
  • Resource utilization graphs
  • Comparison with PyTorch/TensorFlow

Evaluation Criteria

Technical Excellence (40%)

  • Correctness of implementation
  • Quality of optimizations
  • Code organization and style

Performance Achievement (30%)

  • Meeting stated goals
  • Improvement over baseline
  • Resource efficiency

Systems Understanding (30%)

  • Appropriate optimization choices
  • Understanding of tradeoffs
  • Scientific methodology

Example Projects from Past Students

"TinyYOLO" - Real-time Object Detection

  • 30 FPS on a Raspberry Pi
  • 90% size reduction through pruning
  • Custom INT8 kernels for ARM

"NanoGPT" - Edge Language Model

  • 100MB model generates Shakespeare
  • KV caching + quantization
  • Runs on a 2015 laptop

"SwiftCNN" - Instant Image Classification

  • <1ms inference on iPhone
  • Structured pruning + iOS Metal
  • 95% of ResNet accuracy at 10% of the size

Resources

  • All previous module code
  • TinyTorch optimization library
  • Benchmarking tools
  • Community Discord for help

Success Criteria

  • Complete working system with all optimizations
  • 10x+ improvement in speed OR memory
  • Professional documentation and analysis
  • Understanding of when/why to apply each optimization
  • Ready for ML systems engineering roles!

Final Note

This is your chance to show everything you've learned. Build something you're proud of - something that demonstrates not just that you can implement ML algorithms, but that you understand how to build production ML systems.

Remember: the goal isn't perfection; it's demonstrating systematic thinking about performance, memory, and deployment constraints - the real challenges of ML engineering.