Files
TinyTorch/COMPLETE_MODULE_ROADMAP.md
Vijay Janapa Reddi 2f23f757e7 MAJOR: Implement beautiful module progression through strategic reordering
This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: Autograd before optimizers, DataLoader before training, scattered dependencies

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates inevitable need for the next

## Pedagogical Flow Achieved

**05_losses** → "Need systematic weight updates" → **06_optimizers**
**06_optimizers** → "Need automatic gradients" → **07_autograd**
**07_autograd** → "Need systematic training" → **08_training**
**08_training** → "MLPs hit limits on images" → **09_spatial**
**09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change)

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in tito/commands/export.py
- **Test directories**: Renamed module_XX directories to match new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions for proper flow

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with new numbers
- **Module Developer**: Updated work tracking with new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with beautiful progression

## Educational Benefits

1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts introduced exactly when needed
3. **Motivation**: Students understand WHY each tool is necessary
4. **Synthesis**: Everything flows toward complete ML systems understanding
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance

-  All CLI commands still function
-  Checkpoint system mappings updated
-  Documentation consistency maintained
-  Test directory structure aligned
-  Agent configurations synchronized

**Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.
2025-09-24 15:56:47 -04:00

6.0 KiB

TinyTorch Complete Module Roadmap

20-Module ML Systems Course with Competition System

PHASE 1: FOUNDATION (Modules 1-6)

Build the core mathematical infrastructure for neural networks.

  • Module 01: setup - Development environment configuration
  • Module 02: tensor - Core data structures with autodiff support (backward design: built-in grad support)
  • Module 03: activations - ReLU, Sigmoid, nonlinearity functions
  • Module 04: layers - Dense layers, network building blocks
  • Module 05: losses - MSE, CrossEntropy, BCE loss functions
  • Module 06: autograd - Automatic differentiation engine

Capability Unlocked: Networks can learn through backpropagation Historical Example: XOR Problem (1969) - Solve what stumped AI for a decade


PHASE 2: TRAINING SYSTEMS (Modules 7-10)

Build complete training pipelines for real datasets.

  • Module 07: dataloader - Data pipelines, batching, real datasets (moved from 09)
  • Module 08: optimizers - SGD, Adam optimization algorithms
  • Module 09: spatial - Conv2D, pooling for image processing (moved from 07)
  • Module 10: training - Complete training loops with validation

Capability Unlocked: Train deep networks on real datasets Historical Examples:

  • After Module 9: LeNet (1998) - First CNN for digit recognition
  • After Module 10: AlexNet (2012) - Deep learning revolution

PHASE 3: LANGUAGE MODELS (Modules 11-14)

Build modern transformer architectures for NLP.

  • Module 11: tokenization - Text preprocessing and tokenization
  • Module 12: embeddings - Word vectors, positional encoding
  • Module 13: attention - Self-attention mechanisms
  • Module 14: transformers - Complete transformer architecture

Capability Unlocked: Build GPT-style language models Historical Example: GPT (2018) - Foundation of modern AI


PHASE 4: SYSTEM OPTIMIZATION (Modules 15-19)

Transform educational code into production-ready systems through progressive optimization.

  • Module 15: acceleration - Core performance optimization

    • Journey from educational loops to optimized operations
    • Cache-friendly blocking for matrix multiplication
    • NumPy vectorization (10-100x speedups)
    • Transparent backend dispatch (existing code runs faster automatically!)
  • Module 16: caching - Memory optimization patterns

    • KV caching for transformer inference
    • Incremental computation techniques
    • Autoregressive generation optimization
    • Memory vs computation tradeoffs
  • Module 17: precision - Numerical optimization

    • Post-training INT8 quantization
    • Calibration and scaling techniques
    • Accuracy vs performance tradeoffs
    • Memory footprint reduction
  • Module 18: compression - Model size optimization

    • Magnitude-based pruning
    • Structured vs unstructured sparsity
    • Knowledge distillation basics
    • Deployment optimization
  • Module 19: benchmarking - Performance analysis

    • Profiling and bottleneck identification
    • Memory usage analysis
    • Comparative benchmarking
    • Scientific performance measurement

PHASE 5: CAPSTONE PROJECT (Module 20)

  • Module 20: capstone - Complete ML system
    • Combine all optimization techniques
    • Build optimized end-to-end systems
    • Example projects:
      • Optimized CIFAR-10 trainer (75% accuracy, minimal resources)
      • Efficient GPT inference engine (memory-constrained)
      • Custom optimization challenge
    • Deploy production-ready ML systems

Key Design Principles

1. Backward Design Philosophy

Each module is designed with future needs in mind:

  • Tensors (Module 2): Built with gradient support from day 1
  • Layers (Module 4): Parameter management ready for optimizers
  • Training (Module 10): Memory tracking for optimization modules
  • Transformers (Module 14): KV structure ready for caching

2. Backend Dispatch Architecture

# Students run SAME code throughout
model.train()  # Uses appropriate backend automatically

# Module 1-14: Naive backend (for learning)
# Module 15+: Optimized backend (for performance)
# Zero code changes needed!

3. Progressive Optimization Journey

  • Understanding through implementation (Modules 1-14): Build with loops for clarity
  • Systematic optimization (Modules 15-19): Transform loops into production code
  • Transparent acceleration: Optimizations work automatically on existing code
  • Real-world techniques: Learn optimizations used in PyTorch/TensorFlow

4. Historical Context

Examples map to ML breakthroughs:

  • 1957: Perceptron (Module 4)
  • 1969: XOR Solution (Module 6)
  • 1998: LeNet (Module 9)
  • 2012: AlexNet (Module 10)
  • 2018: GPT (Module 14)

Learning Progression

Weeks 1-6: Foundation

Students build mathematical infrastructure and understand how neural networks work.

Weeks 7-10: Training Systems

Students build complete training pipelines and understand how to scale to real datasets.

Weeks 11-14: Modern AI

Students build transformer architectures that power ChatGPT and modern AI.

Weeks 15-19: System Optimization

Students transform educational code into production-ready systems through progressive optimization techniques.

Week 20: Capstone Project

Students combine all techniques to build complete, optimized ML systems from scratch.


Success Metrics

By completion, students will have:

  • Built every component of modern ML systems from scratch
  • Recreated the major breakthroughs in AI history
  • Transformed educational loops into production-ready code (10-100x speedups)
  • Understood why PyTorch, TensorFlow are designed the way they are
  • Mastered real-world optimization techniques (caching, quantization, pruning)
  • Built complete ML systems that transparently optimize themselves

Ultimate Goal: Students who can read PyTorch source code and think "I understand why they did it this way - I built this myself in TinyTorch!"