TinyTorch Complete Module Roadmap
20-Module ML Systems Course with Competition System
PHASE 1: FOUNDATION (Modules 1-6)
Build the core mathematical infrastructure for neural networks.
- Module 01: setup - Development environment configuration
- Module 02: tensor - Core data structures with autodiff support (backward design: built-in grad support)
- Module 03: activations - ReLU, Sigmoid, nonlinearity functions
- Module 04: layers - Dense layers, network building blocks
- Module 05: losses - MSE, CrossEntropy, BCE loss functions
- Module 06: autograd - Automatic differentiation engine
Capability Unlocked: Networks can learn through backpropagation
Historical Example: XOR Problem (1969) - Solve what stumped AI for a decade
PHASE 2: TRAINING SYSTEMS (Modules 7-10)
Build complete training pipelines for real datasets.
- Module 07: dataloader - Data pipelines, batching, real datasets (moved from 09)
- Module 08: optimizers - SGD, Adam optimization algorithms
- Module 09: spatial - Conv2D, pooling for image processing (moved from 07)
- Module 10: training - Complete training loops with validation
Capability Unlocked: Train deep networks on real datasets
Historical Examples:
- After Module 9: LeNet (1998) - First CNN for digit recognition
- After Module 10: AlexNet (2012) - Deep learning revolution
PHASE 3: LANGUAGE MODELS (Modules 11-14)
Build modern transformer architectures for NLP.
- Module 11: tokenization - Text preprocessing and tokenization
- Module 12: embeddings - Word vectors, positional encoding
- Module 13: attention - Self-attention mechanisms
- Module 14: transformers - Complete transformer architecture
Capability Unlocked: Build GPT-style language models
Historical Example: GPT (2018) - Foundation of modern AI
PHASE 4: SYSTEM OPTIMIZATION (Modules 15-19)
Transform educational code into production-ready systems through progressive optimization.
- Module 15: acceleration - Core performance optimization
  - Journey from educational loops to optimized operations
  - Cache-friendly blocking for matrix multiplication
  - NumPy vectorization (10-100x speedups)
  - Transparent backend dispatch (existing code runs faster automatically!)
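To make the "loops to vectorized operations" journey concrete, here is a minimal sketch (with illustrative names, not TinyTorch's actual API) comparing an educational triple-loop matrix multiply against the equivalent vectorized NumPy call:

```python
# Illustrative sketch only: compare an educational triple-loop matmul with
# NumPy's vectorized equivalent to see the kind of speedup Module 15 targets.
import time
import numpy as np

def matmul_naive(a, b):
    """Educational triple-loop matrix multiply: clear, but slow."""
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i, j] += a[i, p] * b[p, j]
    return out

a, b = np.random.rand(64, 64), np.random.rand(64, 64)

t0 = time.perf_counter(); slow = matmul_naive(a, b); t_naive = time.perf_counter() - t0
t0 = time.perf_counter(); fast = a @ b;              t_fast = time.perf_counter() - t0

assert np.allclose(slow, fast)   # same result, very different cost
print(f"naive: {t_naive:.3f}s  vectorized: {t_fast:.6f}s  speedup: {t_naive / t_fast:.0f}x")
```

The exact ratio depends on the machine, but the order-of-magnitude gap is the point the module builds on.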
- Module 16: caching - Memory optimization patterns
  - KV caching for transformer inference
  - Incremental computation techniques
  - Autoregressive generation optimization
  - Memory vs computation tradeoffs
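As a rough illustration of the KV-caching idea (shapes and names are invented for this sketch, not TinyTorch's transformer API): each decoding step computes key/value projections only for the new token, while attention still sees the full history through the cache.

```python
# Illustrative KV-cache sketch for single-head, single-query decoding.
import numpy as np

d_model = 8
W_q, W_k, W_v = (np.random.rand(d_model, d_model) for _ in range(3))

def attend(q, K, V):
    """Attention for one query over all cached keys/values."""
    scores = K @ q / np.sqrt(d_model)         # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                        # (d_model,)

k_cache, v_cache = [], []
for step, token_vec in enumerate(np.random.rand(5, d_model)):  # stand-in embeddings
    k_cache.append(token_vec @ W_k)           # compute K/V for the NEW token only
    v_cache.append(token_vec @ W_v)
    out = attend(token_vec @ W_q, np.stack(k_cache), np.stack(v_cache))
    print(f"step {step}: reused {step} cached positions, computed 1 new one")
```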
- Module 17: precision - Numerical optimization
  - Post-training INT8 quantization
  - Calibration and scaling techniques
  - Accuracy vs performance tradeoffs
  - Memory footprint reduction
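A minimal sketch of the post-training INT8 idea, assuming simple symmetric quantization of a single weight tensor (a real calibration pass would also consider activation ranges):

```python
# Illustrative symmetric INT8 quantization of one weight tensor.
import numpy as np

w = np.random.randn(256, 256).astype(np.float32)     # stand-in for trained weights

scale = np.abs(w).max() / 127.0                       # map [-max|w|, max|w|] -> [-127, 127]
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale         # what the network actually computes with

print(f"memory: {w.nbytes} -> {w_int8.nbytes} bytes (4x smaller)")
print(f"max abs error: {np.abs(w - w_dequant).max():.5f}")
```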
- Module 18: compression - Model size optimization
  - Magnitude-based pruning
  - Structured vs unstructured sparsity
  - Knowledge distillation basics
  - Deployment optimization
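For the pruning half of this module, a minimal sketch of unstructured magnitude pruning (the function name and threshold choice are illustrative, not TinyTorch's actual API):

```python
# Illustrative magnitude pruning: zero the smallest weights by absolute value.
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Return pruned weights plus the mask that keeps pruned entries at zero."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = (np.abs(w) >= threshold).astype(w.dtype)
    return w * mask, mask

w = np.random.randn(512, 512)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(f"weights kept: {mask.mean():.0%}")             # ~10% remain nonzero
```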
- Module 19: benchmarking - Performance analysis
  - Profiling and bottleneck identification
  - Memory usage analysis
  - Comparative benchmarking
  - Scientific performance measurement
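The measurement discipline this module teaches can be previewed with a small harness like the following (illustrative, not TinyTorch's benchmarking API): warm up first, repeat the run, and report a robust statistic instead of one noisy timing.

```python
# Illustrative micro-benchmark harness with warmup and repeated runs.
import time
import numpy as np

def benchmark(fn, *args, warmup=3, repeats=10):
    for _ in range(warmup):                    # warm caches and allocators
        fn(*args)
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return np.median(times)

a, b = np.random.rand(512, 512), np.random.rand(512, 512)
print(f"matmul median time: {benchmark(np.matmul, a, b) * 1e3:.2f} ms")
```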
PHASE 5: CAPSTONE PROJECT (Module 20)
- Module 20: capstone - Complete ML system
  - Combine all optimization techniques
  - Build optimized end-to-end systems
  - Example projects:
    - Optimized CIFAR-10 trainer (75% accuracy, minimal resources)
    - Efficient GPT inference engine (memory-constrained)
    - Custom optimization challenge
  - Deploy production-ready ML systems
Key Design Principles
1. Backward Design Philosophy
Each module is designed with future needs in mind:
- Tensors (Module 2): Built with gradient support from day 1
- Layers (Module 4): Parameter management ready for optimizers
- Training (Module 10): Memory tracking for optimization modules
- Transformers (Module 14): KV structure ready for caching
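As one concrete illustration of this philosophy, here is a sketch of what "gradient support from day 1" might look like (class and attribute names are hypothetical, not TinyTorch's actual definitions):

```python
# Illustrative sketch: the Tensor carries grad fields before autograd exists,
# and layers expose parameters in the uniform way optimizers will later expect.
import numpy as np

class Tensor:
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=np.float32)
        self.requires_grad = requires_grad
        self.grad = None                   # populated later by autograd (Module 6)

class Dense:
    def __init__(self, in_features, out_features):
        self.weight = Tensor(np.random.randn(in_features, out_features) * 0.01,
                             requires_grad=True)
        self.bias = Tensor(np.zeros(out_features), requires_grad=True)

    def parameters(self):                  # the hook optimizers (Module 8) rely on
        return [self.weight, self.bias]

layer = Dense(4, 2)
print([p.data.shape for p in layer.parameters()])
```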
2. Backend Dispatch Architecture
```python
# Students run the SAME code throughout
model.train()  # Uses the appropriate backend automatically
# Modules 1-14: naive backend (for learning)
# Module 15+: optimized backend (for performance)
# Zero code changes needed!
```
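One way such dispatch can be wired up, shown only as a sketch (the registry names and functions are invented here, not TinyTorch's internals): every op routes through a backend table, so switching backends changes performance but not user code.

```python
# Illustrative backend dispatch: the same matmul() call, two implementations.
import numpy as np

def _matmul_naive(a, b):
    out = np.zeros((a.shape[0], b.shape[1]))
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            out[i, j] = sum(a[i, p] * b[p, j] for p in range(a.shape[1]))
    return out

_BACKENDS = {
    "naive":     {"matmul": _matmul_naive},   # Modules 1-14: built for clarity
    "optimized": {"matmul": np.matmul},       # Module 15+: built for speed
}
_active = "naive"

def set_backend(name):
    global _active
    _active = name

def matmul(a, b):
    return _BACKENDS[_active]["matmul"](a, b)

# Student-facing code never changes; only the active backend does.
a, b = np.random.rand(32, 32), np.random.rand(32, 32)
y_slow = matmul(a, b)
set_backend("optimized")
y_fast = matmul(a, b)
assert np.allclose(y_slow, y_fast)
```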
3. Progressive Optimization Journey
- Understanding through implementation (Modules 1-14): Build with loops for clarity
- Systematic optimization (Modules 15-19): Transform loops into production code
- Transparent acceleration: Optimizations work automatically on existing code
- Real-world techniques: Learn optimizations used in PyTorch/TensorFlow
4. Historical Context
Examples map to ML breakthroughs:
- 1957: Perceptron (Module 4)
- 1969: XOR Solution (Module 6)
- 1998: LeNet (Module 9)
- 2012: AlexNet (Module 10)
- 2018: GPT (Module 14)
Learning Progression
Weeks 1-6: Foundation
Students build mathematical infrastructure and understand how neural networks work.
Weeks 7-10: Training Systems
Students build complete training pipelines and understand how to scale to real datasets.
Weeks 11-14: Modern AI
Students build transformer architectures that power ChatGPT and modern AI.
Weeks 15-19: System Optimization
Students transform educational code into production-ready systems through progressive optimization techniques.
Week 20: Capstone Project
Students combine all techniques to build complete, optimized ML systems from scratch.
Success Metrics
By completion, students will have:
- ✅ Built every component of modern ML systems from scratch
- ✅ Recreated the major breakthroughs in AI history
- ✅ Transformed educational loops into production-ready code (10-100x speedups)
- ✅ Understood why PyTorch and TensorFlow are designed the way they are
- ✅ Mastered real-world optimization techniques (caching, quantization, pruning)
- ✅ Built complete ML systems that transparently optimize themselves
Ultimate Goal: Students who can read PyTorch source code and think "I understand why they did it this way - I built this myself in TinyTorch!"