This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: Autograd before optimizers, DataLoader before training, scattered dependencies

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates an inevitable need for the next

## Pedagogical Flow Achieved

- **05_losses** → "Need systematic weight updates" → **06_optimizers**
- **06_optimizers** → "Need automatic gradients" → **07_autograd**
- **07_autograd** → "Need systematic training" → **08_training**
- **08_training** → "MLPs hit limits on images" → **09_spatial**
- **09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change)

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in tito/commands/export.py
- **Test directories**: Renamed module_XX directories to match the new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions for the proper flow

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with the new numbers
- **Module Developer**: Updated work tracking with the new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with the new progression

## Educational Benefits

1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts are introduced exactly when needed
3. **Motivation**: Students understand WHY each tool is necessary
4. **Synthesis**: Everything flows toward complete ML systems understanding
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance

- ✅ All CLI commands still function
- ✅ Checkpoint system mappings updated
- ✅ Documentation consistency maintained
- ✅ Test directory structure aligned
- ✅ Agent configurations synchronized

**Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.
# Module 20: Capstone - Complete ML System Integration

## Overview

Combine everything you've learned to build a complete, optimized ML system from scratch. This is your masterpiece - demonstrating mastery of both ML algorithms and systems engineering.

## Project Options
### Option 1: Optimized CIFAR-10 Trainer

**Goal**: 75% accuracy with minimal resources

- Start with your Module 10 trainer
- Apply all optimizations: acceleration, quantization, pruning (see the quantization sketch below)
- Achieve the same accuracy with 10x less compute/memory
- Deploy on a resource-constrained device
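Of the listed optimizations, quantization is a good one to prototype first. Here is a minimal sketch of symmetric per-tensor INT8 quantization in plain NumPy; the function names are illustrative, not part of the TinyTorch API:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> (int8, scale)."""
    scale = max(np.abs(weights).max() / 127.0, 1e-12)  # avoid divide-by-zero
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor to check the accuracy cost."""
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
print("worst-case error:", np.abs(w - dequantize(q, scale)).max())  # ~scale / 2
```

Storing `q` plus one float instead of `w` is already roughly a 4x memory reduction, before any speed gains from integer arithmetic.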
### Option 2: Efficient GPT Inference Engine

**Goal**: Real-time text generation on CPU

- Implement KV caching for transformers (sketched below)
- Quantize the model to INT8
- Optimize the attention computation
- Generate 100 tokens/second on a laptop CPU
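The key idea behind KV caching: during autoregressive generation, each new token's query attends over keys and values that were already computed for every earlier token, so store them instead of recomputing. A minimal single-head NumPy sketch (the class and function names are illustrative, not a TinyTorch API):

```python
import numpy as np

class KVCache:
    """Append each new token's key/value instead of recomputing the prefix."""
    def __init__(self):
        self.k, self.v = [], []

    def append(self, k_t, v_t):
        self.k.append(k_t)
        self.v.append(v_t)
        return np.stack(self.k), np.stack(self.v)  # shapes: (seq_len, d_head)

def attend(q_t, K, V):
    """Single-query attention over the cached keys/values."""
    scores = K @ q_t / np.sqrt(q_t.shape[-1])   # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                           # (d_head,)

d = 64
cache = KVCache()
for step in range(5):                            # one generated token per step
    q_t, k_t, v_t = (np.random.randn(d) for _ in range(3))
    K, V = cache.append(k_t, v_t)
    out = attend(q_t, K, V)
```

Without the cache, step t would recompute keys and values for all t tokens; with it, each step costs one append plus a single (t x d) matmul.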
### Option 3: Custom Challenge

**Goal**: Define your own optimization challenge

- Pick a problem you care about
- Set performance targets
- Apply systematic optimization
- Document the journey
## What You'll Demonstrate

### 1. Full Stack Understanding

- Build a complete training pipeline
- Implement the model architecture
- Add optimization layers
- Deploy to production

### 2. Systems Engineering

- Profile and identify bottlenecks
- Apply appropriate optimizations
- Measure and validate improvements
- Handle resource constraints

### 3. Scientific Approach

- Baseline measurements
- Systematic optimization
- Ablation studies (see the sketch below)
- Reproducible results
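A concrete way to run the ablation study: rebuild the system with exactly one optimization removed and compare against the full stack. This sketch reuses the placeholder helpers from the weekly plan below and assumes `evaluate` returns a dict with an `'accuracy'` key:

```python
# Hypothetical ablation harness built on the placeholder helpers below.
optimizations = {
    "acceleration": apply_acceleration,
    "quantization": apply_quantization,
    "pruning": apply_pruning,
}

def build(skip=None):
    """Build the full optimized system, optionally leaving one step out."""
    model = build_baseline_model()
    for name, apply_fn in optimizations.items():
        if name != skip:
            model = apply_fn(model)
    return model

full_acc = evaluate(build())["accuracy"]
for name in optimizations:
    ablated_acc = evaluate(build(skip=name))["accuracy"]
    print(f"dropping {name} changes accuracy by {ablated_acc - full_acc:+.2f}")
```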
## Capstone Structure

### Week 1: Planning & Baseline

```python
# 1. Choose a project and define success metrics up front
metrics = {
    'accuracy_target': 75.0,      # percent
    'inference_time': '<10ms',
    'memory_usage': '<100MB',
    'model_size': '<10MB',
}

# 2. Build the baseline system (helpers come from your earlier modules)
baseline = build_baseline_model()
baseline_metrics = evaluate(baseline)

# 3. Profile and identify optimization opportunities
bottlenecks = profile_system(baseline)
```
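The `profile_system` helper above is yours to implement; one plausible version, using only Python's built-in profiler and taking a zero-argument callable rather than the model itself:

```python
import cProfile
import pstats

def profile_system(run_fn):
    """Profile one end-to-end run and report the hottest call sites."""
    profiler = cProfile.Profile()
    profiler.enable()
    run_fn()                      # e.g. one training epoch or one inference batch
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)

# usage: profile_system(lambda: evaluate(baseline))
```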
### Week 2: Optimization Sprint

```python
# 4. Apply optimizations systematically, one at a time
optimizations = [apply_acceleration, apply_quantization, apply_pruning, apply_caching]

# measure_inference_time is a placeholder: see the benchmark harness below
baseline_time = measure_inference_time(baseline)

# 5. Measure the improvement after each step, not just at the end
optimized = baseline
for apply_fn in optimizations:
    optimized = apply_fn(optimized)
    optimized_time = measure_inference_time(optimized)
    print(f"{apply_fn.__name__}: {baseline_time / optimized_time:.1f}x faster")
```
### Week 3: Polish & Deploy

```python
# 6. Final optimization pass
final_model = fine_tune_optimizations(optimized)

# 7. Create the deployment package
deployment = package_for_production(final_model)

# 8. Document the results
write_technical_report(baseline, final_model, metrics)
```
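`package_for_production` is likewise unspecified; one plausible shape for it, assuming the model's weights are available as a dict of NumPy arrays:

```python
import json
import os
import numpy as np

def package_for_production(weights: dict, config: dict, path: str = "deploy"):
    """Serialize compressed weight arrays plus a JSON config that records
    everything needed to rebuild the model without the training code."""
    os.makedirs(path, exist_ok=True)
    np.savez_compressed(os.path.join(path, "weights.npz"), **weights)
    with open(os.path.join(path, "config.json"), "w") as f:
        json.dump(config, f, indent=2)
    return path
```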
## Deliverables

### 1. Working System

- Complete codebase on GitHub
- README with setup instructions
- Demonstration video/notebook

### 2. Technical Report

- Problem statement and approach
- Baseline vs. optimized metrics
- Optimization journey and decisions
- Lessons learned

### 3. Performance Analysis

- Comprehensive benchmarks (see the timing harness below)
- Ablation study results
- Resource utilization graphs
- Comparison with PyTorch/TensorFlow
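For trustworthy benchmark numbers, report the median of many timed runs after a few warmup iterations rather than a single measurement. A minimal harness using only the standard library (`model.forward` and `batch` below are placeholders):

```python
import statistics
import time

def benchmark(fn, *args, warmup=3, runs=20):
    """Median wall-clock time per call; warmup runs absorb one-time costs."""
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# e.g.: print(f"{1000 * benchmark(model.forward, batch):.2f} ms per batch")
```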
## Evaluation Criteria

### Technical Excellence (40%)

- Correctness of implementation
- Quality of optimizations
- Code organization and style

### Performance Achievement (30%)

- Meeting stated goals
- Improvement over baseline
- Resource efficiency

### Systems Understanding (30%)

- Appropriate optimization choices
- Understanding of tradeoffs
- Scientific methodology
## Example Projects from Past Students

### "TinyYOLO" - Real-Time Object Detection

- 30 FPS on a Raspberry Pi
- 90% size reduction through pruning
- Custom INT8 kernels for ARM

### "NanoGPT" - Edge Language Model

- 100MB model generates Shakespeare
- KV caching + quantization
- Runs on a 2015 laptop

### "SwiftCNN" - Instant Image Classification

- <1ms inference on an iPhone
- Structured pruning + iOS Metal
- 95% of ResNet accuracy at 10% of the size
## Resources

- All previous module code
- TinyTorch optimization library
- Benchmarking tools
- Community Discord for help

## Success Criteria

- ✅ Complete working system with all optimizations
- ✅ 10x+ improvement in speed OR memory
- ✅ Professional documentation and analysis
- ✅ Understanding of when/why to apply each optimization
- ✅ Ready for ML systems engineering roles!
## Final Note

This is your chance to show everything you've learned. Build something you're proud of - something that demonstrates not just that you can implement ML algorithms, but that you understand how to build production ML systems.

Remember: the goal isn't perfection, it's demonstrating systematic thinking about performance, memory, and deployment constraints - the real challenges of ML engineering.