mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-05 18:38:18 -05:00
166 lines
4.5 KiB
Markdown
# Module 20: Capstone - Complete ML System Integration

## Overview

Combine everything you've learned to build a complete, optimized ML system from scratch. This is your masterpiece - demonstrating mastery of both ML algorithms and systems engineering.

## Project Options

### Option 1: Optimized CIFAR-10 Trainer

**Goal**: 75% accuracy with minimal resources

- Start with your Module 10 trainer
- Apply all optimizations (acceleration, quantization, pruning)
- Match the baseline's accuracy with 10x less compute/memory
- Deploy on a resource-constrained device
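
As one illustration of the pruning step, here is a minimal sketch of unstructured magnitude pruning on a plain NumPy array. The function name `magnitude_prune` and the 90% sparsity level are assumptions made for this example, not part of the TinyTorch API.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries set to zero."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(sparsity * weights.size)  # number of weights to zero out
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Hypothetical weight matrix, for illustration only.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.9)
achieved = float(np.mean(w_pruned == 0.0))
print(f"achieved sparsity: {achieved:.2%}")
```

Zeroing weights in a dense array does not reduce memory by itself. The savings only appear once the weights are stored in a sparse format or whole structures (rows, channels) are removed.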

### Option 2: Efficient GPT Inference Engine

**Goal**: Real-time text generation on CPU

- Implement KV caching for transformers
- Quantize the model to INT8
- Optimize attention computation
- Generate 100 tokens/second on a laptop CPU
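
To make the KV caching idea concrete, here is a minimal single-head sketch in NumPy. The names `KVCache`, `attend_with_cache`, and `attend_full` are hypothetical and chosen for this example. The sketch checks that decoding one token at a time against cached keys and values gives the same result as recomputing causal attention over the whole sequence.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Stores the keys and values of past tokens for one attention head."""
    def __init__(self, d_model):
        self.keys = np.zeros((0, d_model))
        self.values = np.zeros((0, d_model))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def attend_with_cache(x_t, Wq, Wk, Wv, cache):
    """Attention output for the newest token only; x_t has shape (1, d_model)."""
    q = x_t @ Wq
    cache.append(x_t @ Wk, x_t @ Wv)  # project only the new token
    scores = q @ cache.keys.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ cache.values

def attend_full(x, Wq, Wk, Wv):
    """Reference: recompute causal attention over the entire sequence."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v

# Random weights and inputs, for illustration only.
rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
x = rng.normal(size=(5, d))

cache = KVCache(d)
cached_out = np.vstack(
    [attend_with_cache(x[t:t + 1], Wq, Wk, Wv, cache) for t in range(len(x))]
)
full_out = attend_full(x, Wq, Wk, Wv)
print("cached decode matches full recompute:", np.allclose(cached_out, full_out))
```

Growing the cache with `np.vstack` copies it on every step, which keeps the sketch short. A faster version would likely preallocate the cache for the maximum sequence length.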

### Option 3: Custom Challenge

**Goal**: Define your own optimization challenge

- Pick a problem you care about
- Set performance targets
- Apply systematic optimization
- Document the journey

## What You'll Demonstrate

### 1. Full Stack Understanding

- Build complete training pipeline
- Implement model architecture
- Add optimization layers
- Deploy to production

### 2. Systems Engineering

- Profile and identify bottlenecks
- Apply appropriate optimizations
- Measure and validate improvements
- Handle resource constraints
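
Measuring and validating an improvement can be sketched with the standard library alone. The `benchmark` helper and the two toy functions below are assumptions made for illustration. The pattern is what matters: check that the optimized version returns the same result, warm up, repeat the measurement, and report the median.

```python
import statistics
import time

def benchmark(fn, *args, warmup=2, repeats=10, number=5):
    """Median wall-clock seconds per call of fn(*args)."""
    for _ in range(warmup):
        fn(*args)
    per_call = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(number):
            fn(*args)
        per_call.append((time.perf_counter() - start) / number)
    return statistics.median(per_call)

def slow_sum_of_squares(n):
    """Baseline: explicit loop over 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_sum_of_squares(n):
    """Optimized: closed form for the same sum."""
    return (n - 1) * n * (2 * n - 1) // 6

n = 20_000
# Validate correctness before comparing speed.
assert slow_sum_of_squares(n) == fast_sum_of_squares(n)

baseline_time = benchmark(slow_sum_of_squares, n)
optimized_time = benchmark(fast_sum_of_squares, n)
print(f"speedup: {baseline_time / optimized_time:.1f}x")
```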

### 3. Scientific Approach

- Baseline measurements
- Systematic optimization
- Ablation studies
- Reproducible results
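
A leave-one-out ablation is one simple way to structure the study: measure the system with every optimization enabled, then disable one at a time and record how much worse it gets. In the sketch below, `measure_latency_ms` is a hypothetical stand-in with made-up effect sizes. In a real project it would run your actual benchmark.

```python
OPTIMIZATIONS = ["acceleration", "quantization", "pruning", "caching"]

def measure_latency_ms(enabled):
    """Hypothetical stand-in for a real benchmark run."""
    # Made-up multiplicative effects, for illustration only.
    factors = {"acceleration": 0.5, "quantization": 0.7, "pruning": 0.8, "caching": 0.6}
    latency = 100.0
    for name in enabled:
        latency *= factors[name]
    return latency

def leave_one_out(optimizations, measure):
    """Return the fully optimized result and the cost of removing each optimization."""
    full = measure(optimizations)
    rows = []
    for removed in optimizations:
        remaining = [name for name in optimizations if name != removed]
        rows.append((removed, measure(remaining) - full))
    return full, rows

full_latency, ablation = leave_one_out(OPTIMIZATIONS, measure_latency_ms)
print(f"all optimizations: {full_latency:.1f} ms")
for removed, penalty in sorted(ablation, key=lambda row: row[1], reverse=True):
    print(f"without {removed}: +{penalty:.1f} ms")
```

The optimization with the largest penalty when removed is the one contributing most in the context of the others, which is often a different ranking from applying each in isolation.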

## Capstone Structure

### Week 1: Planning & Baseline

```python
# 1. Choose project and define success metrics
metrics = {
    'accuracy_target': 75.0,
    'inference_time': '<10ms',
    'memory_usage': '<100MB',
    'model_size': '<10MB'
}

# 2. Build baseline system
baseline = build_baseline_model()
baseline_metrics = evaluate(baseline)

# 3. Profile and identify opportunities
bottlenecks = profile_system(baseline)
```
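
String targets such as `'<10ms'` are easy to read but awkward to check automatically. One possible alternative, sketched below, stores each target as a comparison plus a numeric threshold, with the unit in the key name. The key names, the `check_targets` helper, and the measured values are all assumptions made for this example.

```python
import operator

# Each target is a (comparison, threshold) pair; units live in the key names.
targets = {
    "accuracy_pct": (operator.ge, 75.0),
    "inference_time_ms": (operator.lt, 10.0),
    "memory_usage_mb": (operator.lt, 100.0),
    "model_size_mb": (operator.lt, 10.0),
}

def check_targets(measured, targets):
    """Map each metric name to True if its target is met, else False."""
    return {
        name: compare(measured[name], threshold)
        for name, (compare, threshold) in targets.items()
    }

# Made-up measurements, for illustration only.
measured = {
    "accuracy_pct": 76.2,
    "inference_time_ms": 14.5,
    "memory_usage_mb": 80.0,
    "model_size_mb": 9.1,
}

report = check_targets(measured, targets)
for name, met in report.items():
    print(f"{name}: {'met' if met else 'missed'}")
```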

### Week 2: Optimization Sprint

```python
# 4. Apply optimizations systematically, one at a time
optimizations = [apply_acceleration, apply_quantization, apply_pruning, apply_caching]
baseline_time = baseline_metrics['inference_time']  # assumes evaluate() reports latency under this key

optimized = baseline
for optimization in optimizations:
    optimized = optimization(optimized)

    # 5. Measure the cumulative improvement after each step
    optimized_metrics = evaluate(optimized)
    speedup = baseline_time / optimized_metrics['inference_time']
    print(f"{optimization.__name__}: {speedup:.2f}x faster than baseline")
```
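
As an example of what a step like `apply_quantization` might do internally, here is a minimal sketch of symmetric per-tensor INT8 quantization on a NumPy array. The helper names `quantize_int8` and `dequantize_int8` are assumptions for this example and are not taken from TinyTorch.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> (int8 values, float scale)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale

# Hypothetical weight matrix, for illustration only.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
max_error = float(np.max(np.abs(w - w_hat)))
print(f"size: {w.nbytes} -> {q.nbytes} bytes, max abs error: {max_error:.5f}")
```

Storage drops by 4x relative to float32, and the rounding error per weight is bounded by roughly half the scale. Whether accuracy survives depends on the model, so it should be re-evaluated after quantizing.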

### Week 3: Polish & Deploy

```python
# 6. Final optimization pass
final_model = fine_tune_optimizations(optimized)

# 7. Create deployment package
deployment = package_for_production(final_model)

# 8. Document results
write_technical_report(baseline, final_model, metrics)
```

## Deliverables

### 1. Working System

- Complete codebase on GitHub
- README with setup instructions
- Demonstration video/notebook

### 2. Technical Report

- Problem statement and approach
- Baseline vs optimized metrics
- Optimization journey and decisions
- Lessons learned

### 3. Performance Analysis

- Comprehensive benchmarks
- Ablation study results
- Resource utilization graphs
- Comparison with PyTorch/TensorFlow

## Evaluation Criteria

### Technical Excellence (40%)

- Correctness of implementation
- Quality of optimizations
- Code organization and style

### Performance Achievement (30%)

- Meeting stated goals
- Improvement over baseline
- Resource efficiency

### Systems Understanding (30%)

- Appropriate optimization choices
- Understanding of tradeoffs
- Scientific methodology

## Example Projects from Past Students

### "TinyYOLO" - Real-time Object Detection

- 30 FPS on Raspberry Pi
- 90% size reduction through pruning
- Custom INT8 kernels for ARM

### "NanoGPT" - Edge Language Model

- 100MB model generates Shakespeare
- KV caching + quantization
- Runs on 2015 laptop

### "SwiftCNN" - Instant Image Classification

- <1ms inference on iPhone
- Structured pruning + iOS Metal
- 95% of ResNet accuracy at 10% size

## Resources

- All previous module code
- TinyTorch optimization library
- Benchmarking tools
- Community Discord for help

## Success Criteria

- ✅ Complete working system with all optimizations
- ✅ 10x+ improvement in speed OR memory
- ✅ Professional documentation and analysis
- ✅ Understanding of when/why to apply each optimization
- ✅ Ready for ML systems engineering roles!

## Final Note

This is your chance to show everything you've learned. Build something you're proud of - something that demonstrates not just that you can implement ML algorithms, but that you understand how to build production ML systems.

**Remember**: The goal isn't perfection, it's demonstrating systematic thinking about performance, memory, and deployment constraints - the real challenges of ML engineering.