mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-01 17:22:34 -05:00
Module 19: Benchmarking - Performance Measurement & Analysis
Overview
Learn to scientifically measure, analyze, and optimize ML system performance. Build profiling tools that identify bottlenecks and guide optimization decisions.
What You'll Build
- Performance Profiler: Measure time, memory, and compute
- Bottleneck Analyzer: Identify optimization opportunities
- Comparison Framework: A/B test different approaches
- Visualization Tools: Performance dashboards
Learning Objectives
- Scientific Measurement: Reproducible performance testing
- Profiling Techniques: Time, memory, and operation profiling
- Bottleneck Analysis: Find and fix performance issues
- Optimization Validation: Show that improvements are real, not measurement noise
Prerequisites
- Modules 15-18: All optimization techniques
- Module 08: Training (baseline for comparison)
Key Concepts
Comprehensive Profiling
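```python
# profile, Timer, MemoryTracker and count_flops are tools you build in this module
@profile
def model_forward(model, x):
    with Timer() as t:
        with MemoryTracker() as m:
            output = model(x)
    # Read the measurements after both context managers have exited
    print(f"Time: {t.elapsed}ms")
    print(f"Memory: {m.peak_usage}MB")
    print(f"FLOPs: {count_flops(model, x)}")
    return output
```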
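You build `Timer` and `MemoryTracker` in this module. Below is one minimal sketch that uses only the standard library: `time.perf_counter` for wall-clock time and `tracemalloc` for peak allocations. The attribute names `elapsed` (milliseconds) and `peak_usage` (MB) mirror the example above. Everything else is an assumption, not TinyTorch's actual implementation. Because `tracemalloc` only sees allocations made through Python's allocators, memory that native libraries allocate elsewhere (or on a GPU) may not be counted.

```python
import time
import tracemalloc


class Timer:
    """Context manager recording wall-clock time in milliseconds."""

    def __enter__(self):
        self._start = time.perf_counter()  # monotonic, high-resolution clock
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = (time.perf_counter() - self._start) * 1000.0
        return False  # never swallow exceptions from the timed code


class MemoryTracker:
    """Context manager recording peak traced allocations in megabytes.

    Assumption: memory is measured with tracemalloc, so only allocations made
    through Python's allocators are counted. Not safe to nest.
    """

    def __enter__(self):
        tracemalloc.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        self.peak_usage = peak / (1024 * 1024)
        return False


if __name__ == "__main__":
    with Timer() as t:
        with MemoryTracker() as m:
            data = [0] * 1_000_000  # about 8 MB of list slots on a 64-bit build
    print(f"Time: {t.elapsed:.2f}ms")
    print(f"Memory: {m.peak_usage:.1f}MB")
```

`Timer` sits on the outside so that starting and stopping the tracer falls within the timed region consistently. The timing overhead itself is tiny next to a forward pass.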
Bottleneck Identification
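```python
# Profiler is the hook-based profiler you build in this module
profiler = Profiler()
with profiler:
    model.train(data_loader)

# Find the top time consumers
profiler.print_top_operations(n=10)
# Example output (illustrative):
# 45% - Matrix multiplication
# 23% - Attention computation
# 15% - Data loading
# ...
```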
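Here is a minimal sketch of how a `Profiler` with `print_top_operations` could work. It assumes each operation reports itself by wrapping its work in a hypothetical `profiler.record(name)` context. In a hook-based design, each layer's forward pass would make that call. The profiler accumulates time per operation name and reports each operation's share of the total recorded time.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Profiler:
    """Accumulates wall-clock time per named operation while active."""

    def __init__(self):
        self.totals = defaultdict(float)  # operation name -> seconds
        self.active = False

    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.active = False
        return False

    @contextmanager
    def record(self, name):
        """Time the enclosed block under `name` (hypothetical hook API)."""
        if not self.active:  # recording is a no-op outside `with profiler:`
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def top_operations(self, n=10):
        """Return [(name, percent_of_total)] for the n most expensive operations."""
        total = sum(self.totals.values())
        if total == 0:
            return []
        ranked = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, 100.0 * secs / total) for name, secs in ranked[:n]]

    def print_top_operations(self, n=10):
        for name, pct in self.top_operations(n):
            print(f"{pct:5.1f}% - {name}")


if __name__ == "__main__":
    profiler = Profiler()
    with profiler:
        for _ in range(3):
            with profiler.record("heavy_op"):
                sum(i * i for i in range(200_000))
            with profiler.record("light_op"):
                sum(range(1_000))
    profiler.print_top_operations(n=10)
```

This sketch does not handle nested operations. An inner operation's time is also counted in the operation that encloses it, so percentages are only meaningful when recorded operations do not overlap.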
A/B Testing
```python
# Compare optimization techniques
baseline = measure_performance(original_model)
optimized = measure_performance(quantized_model)
improvement = {
    'speedup': baseline.time / optimized.time,                 # > 1 means faster
    'memory_reduction': baseline.memory / optimized.memory,    # > 1 means smaller
    'accuracy_delta': optimized.accuracy - baseline.accuracy,  # < 0 means accuracy lost
}
```
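The A/B comparison assumes `measure_performance` returns an object with `time`, `memory` and `accuracy` fields. Below is one minimal sketch. It assumes a model is any callable that maps a batch of inputs to predicted labels, which is a hypothetical interface, not TinyTorch's. The function warms up first, takes the median of several timed runs, and then traces peak memory in a separate run so that tracing overhead does not inflate the timings.

```python
import statistics
import time
import tracemalloc
from dataclasses import dataclass


@dataclass
class Performance:
    time: float      # median latency per call, milliseconds
    memory: float    # peak traced allocation during one call, MB
    accuracy: float  # fraction of correct predictions, 0.0 to 1.0


def measure_performance(predict, inputs, labels, warmup=3, repeats=10):
    """Benchmark `predict`, a callable mapping a batch of inputs to labels.

    The callable interface and default counts are assumptions for this sketch.
    """
    for _ in range(warmup):  # untimed runs: warm caches and lazy initialization
        predict(inputs)

    times_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(inputs)
        times_ms.append((time.perf_counter() - start) * 1000.0)

    # Trace memory in a separate run so tracing overhead doesn't skew timings
    tracemalloc.start()
    preds = predict(inputs)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    correct = sum(p == y for p, y in zip(preds, labels))
    return Performance(
        time=statistics.median(times_ms),  # median resists GC and OS noise
        memory=peak / (1024 * 1024),
        accuracy=correct / len(labels),
    )
```

The median is used rather than the mean because a single garbage-collection pause or OS hiccup can drag a mean far from typical latency. `accuracy` is a fraction here, so multiply by 100 to compare it with percentages.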
Tools You'll Master
- Time Profiling: Where cycles are spent
- Memory Profiling: Peak usage and allocation patterns
- Operation Counting: FLOPs and memory bandwidth
- Statistical Analysis: Confidence intervals and significance
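For the statistical analysis, one option is a bootstrap confidence interval on the speedup. The sketch below assumes you already have repeated latency samples for both versions. It resamples each list with replacement, recomputes the ratio of mean latencies, and reads off percentiles. The percentile bootstrap is a choice made for this sketch; a t-interval or a Mann-Whitney U test are common alternatives. If the whole interval lies above 1.0, the speedup is unlikely to be timing noise.

```python
import random
import statistics


def bootstrap_speedup_ci(baseline_ms, optimized_ms, n_resamples=2000,
                         confidence=0.95, seed=0):
    """Percentile-bootstrap CI for mean(baseline) / mean(optimized)."""
    rng = random.Random(seed)  # fixed seed -> reproducible interval
    ratios = []
    for _ in range(n_resamples):
        b = rng.choices(baseline_ms, k=len(baseline_ms))  # resample with replacement
        o = rng.choices(optimized_ms, k=len(optimized_ms))
        ratios.append(statistics.fmean(b) / statistics.fmean(o))
    ratios.sort()
    alpha = (1.0 - confidence) / 2.0
    lo = ratios[int(alpha * n_resamples)]
    hi = ratios[int((1.0 - alpha) * n_resamples) - 1]
    point = statistics.fmean(baseline_ms) / statistics.fmean(optimized_ms)
    return point, lo, hi


# Hypothetical latency samples (ms) from repeated runs, for illustration only
point, lo, hi = bootstrap_speedup_ci([100.0, 102.0, 98.0, 101.0, 99.0],
                                     [25.0, 26.0, 24.0, 25.0, 25.0])
print(f"speedup {point:.2f}x, 95% CI [{lo:.2f}, {hi:.2f}]")
```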
Real-World Skills
- Production Profiling: The techniques behind industry profilers such as PyTorch Profiler (Meta) and TensorFlow Profiler (Google)
- Performance Debugging: Find unexpected slowdowns
- Optimization Planning: Data-driven decisions
- Regression Testing: Ensure optimizations persist
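Regression testing can be as simple as storing one baseline measurement per benchmark and flagging any new measurement that is slower by more than a tolerance. Below is a minimal sketch. It assumes baselines live in a JSON file keyed by benchmark name; the helper names and the 10% default tolerance are hypothetical choices.

```python
import json
from pathlib import Path


def load_baselines(path):
    """Read {benchmark name: baseline ms} from JSON, or {} if the file is missing."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_baselines(path, baselines):
    """Write baselines back to JSON (sorted keys keep diffs readable)."""
    Path(path).write_text(json.dumps(baselines, indent=2, sort_keys=True))


def check_regression(baselines, name, current_ms, tolerance=0.10):
    """Return False if `current_ms` is more than `tolerance` slower than baseline.

    A benchmark with no stored baseline records `current_ms` and passes.
    """
    if name not in baselines:
        baselines[name] = current_ms
        return True
    return current_ms <= baselines[name] * (1.0 + tolerance)
```

Set the tolerance above the run-to-run variation you actually measure. Otherwise the check will fail at random on a noisy machine.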
Module Structure
- Measurement Fundamentals: Accurate timing and memory tracking
- Building Profilers: Hook-based profiling system
- Analysis Tools: Statistical analysis of results
- Visualization: Performance dashboards
- Case Studies: Profile and optimize real models
Practical Examples
```python
# Profile your optimizations
models = {
    'baseline': original_model,
    'quantized': quantized_model,
    'pruned': pruned_model,
    'cached': cached_transformer,
}
results = benchmark_suite(models, test_data)
plot_performance_comparison(results)
# Example results (illustrative):
# Model      Time    Memory   Accuracy
# baseline   100ms   400MB    75.0%
# quantized   25ms   100MB    74.5%
# pruned      30ms    40MB    73.8%
# cached      20ms   450MB    75.0%
```
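Trade-offs are easier to read once each row is expressed relative to the baseline. The small sketch below adds speedup and memory-ratio columns. It assumes results arrive as a dict of per-model metrics; `time_ms`, `memory_mb` and `accuracy_pct` are hypothetical field names, not the return format of `benchmark_suite`. The numbers are the illustrative ones from the example table, not measurements.

```python
def comparison_table(results, baseline="baseline"):
    """Format results as a table with speedup and memory ratio vs. `baseline`.

    `results` maps model name -> {"time_ms", "memory_mb", "accuracy_pct"}
    (hypothetical field names). Ratios > 1 mean better than the baseline.
    """
    base = results[baseline]
    rows = [f"{'Model':<10}{'Time':>8}{'Memory':>9}{'Accuracy':>10}"
            f"{'Speedup':>9}{'MemRatio':>10}"]
    for name, r in results.items():
        speedup = base["time_ms"] / r["time_ms"]        # > 1: faster
        mem_ratio = base["memory_mb"] / r["memory_mb"]  # > 1: smaller
        rows.append(f"{name:<10}{r['time_ms']:>6.0f}ms{r['memory_mb']:>7.0f}MB"
                    f"{r['accuracy_pct']:>9.1f}%{speedup:>8.1f}x{mem_ratio:>9.2f}x")
    return "\n".join(rows)


# Illustrative numbers copied from the example table above, not measurements
example = {
    "baseline":  {"time_ms": 100, "memory_mb": 400, "accuracy_pct": 75.0},
    "quantized": {"time_ms": 25,  "memory_mb": 100, "accuracy_pct": 74.5},
    "pruned":    {"time_ms": 30,  "memory_mb": 40,  "accuracy_pct": 73.8},
    "cached":    {"time_ms": 20,  "memory_mb": 450, "accuracy_pct": 75.0},
}
print(comparison_table(example))
```

With these numbers, the cached model has the highest speedup (5.0x) but a memory ratio below 1 (0.89x): it trades memory for speed.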
Advanced Topics
- Roofline Analysis: Hardware utilization
- Memory Bandwidth: Identifying memory-bound operations
- Cache Analysis: L1/L2/L3 cache behavior
- Distributed Profiling: Multi-GPU systems
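The roofline model bounds attainable throughput by min(peak compute, arithmetic intensity × memory bandwidth), where arithmetic intensity is FLOPs per byte moved. Below is a worked sketch for a dense matrix multiply. It assumes float32 operands, each read once, with the output written once, so no cache reuse is modelled. The hardware numbers (10 TFLOP/s peak, 500 GB/s bandwidth) are made up purely for illustration.

```python
def matmul_flops(m, k, n):
    """FLOPs for (m x k) @ (k x n): one multiply and one add per term."""
    return 2 * m * k * n


def matmul_bytes(m, k, n, bytes_per_elem=4):
    """Minimum traffic: read A and B once, write C once (float32 default)."""
    return bytes_per_elem * (m * k + k * n + m * n)


def roofline(flops, bytes_moved, peak_flops, bandwidth):
    """Return (arithmetic intensity, attainable FLOP/s, bound)."""
    intensity = flops / bytes_moved                 # FLOPs per byte
    attainable = min(peak_flops, intensity * bandwidth)
    ridge = peak_flops / bandwidth                  # intensity where the two roofs meet
    bound = "compute-bound" if intensity >= ridge else "memory-bound"
    return intensity, attainable, bound


PEAK = 10e12       # hypothetical 10 TFLOP/s
BANDWIDTH = 500e9  # hypothetical 500 GB/s

for m, k, n in [(4096, 4096, 4096), (1, 4096, 4096)]:
    f, b = matmul_flops(m, k, n), matmul_bytes(m, k, n)
    ai, perf, bound = roofline(f, b, PEAK, BANDWIDTH)
    print(f"{m}x{k} @ {k}x{n}: {ai:.1f} FLOP/B, {perf / 1e12:.2f} TFLOP/s max, {bound}")
```

The square 4096×4096 product reaches about 683 FLOP/byte, far past the ridge point of 20, so it is compute-bound. The matrix-vector case does roughly 0.5 FLOP/byte and is limited by memory bandwidth.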
Success Criteria
- ✅ Build complete profiling system from scratch
- ✅ Identify and fix 3+ performance bottlenecks
- ✅ Create reproducible benchmark suite
- ✅ Generate professional performance reports