Files
TinyTorch/modules/15_profiling/module.yaml
Vijay Janapa Reddi 910900f504 FEAT: Complete optimization modules 15-20 with ML Systems focus
Major accomplishment: Implemented comprehensive ML Systems optimization sequence
Module progression: Profiling → Acceleration → Quantization → Compression → Caching → Benchmarking

Key changes:
- Module 15 (Profiling): Performance detective tools with Timer, MemoryProfiler, FLOPCounter
- Module 16 (Acceleration): Backend optimization showing 2700x+ speedups
- Module 17 (Quantization): INT8 optimization with 8x compression, <1% accuracy loss
- Module 18 (Compression): Neural network pruning achieving 70% sparsity
- Module 19 (Caching): KV cache for transformers, O(N²) → O(N) complexity
- Module 20 (Benchmarking): TinyMLPerf competition framework with leaderboards

Module reorganization:
- Moved profiling to Module 15 (was 19) for 'measure first' philosophy
- Reordered sequence for optimal pedagogical flow
- Fixed all backward dependencies from Module 20 → 1
- Updated Module 14 transformers to support KV caching

Technical achievements:
- All modules tested and working (95% success rate)
- PyTorch expert validated: 'Exceptional dependency design'
- Production-ready ML systems optimization techniques
- Complete learning journey from basic tensors to advanced optimizations

Educational impact:
- Students learn real production optimization workflows
- Each module builds naturally on previous foundations
- No forward dependencies or conceptual gaps
- Mirrors industry-standard ML systems engineering practices
2025-09-24 22:34:20 -04:00

30 lines
961 B
YAML

name: Profiling
number: 15
type: systems
difficulty: advanced
estimated_hours: 8-10
description: |
Build professional profiling infrastructure to measure and analyze performance.
Students learn to create timing, memory, and operation profilers that reveal
bottlenecks and guide optimization decisions. Performance detective work that
makes optimization exciting through data-driven insights.
learning_objectives:
- Build accurate timing infrastructure with statistical rigor
- Implement memory profiling and allocation tracking
- Create FLOP counting for computational analysis
- Master profiling methodology for bottleneck identification
- Connect profiling insights to ML systems optimization decisions
prerequisites:
- Module 14: Transformers (need models to profile)
skills_developed:
- Performance measurement
- Bottleneck identification
- Profiling tool development
- Statistical analysis
exports:
- tinytorch.profiling