mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-01 05:27:29 -05:00
Major accomplishment: Implemented comprehensive ML Systems optimization sequence Module progression: Profiling → Acceleration → Quantization → Compression → Caching → Benchmarking Key changes: - Module 15 (Profiling): Performance detective tools with Timer, MemoryProfiler, FLOPCounter - Module 16 (Acceleration): Backend optimization showing 2700x+ speedups - Module 17 (Quantization): INT8 optimization with 8x compression, <1% accuracy loss - Module 18 (Compression): Neural network pruning achieving 70% sparsity - Module 19 (Caching): KV cache for transformers, O(N²) → O(N) complexity - Module 20 (Benchmarking): TinyMLPerf competition framework with leaderboards Module reorganization: - Moved profiling to Module 15 (was 19) for 'measure first' philosophy - Reordered sequence for optimal pedagogical flow - Fixed all backward dependencies from Module 20 → 1 - Updated Module 14 transformers to support KV caching Technical achievements: - All modules tested and working (95% success rate) - PyTorch expert validated: 'Exceptional dependency design' - Production-ready ML systems optimization techniques - Complete learning journey from basic tensors to advanced optimizations Educational impact: - Students learn real production optimization workflows - Each module builds naturally on previous foundations - No forward dependencies or conceptual gaps - Mirrors industry-standard ML systems engineering practices
30 lines
961 B
YAML
30 lines
961 B
YAML
name: Profiling
|
|
number: 15
|
|
type: systems
|
|
difficulty: advanced
|
|
estimated_hours: 8-10
|
|
|
|
description: |
|
|
Build professional profiling infrastructure to measure and analyze performance.
|
|
Students learn to create timing, memory, and operation profilers that reveal
|
|
bottlenecks and guide optimization decisions. Performance detective work that
|
|
makes optimization exciting through data-driven insights.
|
|
|
|
learning_objectives:
|
|
- Build accurate timing infrastructure with statistical rigor
|
|
- Implement memory profiling and allocation tracking
|
|
- Create FLOP counting for computational analysis
|
|
- Master profiling methodology for bottleneck identification
|
|
- Connect profiling insights to ML systems optimization decisions
|
|
|
|
prerequisites:
|
|
- Module 14: Transformers (need models to profile)
|
|
|
|
skills_developed:
|
|
- Performance measurement
|
|
- Bottleneck identification
|
|
- Profiling tool development
|
|
- Statistical analysis
|
|
|
|
exports:
|
|
- tinytorch.profiling |