
📊 Module 13: Benchmarking - Systematic ML Performance Evaluation

📊 Module Info

  • Difficulty: Advanced
  • Time Estimate: 6-8 hours
  • Prerequisites: All previous modules (01-12), especially Kernels
  • Next Steps: MLOps module (14)

Learn to systematically evaluate ML systems using industry-standard benchmarking methodology

🎯 Learning Objectives

After completing this module, you will:

  • Design systematic benchmarking experiments for ML systems
  • Apply MLPerf-inspired patterns to evaluate model performance
  • Implement statistical validation for benchmark results
  • Create professional performance reports and comparisons
  • Apply systematic evaluation to real ML projects

🔗 Connection to Previous Modules

What You Already Know

  • Kernels (Module 12): How to optimize individual operations
  • Training (Module 10): End-to-end model training workflows
  • Compression (Module 11): Model optimization techniques
  • Networks (Module 05): Model architectures and complexity

The Evaluation Gap

You already know how to build ML systems, but not yet how to evaluate them systematically:

  • Implementation: You can build tensors, layers, networks, and optimizers
  • Evaluation: You don't yet know how to measure performance reliably
  • Optimization: You can implement kernels and compression
  • Validation: You can't prove that your optimizations actually work

🧠 Build → Use → Analyze

This module follows the "Build → Use → Analyze" pedagogical framework:

1. Build: Benchmarking Framework

  • Understand the four-component MLPerf architecture
  • Learn the different benchmark scenarios (single-stream, server, offline)
  • Implement statistical validation for meaningful results

2. Use: Systematic Evaluation

  • Apply benchmarking to your TinyTorch models
  • Compare different approaches systematically
  • Validate optimization claims with proper methodology

3. Analyze: Professional Reporting

  • Generate industry-standard performance reports
  • Present results with statistical confidence
  • Prepare for capstone project presentations

🎓 Why This Matters

Industry Reality

Real ML engineers spend significant time on:

  • A/B testing: Comparing model variants in production
  • Performance optimization: Proving optimizations actually work
  • Research validation: Demonstrating improvements over baselines
  • System design: Choosing between architectural alternatives

Professional Applications

This module prepares you for:

  • ML project evaluation: Systematic comparison against baselines
  • Performance presentations: Professional reporting of results
  • Statistical validation: Proving your improvements are significant
  • Research methodology: Reproducible evaluation practices

🚀 Key Concepts

MLPerf-Inspired Architecture

  • System Under Test (SUT): Your ML model/system
  • Dataset: Standardized evaluation data
  • Model: The specific architecture being tested
  • Load Generator: Controls how evaluation queries are sent
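To make the division of labor concrete, here is a minimal sketch of how these four pieces might be wired together in plain Python. The function and variable names are illustrative, not the TinyTorchPerf API; the model is assumed to be wrapped inside the SUT callable.

import time

def run_benchmark(system_under_test, dataset, load_generator):
    """Drive the SUT with queries from the load generator and record latencies."""
    # The SUT wraps the model being tested; the load generator decides
    # when and how queries drawn from the dataset are issued.
    latencies = []
    for query in load_generator(dataset):
        start = time.perf_counter()
        system_under_test(query)              # model inference on this query
        latencies.append(time.perf_counter() - start)
    return latencies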

Benchmark Scenarios

  • Single-Stream: Measures latency (mobile/edge use cases)
  • Server: Measures throughput (production server use cases)
  • Offline: Measures batch processing (data center use cases)
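In code, the scenarios differ mainly in how the load generator issues queries. A rough sketch with hypothetical generator functions (not part of TinyTorch), assuming the dataset is list-like:

import time

def single_stream(dataset):
    # One query at a time; the headline metric is per-query latency.
    for sample in dataset:
        yield sample

def server(dataset, inter_arrival_times):
    # Queries arrive on a schedule (e.g. Poisson); the headline metric is
    # throughput sustained while meeting a latency bound.
    for sample, gap in zip(dataset, inter_arrival_times):
        time.sleep(gap)
        yield sample

def offline(dataset, batch_size=64):
    # The whole dataset is available up front; the headline metric is
    # batch throughput.
    for i in range(0, len(dataset), batch_size):
        yield dataset[i:i + batch_size]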

Statistical Validation

  • Confidence intervals: Ensuring results are meaningful
  • Multiple runs: Accounting for variability
  • Significance testing: Proving improvements are real
  • Pitfall detection: Avoiding common benchmarking mistakes
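As a taste of the math involved, here is a small sketch of the "multiple runs + confidence interval" idea using only the standard library. It is not the StatisticalValidator implementation, and the latency numbers are made up purely for illustration.

import statistics

def confidence_interval_95(samples):
    """Approximate 95% CI for the mean, assuming enough runs for the
    normal approximation to hold."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    return mean - 1.96 * sem, mean + 1.96 * sem

baseline_ms  = [12.1, 11.8, 12.4, 12.0, 11.9]   # illustrative latencies from repeated runs
optimized_ms = [10.2, 10.5, 9.9, 10.1, 10.4]

# Non-overlapping intervals are a strong hint the speedup is not noise;
# a formal significance test (e.g. a t-test) makes the claim precise.
print(confidence_interval_95(baseline_ms))
print(confidence_interval_95(optimized_ms))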

🔧 What You'll Build

1. TinyTorchPerf Framework

from tinytorch.benchmarking import TinyTorchPerf

# Professional ML benchmarking
benchmark = TinyTorchPerf()
benchmark.set_model(your_model)
benchmark.set_dataset('cifar10')

# Run different scenarios
results = benchmark.run_all_scenarios()

2. Statistical Validator

# Ensure statistically valid results
from tinytorch.benchmarking import StatisticalValidator  # assumed import path

validator = StatisticalValidator()
validation = validator.validate_results(results)
if validation.significant:
    print("✅ Improvement is statistically significant")

3. Performance Reporter

# Generate professional reports
from tinytorch.benchmarking import PerformanceReporter  # assumed import path

reporter = PerformanceReporter()
report = reporter.generate_report(results)
report.save_as_html("my_capstone_results.html")

📈 Real-World Applications

Immediate Use Cases

  • ML projects: Systematic evaluation of your model implementations
  • Module integration: Validate that your TinyTorch components work together
  • Performance optimization: Prove your kernels actually improve performance

Career Applications

  • Research: Proper experimental methodology for papers
  • Industry: A/B testing and performance optimization
  • Open source: Contributing benchmarks to ML libraries

🎯 Success Metrics

By the end of this module, you should be able to:

  • Design a systematic benchmark for any ML system
  • Apply MLPerf principles to your own evaluations
  • Generate statistically valid performance comparisons
  • Create professional reports suitable for presentations
  • Identify and avoid common benchmarking pitfalls

🔄 Connection to Module 14 (MLOps)

Benchmarking → Production Monitoring

  • Benchmarking establishes baselines for production systems
  • Monitoring detects when production deviates from benchmarks
  • Both use similar metrics and statistical validation
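One hypothetical way that hand-off could look in code (the names below are illustrative, not the MLOps module's API):

def within_baseline(production_p95_ms, benchmark_p95_ms, tolerance=0.10):
    """Flag a regression when production latency drifts more than
    `tolerance` above the benchmarked baseline."""
    return production_p95_ms <= benchmark_p95_ms * (1 + tolerance)

# 13.0 ms in production vs. a 12.5 ms benchmarked baseline: still within 10%.
assert within_baseline(13.0, 12.5)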

🎉 Ready to become a systematic ML evaluator? Let's build professional benchmarking skills!