# 🔥 Module: Benchmarking

## 📊 Module Info

- **Difficulty**: ⭐⭐⭐⭐ Advanced
- **Time Estimate**: 6-8 hours
- **Prerequisites**: All previous modules (01-12), especially Kernels
- **Next Steps**: MLOps module (14)

Learn to systematically evaluate ML systems using industry-standard benchmarking methodology. This module teaches you to measure performance reliably, validate optimization claims, and create professional evaluation reports that meet research and industry standards.

## 🎯 Learning Objectives

By the end of this module, you will be able to:

- **Design systematic benchmarking experiments**: Apply MLPerf-inspired methodology to evaluate ML system performance
- **Implement statistical validation**: Ensure benchmark results are statistically significant and reproducible
- **Create professional performance reports**: Generate industry-standard documentation for optimization claims
- **Apply evaluation methodology**: Systematically compare models, optimizations, and architectural choices
- **Debug performance systematically**: Use benchmarking to identify bottlenecks and validate improvements

## 🧠 Build → Use → Analyze

This module follows TinyTorch's **Build → Use → Analyze** framework:

1. **Build**: Implement a comprehensive benchmarking framework with MLPerf-inspired architecture and statistical validation
2. **Use**: Apply systematic evaluation to TinyTorch models, optimizations, and performance claims
3. **Analyze**: Generate professional reports, validate optimization effectiveness, and prepare results for presentations

## 📚 What You'll Build

### MLPerf-Inspired Benchmarking Framework

```python
# Professional ML system evaluation
from tinytorch.core.benchmarking import TinyTorchPerf, StatisticalValidator

# Configure benchmark system
benchmark = TinyTorchPerf()
benchmark.set_model(your_trained_model)
benchmark.set_dataset('cifar10', subset_size=1000)
benchmark.set_metrics(['latency', 'throughput', 'accuracy'])

# Run comprehensive evaluation
results = benchmark.run_all_scenarios([
    'single_stream',  # Latency-focused (mobile/edge)
    'server',         # Throughput-focused (production)
    'offline'         # Batch processing (data center)
])

print(f"Single-stream latency: {results['single_stream']['latency']:.2f}ms")
print(f"Server throughput: {results['server']['throughput']:.0f} samples/sec")
print(f"Offline batch time: {results['offline']['batch_time']:.2f}s")
```
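
These scenarios differ mainly in how inference calls are timed. As a rough illustration of what single-stream measurement involves, here is a minimal sketch (not the module's actual implementation; it assumes `model` is callable on a single NumPy sample):

```python
import time
import numpy as np

def single_stream_latency(model, samples, warmup=10, runs=100):
    """Time one sample at a time, the way a single-stream scenario would."""
    # Warm-up runs are discarded so caches and lazy initialization don't skew results
    for x in samples[:warmup]:
        model(x)

    latencies_ms = []
    for x in samples[:runs]:
        start = time.perf_counter()
        model(x)
        latencies_ms.append((time.perf_counter() - start) * 1000)

    return {
        'mean': float(np.mean(latencies_ms)),
        'p50': float(np.percentile(latencies_ms, 50)),
        'p99': float(np.percentile(latencies_ms, 99)),
    }
```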

### Statistical Validation System

```python
# Ensure statistically valid results
validator = StatisticalValidator(confidence_level=0.95, min_runs=30)

# Compare two models with statistical rigor
baseline_model = load_model("baseline_v1")
optimized_model = load_model("optimized_v2")

comparison = validator.compare_models(
    baseline_model,
    optimized_model,
    test_dataset,
    metrics=['latency', 'accuracy']
)

if comparison['latency']['significant']:
    speedup = comparison['latency']['improvement']
    confidence = comparison['latency']['confidence_interval']
    print(f"✅ Speedup: {speedup:.2f}x (95% CI: {confidence[0]:.2f}-{confidence[1]:.2f})")
else:
    print("❌ Performance difference not statistically significant")
```
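
A comparison like this usually comes down to a standard two-sample test over repeated timing runs. Here is a minimal sketch using SciPy (illustrative only; `compare_latencies` is a hypothetical helper, and the module's own validator may compute things differently):

```python
import numpy as np
from scipy import stats

def compare_latencies(baseline_ms, optimized_ms, alpha=0.05):
    """Welch's t-test on two independent sets of latency measurements (ms)."""
    t_stat, p_value = stats.ttest_ind(baseline_ms, optimized_ms, equal_var=False)
    speedup = float(np.mean(baseline_ms) / np.mean(optimized_ms))
    return {'speedup': speedup, 'p_value': float(p_value), 'significant': p_value < alpha}

# Example with synthetic measurements (30 runs each, matching min_runs=30 above)
rng = np.random.default_rng(0)
baseline = rng.normal(12.0, 0.5, 30)
optimized = rng.normal(9.0, 0.5, 30)
print(compare_latencies(baseline, optimized))
```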

### Comprehensive Performance Reporter

```python
# Generate professional evaluation reports
from tinytorch.core.benchmarking import PerformanceReporter

reporter = PerformanceReporter()
report = reporter.generate_comprehensive_report({
    'models': [baseline_model, optimized_model, compressed_model],
    'datasets': ['cifar10', 'imagenet_subset'],
    'scenarios': ['mobile', 'server', 'edge'],
    'optimizations': ['baseline', 'quantized', 'pruned', 'kernels']
})

# Export professional documentation
report.save_as_html("performance_evaluation.html")
report.save_as_pdf("performance_evaluation.pdf")
report.save_summary_table("results_summary.csv")

# Generate presentation slides
report.create_presentation_slides("optimization_results.pptx")
```

### Real-World Evaluation Scenarios

```python
# Mobile deployment evaluation
mobile_benchmark = TinyTorchPerf()
mobile_benchmark.configure_mobile_scenario(
    max_latency_ms=100,
    battery_constraints=True,
    memory_limit_mb=50
)

mobile_results = mobile_benchmark.evaluate_model(compressed_model)
mobile_feasible = mobile_results['meets_constraints']

# Production server evaluation
server_benchmark = TinyTorchPerf()
server_benchmark.configure_server_scenario(
    target_throughput=1000,   # requests/second
    max_latency_p99=50,       # 99th-percentile latency (ms)
    concurrent_users=100
)

server_results = server_benchmark.evaluate_model(optimized_model)
production_ready = server_results['meets_sla']
```
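
A `meets_sla`-style flag typically reduces to checking measured throughput and tail latency against the configured targets. Here is a hedged sketch (the field names in `measured` are assumptions, not the module's guaranteed result schema):

```python
def meets_sla(measured, target_throughput=1000, max_latency_p99=50):
    """True if measured results satisfy the server scenario's targets.

    `measured` is assumed to contain 'throughput' (samples/sec) and
    'latency_p99' (ms); the real results dict may be shaped differently.
    """
    return (measured['throughput'] >= target_throughput
            and measured['latency_p99'] <= max_latency_p99)

print(meets_sla({'throughput': 1250.0, 'latency_p99': 42.0}))  # True
```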

## 🚀 Getting Started

### Prerequisites

Ensure you have built the complete TinyTorch system:

```bash
# Activate TinyTorch environment
source bin/activate-tinytorch.sh

# Verify prerequisite modules (comprehensive system needed)
tito test --module kernels      # Performance optimization
tito test --module compression  # Model optimization
tito test --module training     # End-to-end training
```

### Development Workflow

1. **Open the development file**: `modules/source/13_benchmarking/benchmarking_dev.py`
2. **Implement the benchmarking framework**: Build the MLPerf-inspired evaluation system
3. **Add statistical validation**: Ensure reproducible and significant results
4. **Create performance reporters**: Generate professional documentation
5. **Test evaluation scenarios**: Apply to real models and optimization claims
6. **Export and verify**: `tito export --module benchmarking && tito test --module benchmarking`

## 🧪 Testing Your Implementation

### Comprehensive Test Suite

Run the full test suite to verify benchmarking system functionality:

```bash
# TinyTorch CLI (recommended)
tito test --module benchmarking

# Direct pytest execution
python -m pytest tests/ -k benchmarking -v
```

### Test Coverage Areas

- ✅ **Benchmarking Framework**: Verify MLPerf-inspired evaluation system works correctly
- ✅ **Statistical Validation**: Test confidence intervals, significance testing, and reproducibility
- ✅ **Performance Reporting**: Ensure professional report generation and data visualization
- ✅ **Scenario Testing**: Validate mobile, server, and offline evaluation scenarios
- ✅ **Integration Testing**: Test with real TinyTorch models and optimizations

### Inline Testing & Evaluation Validation

The module includes comprehensive benchmarking validation and methodology verification:

```text
# Example inline test output
🔬 Unit Test: MLPerf-inspired benchmark framework...
✅ Single-stream scenario working correctly
✅ Server scenario measures throughput accurately
✅ Offline scenario handles batch processing
📈 Progress: Benchmarking Framework ✓

# Statistical validation testing
🔬 Unit Test: Statistical significance testing...
✅ Confidence intervals computed correctly
✅ Multiple comparison correction applied
✅ Minimum sample size requirements enforced
📈 Progress: Statistical Validation ✓

# Report generation testing
🔬 Unit Test: Performance report generation...
✅ HTML reports generated with proper formatting
✅ Summary tables include all required metrics
✅ Visualization charts display correctly
📈 Progress: Professional Reporting ✓
```

### Manual Testing Examples

```python
from benchmarking_dev import TinyTorchPerf, StatisticalValidator
from networks_dev import Sequential
from layers_dev import Dense
from activations_dev import ReLU

# Create test models
# (compress_model is assumed to come from the compression module's exports)
baseline_model = Sequential([Dense(784, 128), ReLU(), Dense(128, 10)])
optimized_model = compress_model(baseline_model, compression_ratio=0.5)

# Set up benchmarking
benchmark = TinyTorchPerf()
benchmark.set_dataset('synthetic', size=1000, input_shape=(784,), num_classes=10)

# Run evaluation
baseline_results = benchmark.evaluate_model(baseline_model)
optimized_results = benchmark.evaluate_model(optimized_model)

print(f"Baseline latency: {baseline_results['latency']:.2f}ms")
print(f"Optimized latency: {optimized_results['latency']:.2f}ms")
print(f"Speedup: {baseline_results['latency']/optimized_results['latency']:.2f}x")

# Statistical validation (test_data is a held-out evaluation dataset defined elsewhere)
validator = StatisticalValidator()
comparison = validator.compare_models(baseline_model, optimized_model, test_data)
print(f"Statistically significant: {comparison['significant']}")
```

## 🎯 Key Concepts

### Real-World Applications

- **MLPerf Benchmarks**: Industry-standard evaluation methodology for ML systems and hardware
- **Production A/B Testing**: Statistical validation of model improvements in live systems
- **Research Paper Evaluation**: Rigorous experimental methodology for academic publication
- **Hardware Evaluation**: Systematic comparison of ML accelerators and deployment platforms

### Evaluation Methodology

- **Systematic Experimentation**: Controlled variables, multiple runs, and statistical validation
- **Scenario-Based Testing**: Mobile, server, and edge deployment evaluation patterns
- **Performance Metrics**: Latency, throughput, accuracy, memory usage, and energy consumption
- **Statistical Rigor**: Confidence intervals, significance testing, and reproducibility requirements (see the sketch after this list)
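
For instance, a confidence interval for mean latency can be computed directly from repeated measurements. Here is a minimal sketch using SciPy (the module's own utilities may differ):

```python
import numpy as np
from scipy import stats

def latency_confidence_interval(latencies_ms, confidence=0.95):
    """Confidence interval for the mean latency, using the t-distribution."""
    data = np.asarray(latencies_ms)
    mean = data.mean()
    sem = stats.sem(data)  # standard error of the mean
    lower, upper = stats.t.interval(confidence, df=len(data) - 1, loc=mean, scale=sem)
    return mean, (lower, upper)

mean, ci = latency_confidence_interval([10.2, 9.8, 10.5, 10.1, 9.9, 10.3])
print(f"Mean latency: {mean:.2f}ms (95% CI: {ci[0]:.2f}-{ci[1]:.2f}ms)")
```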

### Professional Reporting

- **Industry Standards**: MLPerf-style reporting with comprehensive metrics and statistical validation
- **Visual Communication**: Charts, tables, and graphs that clearly communicate performance results (a small table-building sketch follows below)
- **Executive Summaries**: High-level findings suitable for technical and business stakeholders
- **Reproducibility**: Complete methodology documentation for result verification
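
A summary table is often just per-model metrics flattened into rows. Here is a small sketch with pandas and placeholder numbers (illustrative only; the `PerformanceReporter` above handles this for you):

```python
import pandas as pd

# Hypothetical per-model results collected from benchmark runs (placeholder values)
results = [
    {'model': 'baseline',  'latency_ms': 12.3, 'throughput': 810,  'accuracy': 0.912},
    {'model': 'quantized', 'latency_ms': 7.9,  'throughput': 1260, 'accuracy': 0.905},
    {'model': 'pruned',    'latency_ms': 9.1,  'throughput': 1100, 'accuracy': 0.901},
]

summary = pd.DataFrame(results).set_index('model')
summary.to_csv("results_summary.csv")  # same kind of artifact as save_summary_table()
print(summary.round(3))
```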

### Benchmarking Best Practices

- **Baseline Establishment**: Proper reference points for meaningful comparisons
- **Environment Control**: Consistent hardware, software, and data conditions (see the sketch after this list)
- **Statistical Power**: Sufficient sample sizes for reliable conclusions
- **Bias Avoidance**: Careful experimental design to prevent misleading results
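
One practical way to control for environment drift and run-order bias is to interleave measurements of the two systems on identical, seeded inputs. A minimal sketch, assuming both models are callables that accept a NumPy batch shaped like the Dense(784, ...) models above:

```python
import time
import numpy as np

def interleaved_latencies(model_a, model_b, runs=30, seed=0):
    """Measure A and B back-to-back on the same inputs to control for drift."""
    rng = np.random.default_rng(seed)  # fixed seed -> identical data every session
    a_ms, b_ms = [], []
    for _ in range(runs):
        x = rng.standard_normal((1, 784)).astype(np.float32)
        for model, out in ((model_a, a_ms), (model_b, b_ms)):
            start = time.perf_counter()
            model(x)
            out.append((time.perf_counter() - start) * 1000)
    return a_ms, b_ms
```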

## 🎉 Ready to Build?

You're about to master the evaluation methodology that separates rigorous engineering from wishful thinking! This module teaches you to validate claims, measure improvements systematically, and communicate results professionally.

Every major breakthrough in ML—from ImageNet winners to production systems—depends on systematic evaluation like what you're building. You'll learn to think like a performance scientist, ensuring your optimizations actually work and proving it with statistical rigor. Take your time, be thorough, and enjoy building the foundation of evidence-based ML engineering!

````{grid} 3
:gutter: 3
:margin: 2

```{grid-item-card} 🚀 Launch Builder
:link: https://mybinder.org/v2/gh/VJProductions/TinyTorch/main?filepath=modules/source/13_benchmarking/benchmarking_dev.py
:class-title: text-center
:class-body: text-center

Interactive development environment
```

```{grid-item-card} 📓 Open in Colab
:link: https://colab.research.google.com/github/VJProductions/TinyTorch/blob/main/modules/source/13_benchmarking/benchmarking_dev.ipynb
:class-title: text-center
:class-body: text-center

Google Colab notebook
```

```{grid-item-card} 👀 View Source
:link: https://github.com/VJProductions/TinyTorch/blob/main/modules/source/13_benchmarking/benchmarking_dev.py
:class-title: text-center
:class-body: text-center

Browse the code on GitHub
```
````