Standardize all modules to follow NBGrader style guide

- Updated 7 non-compliant modules for consistency
- Module 01_setup: Added EXAMPLE USAGE sections with code examples
- Module 02_tensor: Added STEP-BY-STEP IMPLEMENTATION and LEARNING CONNECTIONS
- Module 05_dense: Added LEARNING CONNECTIONS to all functions
- Module 06_spatial: Added STEP-BY-STEP and LEARNING CONNECTIONS
- Module 08_dataloader: Added LEARNING CONNECTIONS sections
- Module 11_training: Added STEP-BY-STEP and LEARNING CONNECTIONS
- Module 14_benchmarking: Added STEP-BY-STEP and LEARNING CONNECTIONS
- All modules now follow consistent format per NBGRADER_STYLE_GUIDE.md
- Preserved all existing solution blocks and functionality
Vijay Janapa Reddi
2025-09-16 16:48:14 -04:00
parent 0a0197b72c
commit 6349c218d2
7 changed files with 402 additions and 39 deletions

@@ -249,7 +249,7 @@ class BenchmarkScenarios:
TODO: Implement the three benchmark scenarios following MLPerf patterns.
-UNDERSTANDING THE SCENARIOS:
+STEP-BY-STEP IMPLEMENTATION:
1. Single-Stream: Send queries one at a time, measure latency
2. Server: Send queries following Poisson distribution, measure QPS
3. Offline: Send all queries at once, measure total throughput
@@ -260,6 +260,12 @@ class BenchmarkScenarios:
3. Calculate appropriate metrics for each scenario
4. Return BenchmarkResult with all measurements
+LEARNING CONNECTIONS:
+- **MLPerf Standards**: Industry-standard benchmarking methodology used by Google, NVIDIA, etc.
+- **Performance Scenarios**: Different deployment patterns require different measurement approaches
+- **Production Validation**: Benchmarking validates model performance before deployment
+- **Resource Planning**: Results guide infrastructure scaling and capacity planning
EXAMPLE USAGE:
scenarios = BenchmarkScenarios()
result = scenarios.single_stream(model, dataset, num_queries=1000)
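
The three scenarios differ mainly in which quantity they report. As a purely illustrative sketch of the kind of record each scenario could return (the field names below are assumptions, not the module's actual BenchmarkResult definition):

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ResultSketch:
    """Illustrative stand-in for a per-scenario benchmark record."""
    scenario: str                                   # "single_stream", "server", or "offline"
    latencies_ms: List[float] = field(default_factory=list)      # single-stream focus
    achieved_qps: Optional[float] = None            # server focus
    throughput_samples_per_s: Optional[float] = None              # offline focus
    accuracy: Optional[float] = None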
@@ -275,7 +281,7 @@ class BenchmarkScenarios:
TODO: Implement single-stream benchmarking.
-STEP-BY-STEP:
+STEP-BY-STEP IMPLEMENTATION:
1. Initialize empty list for latencies
2. For each query (up to num_queries):
a. Get next sample from dataset (cycle if needed)
@@ -288,6 +294,12 @@ class BenchmarkScenarios:
4. Calculate accuracy if possible
5. Return BenchmarkResult with SINGLE_STREAM scenario
+LEARNING CONNECTIONS:
+- **Mobile/Edge Deployment**: Single-stream simulates user-facing applications
+- **Tail Latency**: 90th/95th percentiles matter more than averages for user experience
+- **Interactive Systems**: Chatbots, recommendation engines use single-stream patterns
+- **SLA Validation**: Ensures models meet response time requirements
HINTS:
- Use time.perf_counter() for precise timing
- Use dataset[i % len(dataset)] to cycle through samples
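
A minimal sketch of that single-stream loop, assuming `model` is a callable and `dataset[i]` yields a (sample, label) pair; it follows the hints above but is not the module's actual implementation:

import time
import numpy as np

def single_stream_sketch(model, dataset, num_queries=1000):
    latencies_ms = []
    for i in range(num_queries):
        sample, _ = dataset[i % len(dataset)]            # cycle through samples
        start = time.perf_counter()
        model(sample)                                    # one query at a time
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms = np.asarray(latencies_ms)
    # Report tail latency, not just the mean -- p90/p95 drive user experience.
    return {
        "mean_ms": float(latencies_ms.mean()),
        "p90_ms": float(np.percentile(latencies_ms, 90)),
        "p95_ms": float(np.percentile(latencies_ms, 95)),
    }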
@@ -337,7 +349,7 @@ class BenchmarkScenarios:
TODO: Implement server benchmarking.
-STEP-BY-STEP:
+STEP-BY-STEP IMPLEMENTATION:
1. Calculate inter-arrival time = 1.0 / target_qps
2. Run for specified duration:
a. Wait for next query arrival (Poisson distribution)
@@ -348,6 +360,12 @@ class BenchmarkScenarios:
3. Calculate actual QPS = total_queries / duration
4. Return results
+LEARNING CONNECTIONS:
+- **Web Services**: Server scenario simulates API endpoints handling concurrent requests
+- **Load Testing**: Validates system behavior under realistic traffic patterns
+- **Scalability Analysis**: Tests how well models handle increasing load
+- **Production Deployment**: Critical for microservices and web-scale applications
HINTS:
- Use np.random.exponential(inter_arrival_time) for Poisson
- Track both query arrival times and completion times
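
A sketch of the Poisson arrival loop, again with assumed names rather than the module's API; drawing inter-arrival gaps from an exponential distribution is what makes the aggregate arrival process Poisson:

import time
import numpy as np

def server_sketch(model, dataset, target_qps=10.0, duration_s=5.0):
    inter_arrival_s = 1.0 / target_qps
    completed, i = 0, 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        time.sleep(np.random.exponential(inter_arrival_s))   # Poisson arrivals
        sample, _ = dataset[i % len(dataset)]
        model(sample)
        completed += 1
        i += 1
    elapsed = time.perf_counter() - start
    return {"achieved_qps": completed / elapsed, "target_qps": target_qps}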
@@ -400,7 +418,7 @@ class BenchmarkScenarios:
TODO: Implement offline benchmarking.
-STEP-BY-STEP:
+STEP-BY-STEP IMPLEMENTATION:
1. Group dataset into batches of batch_size
2. For each batch:
a. Record start time
@@ -410,6 +428,12 @@ class BenchmarkScenarios:
3. Calculate total throughput = total_samples / total_time
4. Return results
+LEARNING CONNECTIONS:
+- **Batch Processing**: Offline scenario simulates data pipeline and ETL workloads
+- **Throughput Optimization**: Maximizes processing efficiency for large datasets
+- **Data Center Workloads**: Common in recommendation systems and analytics pipelines
+- **Cost Optimization**: High throughput reduces compute costs per sample
HINTS:
- Process data in batches for efficiency
- Measure total time for all batches
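
A sketch of the offline throughput measurement, assuming for illustration that the model can take a list of samples as one batch:

import time

def offline_sketch(model, dataset, batch_size=32):
    samples = [dataset[i][0] for i in range(len(dataset))]
    start = time.perf_counter()
    for b in range(0, len(samples), batch_size):
        model(samples[b:b + batch_size])                 # one batch per call
    total_time = time.perf_counter() - start
    # Throughput = samples processed per second of wall-clock time.
    return {"throughput_samples_per_s": len(samples) / total_time}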
@@ -521,7 +545,7 @@ class StatisticalValidator:
TODO: Implement statistical validation for benchmark results.
-UNDERSTANDING STATISTICAL TESTING:
+STEP-BY-STEP IMPLEMENTATION:
1. Null hypothesis: No difference between models
2. T-test: Compare means of two groups
3. P-value: Probability of seeing this difference by chance
@@ -534,6 +558,12 @@ class StatisticalValidator:
3. Calculate effect size (Cohen's d)
4. Calculate confidence interval
5. Provide clear recommendation
+LEARNING CONNECTIONS:
+- **Scientific Rigor**: Ensures performance claims are statistically valid
+- **A/B Testing**: Foundation for production model comparison and rollout decisions
+- **Research Validation**: Required for academic papers and technical reports
+- **Business Decisions**: Statistical significance guides investment in new models
"""
def __init__(self, confidence_level: float = 0.95):
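
The statistics listed above map directly onto SciPy; the following is a generic two-sample comparison sketch, not the StatisticalValidator's actual code:

import numpy as np
from scipy import stats

def compare_sketch(latencies_a, latencies_b, confidence=0.95):
    a, b = np.asarray(latencies_a), np.asarray(latencies_b)
    t_stat, p_value = stats.ttest_ind(a, b)              # two-sample t-test
    pooled_std = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    cohens_d = (a.mean() - b.mean()) / pooled_std        # effect size
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    margin = stats.t.ppf((1 + confidence) / 2, df=len(a) + len(b) - 2) * se
    return {
        "p_value": float(p_value),
        "cohens_d": float(cohens_d),
        "ci_of_difference": (float(diff - margin), float(diff + margin)),
        "significant": bool(p_value < (1 - confidence)),
    }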
@@ -733,7 +763,7 @@ class TinyTorchPerf:
TODO: Implement the complete benchmarking framework.
-UNDERSTANDING THE FRAMEWORK:
+STEP-BY-STEP IMPLEMENTATION:
1. Combines all benchmark scenarios
2. Integrates statistical validation
3. Provides easy-to-use API
@@ -744,6 +774,12 @@ class TinyTorchPerf:
2. Provide methods for each scenario
3. Include statistical validation
4. Generate comprehensive reports
+LEARNING CONNECTIONS:
+- **MLPerf Integration**: Follows industry-standard benchmarking patterns
+- **Production Deployment**: Validates models before production rollout
+- **Performance Engineering**: Identifies bottlenecks and optimization opportunities
+- **Framework Design**: Demonstrates how to build reusable ML tools
"""
def __init__(self):
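
How the pieces could compose, reusing the sketch helpers above; class and method names here are illustrative, not the actual TinyTorchPerf API:

class PerfHarnessSketch:
    """Illustrative wrapper: run all three scenarios, then validate statistically."""

    def __init__(self, model, dataset):
        self.model, self.dataset = model, dataset

    def run_all(self):
        return {
            "single_stream": single_stream_sketch(self.model, self.dataset),
            "server": server_sketch(self.model, self.dataset),
            "offline": offline_sketch(self.model, self.dataset),
        }

    def compare_to_baseline(self, latencies, baseline_latencies):
        return compare_sketch(latencies, baseline_latencies)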
@@ -1376,13 +1412,19 @@ class ProductionBenchmarkingProfiler:
TODO: Implement production-grade profiling capabilities.
-UNDERSTANDING PRODUCTION PROFILING:
+STEP-BY-STEP IMPLEMENTATION:
1. End-to-end pipeline analysis (not just model inference)
2. Resource utilization monitoring (CPU, memory, bandwidth)
3. Statistical A/B testing frameworks
4. Production monitoring and alerting integration
5. Performance regression detection
6. Load testing and capacity planning
+LEARNING CONNECTIONS:
+- **Production ML Systems**: Real-world profiling for deployment optimization
+- **Performance Engineering**: Systematic approach to identifying and fixing bottlenecks
+- **A/B Testing**: Statistical frameworks for safe model rollouts
+- **Cost Optimization**: Understanding resource usage for efficient cloud deployment
"""
def __init__(self, enable_monitoring: bool = True):
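
One piece of the production story, resource-utilization monitoring, can be sketched with psutil (an assumption; the module may use different tooling). A background thread samples process CPU and memory while the workload runs:

import threading
import time
import psutil

def profile_resources_sketch(fn, *args, interval_s=0.05, **kwargs):
    """Run fn while sampling this process's CPU and memory in a background thread."""
    proc = psutil.Process()
    samples = {"cpu_percent": [], "rss_mb": []}
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            samples["cpu_percent"].append(proc.cpu_percent(interval=None))
            samples["rss_mb"].append(proc.memory_info().rss / 1e6)
            time.sleep(interval_s)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    stop.set()
    thread.join()
    return result, {"wall_time_s": elapsed, **samples}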