Standardize all modules to follow NBGrader style guide

- Updated 7 non-compliant modules for consistency
- Module 01_setup: Added EXAMPLE USAGE sections with code examples
- Module 02_tensor: Added STEP-BY-STEP IMPLEMENTATION and LEARNING CONNECTIONS
- Module 05_dense: Added LEARNING CONNECTIONS to all functions
- Module 06_spatial: Added STEP-BY-STEP and LEARNING CONNECTIONS
- Module 08_dataloader: Added LEARNING CONNECTIONS sections
- Module 11_training: Added STEP-BY-STEP and LEARNING CONNECTIONS
- Module 14_benchmarking: Added STEP-BY-STEP and LEARNING CONNECTIONS
- All modules now follow consistent format per NBGRADER_STYLE_GUIDE.md
- Preserved all existing solution blocks and functionality
Vijay Janapa Reddi
2025-09-16 16:48:14 -04:00
parent 0a0197b72c
commit 6349c218d2
7 changed files with 402 additions and 39 deletions

View File

@@ -267,14 +267,22 @@ def personal_info() -> Dict[str, str]:
4. Make system_name unique and descriptive
5. Keep version as '1.0.0' for now
EXAMPLE OUTPUT:
{
'developer': 'Student Name',
'email': 'student@university.edu',
'institution': 'University Name',
'system_name': 'StudentName-TinyTorch-Dev',
'version': '1.0.0'
}
EXAMPLE USAGE:
```python
# Get your personal configuration
info = personal_info()
print(info['developer']) # Expected: "Your Name" (not placeholder)
print(info['email']) # Expected: "you@domain.com" (valid email)
print(info['system_name']) # Expected: "YourName-Dev" (unique identifier)
print(info) # Expected: Complete dict with 5 fields
# Output: {
# 'developer': 'Your Name',
# 'email': 'you@domain.com',
# 'institution': 'Your Institution',
# 'system_name': 'YourName-TinyTorch-Dev',
# 'version': '1.0.0'
# }
```
IMPLEMENTATION HINTS:
- Replace the example with your real information
@@ -486,14 +494,25 @@ def system_info() -> Dict[str, Any]:
6. Convert memory from bytes to GB (divide by 1024^3)
7. Return all information in a dictionary
EXAMPLE OUTPUT:
{
'python_version': '3.9.7',
'platform': 'Darwin',
'architecture': 'arm64',
'cpu_count': 8,
'memory_gb': 16.0
}
EXAMPLE USAGE:
```python
# Query system information
sys_info = system_info()
print(f"Python: {sys_info['python_version']}") # Expected: "3.x.x"
print(f"Platform: {sys_info['platform']}") # Expected: "Darwin"/"Linux"/"Windows"
print(f"CPUs: {sys_info['cpu_count']}") # Expected: 4, 8, 16, etc.
print(f"Memory: {sys_info['memory_gb']} GB") # Expected: 8.0, 16.0, 32.0, etc.
# Full output example:
print(sys_info)
# Expected: {
# 'python_version': '3.9.7',
# 'platform': 'Darwin',
# 'architecture': 'arm64',
# 'cpu_count': 8,
# 'memory_gb': 16.0
# }
```
IMPLEMENTATION HINTS:
- Use f-string formatting for Python version: f"{major}.{minor}.{micro}"

View File

@@ -341,6 +341,18 @@ class Tensor:
TODO: Return the stored numpy array.
STEP-BY-STEP IMPLEMENTATION:
1. Access the internal _data attribute
2. Return the numpy array directly
3. This provides access to underlying data for NumPy operations
LEARNING CONNECTIONS:
Real-world relevance:
- PyTorch: tensor.numpy() converts to NumPy for visualization/analysis
- TensorFlow: tensor.numpy() enables integration with scientific Python
- Production: Data scientists need to access raw arrays for debugging
- Performance: Direct access avoids copying for read-only operations
HINT: Return self._data (the array you stored in __init__)
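A minimal usage sketch in the spirit of the EXAMPLE USAGE sections above (illustrative, assuming the Tensor constructor from this module):
```python
t = Tensor([1, 2, 3])
raw = t.data               # underlying NumPy array, returned without copying
print(type(raw))           # <class 'numpy.ndarray'>
print(raw.sum())           # NumPy operations work directly on the raw array: 6
```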
"""
### BEGIN SOLUTION
@@ -354,6 +366,18 @@ class Tensor:
TODO: Return the shape of the stored numpy array.
STEP-BY-STEP IMPLEMENTATION:
1. Access the _data attribute (the NumPy array)
2. Get the shape property from the NumPy array
3. Return the shape tuple directly
LEARNING CONNECTIONS:
Real-world relevance:
- Neural networks: Layer compatibility requires matching shapes
- Computer vision: Image shape (height, width, channels) determines architecture
- NLP: Sequence length and vocabulary size affect model design
- Debugging: Shape mismatches are among the most common causes of ML bugs
HINT: Use .shape attribute of the numpy array
EXAMPLE: Tensor([1, 2, 3]).shape should return (3,)
"""
@@ -368,6 +392,18 @@ class Tensor:
TODO: Return the total number of elements in the tensor.
STEP-BY-STEP IMPLEMENTATION:
1. Access the _data attribute (the NumPy array)
2. Get the size property from the NumPy array
3. Return the total element count as an integer
LEARNING CONNECTIONS:
Real-world relevance:
- Memory planning: Calculate RAM requirements for large tensors
- Model architecture: Determine parameter counts for layers
- Performance optimization: Size affects computation time
- Batch processing: Total elements determines vectorization efficiency
HINT: Use .size attribute of the numpy array
EXAMPLE: Tensor([1, 2, 3]).size should return 3
"""
@@ -382,6 +418,18 @@ class Tensor:
TODO: Return the data type of the stored numpy array.
STEP-BY-STEP IMPLEMENTATION:
1. Access the _data attribute (the NumPy array)
2. Get the dtype property from the NumPy array
3. Return the NumPy dtype object directly
LEARNING CONNECTIONS:
Real-world relevance:
- Precision vs speed: float32 is faster, float64 more accurate
- Memory optimization: int8 uses 1/4 memory of int32
- GPU compatibility: Some operations only work with specific types
- Model deployment: Mobile/edge devices prefer smaller data types
HINT: Use .dtype attribute of the numpy array
EXAMPLE: Tensor([1, 2, 3]).dtype should return the platform's default integer dtype, e.g. dtype('int64') on most 64-bit systems
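A combined usage sketch for the shape, size, and dtype properties (illustrative; the exact integer dtype is platform-dependent):
```python
t = Tensor([[1, 2, 3], [4, 5, 6]])
print(t.shape)   # (2, 3)
print(t.size)    # 6
print(t.dtype)   # int64 on most 64-bit platforms (int32 on Windows)
```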
"""
@@ -395,6 +443,19 @@ class Tensor:
TODO: Create a clear string representation of the tensor.
STEP-BY-STEP IMPLEMENTATION:
1. Convert the numpy array to a list using .tolist()
2. Get shape and dtype information from properties
3. Format as "Tensor([data], shape=shape, dtype=dtype)"
4. Return the formatted string
LEARNING CONNECTIONS:
Real-world relevance:
- Debugging: Clear tensor representation speeds debugging
- Jupyter notebooks: Good __repr__ improves data exploration
- Logging: Production systems log tensor info for monitoring
- Education: Students understand tensors better with clear output
APPROACH:
1. Convert the numpy array to a list for readable output
2. Include the shape and dtype information
@@ -418,6 +479,19 @@ class Tensor:
TODO: Implement tensor addition.
STEP-BY-STEP IMPLEMENTATION:
1. Extract numpy arrays from both tensors
2. Use NumPy's + operator for element-wise addition
3. Create a new Tensor object with the result
4. Return the new tensor
LEARNING CONNECTIONS:
Real-world relevance:
- Neural networks: Adding bias terms to linear layer outputs
- Residual connections: skip connections in ResNet architectures
- Gradient updates: Adding computed gradients to parameters
- Ensemble methods: Combining predictions from multiple models
APPROACH:
1. Add the numpy arrays using +
2. Return a new Tensor with the result
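A short usage sketch of add() (illustrative, not the graded solution):
```python
a = Tensor([1.0, 2.0, 3.0])
b = Tensor([10.0, 20.0, 30.0])
c = a.add(b)       # element-wise addition returns a new Tensor
print(c.data)      # [11. 22. 33.]
```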
@@ -442,6 +516,19 @@ class Tensor:
TODO: Implement tensor multiplication.
STEP-BY-STEP IMPLEMENTATION:
1. Extract numpy arrays from both tensors
2. Use NumPy's * operator for element-wise multiplication
3. Create a new Tensor object with the result
4. Return the new tensor
LEARNING CONNECTIONS:
Real-world relevance:
- Activation functions: Element-wise operations like ReLU masking
- Attention mechanisms: Element-wise scaling in transformer models
- Feature scaling: Multiplying features by learned scaling factors
- Gating: Element-wise gating in LSTM and GRU cells
APPROACH:
1. Multiply the numpy arrays using *
2. Return a new Tensor with the result
@@ -466,6 +553,19 @@ class Tensor:
TODO: Implement + operator for tensors.
STEP-BY-STEP IMPLEMENTATION:
1. Check if other is a Tensor object
2. If Tensor, call the add() method directly
3. If scalar, convert to Tensor then call add()
4. Return the result from add() method
LEARNING CONNECTIONS:
Real-world relevance:
- Natural syntax: tensor + scalar enables intuitive code
- Broadcasting: Adding scalars to tensors is common in ML
- Operator overloading: Python's magic methods enable math-like syntax
- API design: Clean interfaces reduce cognitive load for researchers
APPROACH:
1. If other is a Tensor, use tensor addition
2. If other is a scalar, convert to Tensor first
@@ -488,6 +588,19 @@ class Tensor:
TODO: Implement * operator for tensors.
STEP-BY-STEP IMPLEMENTATION:
1. Check if other is a Tensor object
2. If Tensor, call the multiply() method directly
3. If scalar, convert to Tensor then call multiply()
4. Return the result from multiply() method
LEARNING CONNECTIONS:
Real-world relevance:
- Scaling features: tensor * learning_rate for gradient updates
- Masking: tensor * mask for attention mechanisms
- Regularization: tensor * dropout_mask during training
- Normalization: tensor * scale_factor in batch normalization
APPROACH:
1. If other is a Tensor, use tensor multiplication
2. If other is a scalar, convert to Tensor first
@@ -510,6 +623,19 @@ class Tensor:
TODO: Implement - operator for tensors.
STEP-BY-STEP IMPLEMENTATION:
1. Check if other is a Tensor object
2. If Tensor, subtract other._data from self._data
3. If scalar, subtract scalar directly from self._data
4. Create new Tensor with result and return
LEARNING CONNECTIONS:
Real-world relevance:
- Gradient computation: parameter - learning_rate * gradient
- Residual connections: output - skip_connection in some architectures
- Error calculation: predicted - actual for loss computation
- Centering data: tensor - mean for zero-centered inputs
APPROACH:
1. Convert other to Tensor if needed
2. Subtract using numpy arrays
@@ -533,6 +659,19 @@ class Tensor:
TODO: Implement / operator for tensors.
STEP-BY-STEP IMPLEMENTATION:
1. Check if other is a Tensor object
2. If Tensor, divide self._data by other._data
3. If scalar, divide self._data by scalar directly
4. Create new Tensor with result and return
LEARNING CONNECTIONS:
Real-world relevance:
- Normalization: tensor / std_deviation for standard scaling
- Learning rate decay: parameter / decay_factor over time
- Probability computation: counts / total_counts for frequencies
- Temperature scaling: logits / temperature in softmax functions
APPROACH:
1. Convert other to Tensor if needed
2. Divide using numpy arrays
@@ -560,6 +699,19 @@ class Tensor:
TODO: Implement matrix multiplication.
STEP-BY-STEP IMPLEMENTATION:
1. Extract numpy arrays from both tensors
2. Use np.matmul() for proper matrix multiplication
3. Create new Tensor object with the result
4. Return the new tensor
LEARNING CONNECTIONS:
Real-world relevance:
- Linear layers: input @ weight matrices in neural networks
- Transformer attention: Q @ K^T for attention scores
- CNN convolutions: Implemented as matrix multiplications
- Batch processing: Matrix ops enable parallel computation
APPROACH:
1. Use np.matmul() to perform matrix multiplication
2. Return a new Tensor with the result
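An illustrative sketch; the method name matmul is assumed from the TODO, since this diff shows only the docstring:
```python
A = Tensor([[1, 2], [3, 4]])
B = Tensor([[5, 6], [7, 8]])
C = A.matmul(B)    # (2, 2) @ (2, 2) -> (2, 2)
print(C.data)      # [[19 22]
                   #  [43 50]]
```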

View File

@@ -206,6 +206,12 @@ class Sequential:
HINTS:
- Use self.layers to store the layers
- Handle empty initialization case
LEARNING CONNECTIONS:
- This is equivalent to torch.nn.Sequential in PyTorch
- Used in every neural network to chain layers together
- Foundation for models like VGG, ResNet, and transformers
- Enables modular network design and experimentation
"""
### BEGIN SOLUTION
self.layers = layers if layers is not None else []
@@ -241,6 +247,12 @@ class Sequential:
- Apply each layer: x = layer(x)
- The output of one layer becomes input to the next
- Return the final result
LEARNING CONNECTIONS:
- This is the core of feedforward neural networks
- Powers inference in every deployed model
- Critical for real-time predictions in production
- Foundation for gradient flow in backpropagation
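A usage sketch, assuming the Dense and ReLU layers built in earlier modules:
```python
x = Tensor([[1.0, 2.0, 3.0, 4.0]])                      # batch of one 4-feature sample
model = Sequential([Dense(4, 8), ReLU(), Dense(8, 2)])
y = model(x)                                            # each layer's output feeds the next
print(y.shape)                                          # (1, 2)
```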
"""
### BEGIN SOLUTION
# Apply each layer in sequence
@@ -394,6 +406,12 @@ def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int,
- For each hidden_size: add Dense(current_size, hidden_size), then activation
- Finally add Dense(last_hidden_size, output_size), then output_activation
- Return Sequential(layers)
LEARNING CONNECTIONS:
- This pattern is used in every feedforward network implementation
- Foundation for architectures like autoencoders and GANs
- Enables rapid prototyping of neural architectures
- Similar to tf.keras.Sequential with Dense layers
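A usage sketch with illustrative MNIST-style sizes:
```python
# 784 input features, two hidden layers, 10 output classes (sizes are examples)
mlp = create_mlp(input_size=784, hidden_sizes=[128, 64], output_size=10)
# Resulting stack: Dense(784->128), activation, Dense(128->64), activation, Dense(64->10)
```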
"""
layers = []
current_size = input_size
@@ -1031,6 +1049,12 @@ class NetworkStabilityMonitor:
- Check: np.any(np.isinf(tensor.data))
- Check: np.any(np.abs(tensor.data) > self.warning_threshold)
- Return dict with analysis
LEARNING CONNECTIONS:
- Critical for debugging exploding/vanishing gradients
- Used in production monitoring systems at scale
- Foundation for automated model health checks
- Similar to TensorBoard's histogram monitoring
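A plain-NumPy sketch of the hinted checks (field names are illustrative):
```python
import numpy as np

def analyze_stability(data: np.ndarray, warning_threshold: float = 1e3) -> dict:
    # Each check mirrors one hint above: NaN, Inf, and magnitude threshold
    return {
        "has_nan": bool(np.any(np.isnan(data))),
        "has_inf": bool(np.any(np.isinf(data))),
        "exceeds_threshold": bool(np.any(np.abs(data) > warning_threshold)),
    }
```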
"""
### BEGIN SOLUTION
data = tensor.data
@@ -1111,6 +1135,12 @@ class NetworkStabilityMonitor:
- Simple loss: 0.5 * np.sum((output.data - target_output.data)**2)
- Use small perturbations to estimate gradients
- Vanishing: gradients < 1e-6, Exploding: gradients > 1e3
LEARNING CONNECTIONS:
- Essential for training deep networks successfully
- Used in gradient clipping and batch normalization design
- Foundation for understanding network initialization strategies
- Similar to PyTorch's gradient debugging tools
"""
### BEGIN SOLUTION
# Forward pass

View File

@@ -145,7 +145,7 @@ def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
TODO: Implement the sliding window convolution using for-loops.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Get input dimensions: H, W = input.shape
2. Get kernel dimensions: kH, kW = kernel.shape
3. Calculate output dimensions: out_H = H - kH + 1, out_W = W - kW + 1
@@ -157,6 +157,12 @@ def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
- dj loop: kernel columns (0 to kW-1)
6. For each (i,j), compute: output[i,j] += input[i+di, j+dj] * kernel[di, dj]
LEARNING CONNECTIONS:
- **Computer Vision Foundation**: Convolution is the core operation in CNNs and image processing
- **Feature Detection**: Different kernels detect edges, textures, and patterns in images
- **Spatial Hierarchies**: Convolution preserves spatial relationships while extracting features
- **Production CNNs**: Understanding the basic operation helps optimize GPU implementations
EXAMPLE:
Input: [[1, 2, 3],    Kernel: [[1,  0],
        [4, 5, 6],             [0, -1]]
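A self-contained sketch of the sliding-window loop described in the steps above (valid convolution, no padding or stride; not the graded solution):
```python
import numpy as np

def conv2d_naive_sketch(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    H, W = input.shape
    kH, kW = kernel.shape
    out_H, out_W = H - kH + 1, W - kW + 1       # output shrinks by kernel size - 1
    output = np.zeros((out_H, out_W))
    for i in range(out_H):                      # output rows
        for j in range(out_W):                  # output columns
            for di in range(kH):                # kernel rows
                for dj in range(kW):            # kernel columns
                    output[i, j] += input[i + di, j + dj] * kernel[di, dj]
    return output
```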
@@ -467,10 +473,16 @@ def flatten(x):
TODO: Implement flattening operation.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Get the numpy array from the tensor
2. Use .flatten() to convert to 1D
3. Add batch dimension with [None, :]
4. Return Tensor wrapped around the result
LEARNING CONNECTIONS:
- **CNN to MLP Transition**: Flattening connects convolutional and dense layers
- **Spatial to Vector**: Converts 2D feature maps to vectors for classification
- **Memory Layout**: Understanding how tensors are stored and reshaped in memory
- **Framework Design**: All major frameworks (PyTorch, TensorFlow) use similar patterns
EXAMPLE:
@@ -955,6 +967,18 @@ class ConvolutionProfiler:
TODO: Implement convolution operation profiling.
STEP-BY-STEP IMPLEMENTATION:
1. Profile different kernel sizes and their computational costs
2. Measure memory usage patterns for spatial operations
3. Analyze cache efficiency and memory access patterns
4. Identify optimization opportunities for production systems
LEARNING CONNECTIONS:
- **Performance Optimization**: Understanding computational costs of different kernel sizes
- **Memory Efficiency**: Cache-friendly access patterns improve performance significantly
- **Production Scaling**: Profiling guides hardware selection and deployment strategies
- **GPU Optimization**: Spatial operations are ideal for parallel processing
APPROACH:
1. Time convolution operations with different kernel sizes
2. Analyze memory usage patterns for spatial operations

View File

@@ -184,7 +184,7 @@ class Dataset:
TODO: Implement abstract method for getting samples.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. This is an abstract method - subclasses will implement it
2. Return a tuple of (data, label) tensors
3. Data should be the input features, label should be the target
@@ -192,6 +192,12 @@ class Dataset:
EXAMPLE:
dataset[0] should return (Tensor(image_data), Tensor(label))
LEARNING CONNECTIONS:
- **PyTorch Integration**: This follows the exact same pattern as torch.utils.data.Dataset
- **Production Data**: Real datasets like ImageNet, CIFAR-10 use this interface
- **Memory Efficiency**: On-demand loading prevents loading entire dataset into memory
- **Batching Foundation**: DataLoader uses __getitem__ to create batches efficiently
HINTS:
- This is an abstract method that subclasses must override
- Always return a tuple of (data, label) tensors
@@ -208,13 +214,19 @@ class Dataset:
TODO: Implement abstract method for getting dataset size.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. This is an abstract method - subclasses will implement it
2. Return the total number of samples in the dataset
EXAMPLE:
len(dataset) should return 50000 for CIFAR-10 training set
LEARNING CONNECTIONS:
- **Memory Planning**: DataLoader uses len() to calculate number of batches
- **Progress Tracking**: Training loops use len() for progress bars and epoch calculations
- **Distributed Training**: Multi-GPU systems need dataset size for work distribution
- **Statistical Sampling**: Some training strategies require knowing total dataset size
HINTS:
- This is an abstract method that subclasses must override
- Return an integer representing the total number of samples
@@ -230,7 +242,7 @@ class Dataset:
TODO: Implement method to get sample shape.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Get the first sample using self[0]
2. Extract the data part (first element of tuple)
3. Return the shape of the data tensor
@@ -238,6 +250,12 @@ class Dataset:
EXAMPLE:
For CIFAR-10: returns (3, 32, 32) for RGB images
LEARNING CONNECTIONS:
- **Model Architecture**: Neural networks need to know input shape for first layer
- **Batch Planning**: Systems use sample shape to calculate memory requirements
- **Preprocessing Validation**: Ensures all samples have consistent shape
- **Framework Integration**: Similar to PyTorch's dataset shape inspection
HINTS:
- Use self[0] to get the first sample
- Extract data from the (data, label) tuple
@@ -255,13 +273,19 @@ class Dataset:
TODO: Implement abstract method for getting number of classes.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. This is an abstract method - subclasses will implement it
2. Return the number of unique classes in the dataset
EXAMPLE:
For CIFAR-10: returns 10 (classes 0-9)
LEARNING CONNECTIONS:
- **Output Layer Design**: Neural networks need num_classes for final layer size
- **Loss Function Setup**: CrossEntropyLoss uses num_classes for proper computation
- **Evaluation Metrics**: Accuracy calculation depends on number of classes
- **Model Validation**: Ensures model predictions match expected class range
HINTS:
- This is an abstract method that subclasses must override
- Return the number of unique classes/categories
@@ -432,7 +456,7 @@ class DataLoader:
TODO: Implement batching and shuffling logic.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Create indices list: list(range(len(dataset)))
2. Shuffle indices if self.shuffle is True
3. Loop through indices in batch_size chunks
@@ -443,6 +467,12 @@ class DataLoader:
# batch_data.shape: (batch_size, ...)
# batch_labels.shape: (batch_size,)
LEARNING CONNECTIONS:
- **GPU Efficiency**: Batching maximizes GPU utilization by processing multiple samples together
- **Training Stability**: Shuffling prevents overfitting to data order and improves generalization
- **Memory Management**: Batches fit in GPU memory while full dataset may not
- **Gradient Estimation**: Batch gradients provide better estimates than single-sample gradients
HINTS:
- Use list(range(len(self.dataset))) for indices
- Use np.random.shuffle() if self.shuffle is True
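A plain Python/NumPy sketch of the iteration logic (assumes each sample is a (data, label) pair, per the Dataset interface above):
```python
import numpy as np

def iterate_batches(dataset, batch_size: int, shuffle: bool):
    indices = list(range(len(dataset)))
    if shuffle:
        np.random.shuffle(indices)                       # randomize sample order
    for start in range(0, len(indices), batch_size):
        chunk = [dataset[i] for i in indices[start:start + batch_size]]
        data = np.stack([d for d, _ in chunk])           # (batch_size, ...)
        labels = np.array([l for _, l in chunk])         # (batch_size,)
        yield data, labels
```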
@@ -1172,6 +1202,12 @@ class DataPipelineProfiler:
print(f"Avg batch time: {timing['avg_batch_time']:.3f}s")
print(f"Bottleneck: {timing['is_bottleneck']}")
LEARNING CONNECTIONS:
- **Production Optimization**: Fast GPUs often wait for slow data loading
- **System Bottlenecks**: Data loading can limit training speed more than model complexity
- **Resource Planning**: Understanding I/O vs compute trade-offs for hardware selection
- **Pipeline Tuning**: Multi-worker data loading and prefetching strategies
HINTS:
- Use enumerate(dataloader) to get batches
- Time each batch: start = time.time(), batch = next(iter), end = time.time()
@@ -1245,6 +1281,12 @@ class DataPipelineProfiler:
analysis = profiler.analyze_batch_size_scaling(my_dataset, [16, 32, 64])
print(f"Optimal batch size: {analysis['optimal_batch_size']}")
LEARNING CONNECTIONS:
- **Memory vs Throughput**: Larger batches improve throughput but consume more memory
- **Hardware Optimization**: Optimal batch size depends on GPU memory and compute units
- **Training Dynamics**: Batch size affects gradient noise and convergence behavior
- **Production Scaling**: Understanding batch size impact on serving latency and cost
HINTS:
- Create DataLoader: DataLoader(dataset, batch_size=bs, shuffle=False)
- Time with self.time_dataloader_iteration()

View File

@@ -155,7 +155,7 @@ class MeanSquaredError:
TODO: Implement Mean Squared Error loss computation.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Compute difference: diff = y_pred - y_true
2. Square the differences: squared_diff = diff²
3. Take mean over all elements: mean(squared_diff)
@@ -168,6 +168,12 @@ class MeanSquaredError:
# Should return: mean([(1.0-1.5)², (2.0-2.5)², (3.0-2.5)², (4.0-3.5)²])
# = mean([0.25, 0.25, 0.25, 0.25]) = 0.25
LEARNING CONNECTIONS:
- **Regression Optimization**: MSE loss guides models toward accurate numerical predictions
- **Gradient Properties**: MSE provides smooth gradients proportional to prediction error
- **Outlier Sensitivity**: Squared errors heavily penalize large mistakes
- **Production Usage**: Common in recommendation systems, time series, and financial modeling
HINTS:
- Use tensor subtraction: y_pred - y_true
- Use tensor power: diff ** 2
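A quick numerical check of the worked example above, using plain NumPy in place of Tensors:
```python
import numpy as np

y_pred = np.array([1.0, 2.0, 3.0, 4.0])
y_true = np.array([1.5, 2.5, 2.5, 3.5])
mse = np.mean((y_pred - y_true) ** 2)
print(mse)   # 0.25, matching the worked example
```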
@@ -261,7 +267,7 @@ class CrossEntropyLoss:
TODO: Implement Cross-Entropy loss computation.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Handle both class indices and one-hot encoded labels
2. Apply softmax to predictions for probability distribution
3. Compute log probabilities: log(softmax(y_pred))
@@ -274,6 +280,12 @@ class CrossEntropyLoss:
loss = crossentropy_loss(y_pred, y_true)
# Should apply softmax then compute -log(prob_of_correct_class)
LEARNING CONNECTIONS:
- **Classification Foundation**: CrossEntropy is the standard loss for multi-class problems
- **Probability Interpretation**: Measures difference between predicted and true distributions
- **Information Theory**: Based on entropy and KL divergence concepts
- **Production Systems**: Used in image classification, NLP, and recommendation systems
HINTS:
- Use softmax: exp(x) / sum(exp(x)) for probability distribution
- Add small epsilon (1e-15) to avoid log(0)
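A plain-NumPy sketch following the hints above (softmax with a max-shift for numerical stability, epsilon to avoid log(0)); the function name is illustrative:
```python
import numpy as np

def cross_entropy_sketch(logits: np.ndarray, class_indices: np.ndarray) -> float:
    shifted = logits - logits.max(axis=1, keepdims=True)     # stability shift
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    eps = 1e-15                                              # avoid log(0)
    correct = probs[np.arange(len(class_indices)), class_indices]
    return float(-np.mean(np.log(correct + eps)))
```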
@@ -392,7 +404,7 @@ class BinaryCrossEntropyLoss:
TODO: Implement Binary Cross-Entropy loss computation.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Apply sigmoid to predictions for probability values
2. Clip probabilities to avoid log(0) and log(1)
3. Compute: -y_true * log(y_pred) - (1-y_true) * log(1-y_pred)
@@ -405,6 +417,12 @@ class BinaryCrossEntropyLoss:
loss = bce_loss(y_pred, y_true)
# Should apply sigmoid then compute binary cross-entropy
LEARNING CONNECTIONS:
- **Binary Classification**: Standard loss for yes/no, spam/ham, fraud detection
- **Sigmoid Output**: Maps any real number to probability range [0,1]
- **Medical Diagnosis**: Common in disease detection and medical screening
- **A/B Testing**: Used for conversion prediction and user behavior modeling
HINTS:
- Use sigmoid: 1 / (1 + exp(-x))
- Clip probabilities: np.clip(probs, epsilon, 1-epsilon)
@@ -577,7 +595,7 @@ class Accuracy:
TODO: Implement accuracy computation.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Convert predictions to class indices (argmax for multi-class)
2. Convert true labels to class indices if needed
3. Count correct predictions
@@ -590,6 +608,12 @@ class Accuracy:
accuracy = accuracy_metric(y_pred, y_true)
# Should return: 2/3 = 0.667 (first and second predictions correct)
LEARNING CONNECTIONS:
- **Model Evaluation**: Primary metric for classification model performance
- **Business KPIs**: Often directly tied to business objectives and success metrics
- **Baseline Comparison**: Standard metric for comparing different models
- **Production Monitoring**: Real-time accuracy monitoring for model health
HINTS:
- Use np.argmax(axis=1) for multi-class predictions
- Handle both probability and class index inputs
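A quick numerical check of the worked example, in plain NumPy:
```python
import numpy as np

y_pred = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])   # class probabilities
y_true = np.array([1, 0, 0])                               # class indices
correct = (np.argmax(y_pred, axis=1) == y_true).sum()
print(correct / len(y_true))   # 0.667 (first and second predictions correct)
```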
@@ -789,7 +813,7 @@ class Trainer:
TODO: Implement single epoch training logic.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Initialize epoch metrics tracking
2. Iterate through batches in dataloader
3. For each batch:
@@ -801,6 +825,12 @@ class Trainer:
- Track metrics
4. Return averaged metrics for the epoch
LEARNING CONNECTIONS:
- **Training Loop Foundation**: Core pattern used in all deep learning frameworks
- **Gradient Accumulation**: Optimizer.zero_grad() prevents gradient accumulation bugs
- **Backpropagation**: loss.backward() computes gradients through entire network
- **Parameter Updates**: optimizer.step() applies computed gradients to model weights
HINTS:
- Use optimizer.zero_grad() before each batch
- Call loss.backward() for gradient computation
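A hedged skeleton of the loop above; the zero_grad/backward/step API follows the PyTorch-style conventions referenced in the hints:
```python
def train_epoch_sketch(model, dataloader, optimizer, loss_fn):
    total_loss, num_batches = 0.0, 0
    for data, labels in dataloader:
        optimizer.zero_grad()                 # clear stale gradients
        loss = loss_fn(model(data), labels)   # forward pass + loss
        loss.backward()                       # backpropagate
        optimizer.step()                      # apply parameter update
        total_loss += float(loss.data)
        num_batches += 1
    return {"loss": total_loss / max(num_batches, 1)}
```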
@@ -863,7 +893,7 @@ class Trainer:
TODO: Implement single epoch validation logic.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Initialize epoch metrics tracking
2. Iterate through batches in dataloader
3. For each batch:
@@ -872,6 +902,12 @@ class Trainer:
- Track metrics
4. Return averaged metrics for the epoch
LEARNING CONNECTIONS:
- **Model Evaluation**: Validation measures generalization to unseen data
- **Overfitting Detection**: Comparing train vs validation metrics reveals overfitting
- **Model Selection**: Validation metrics guide hyperparameter tuning and architecture choices
- **Early Stopping**: Validation loss plateaus indicate optimal training duration
HINTS:
- No gradient computation needed for validation
- No parameter updates during validation
@@ -926,7 +962,7 @@ class Trainer:
TODO: Implement complete training loop.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Loop through epochs
2. For each epoch:
- Train on training data
@@ -935,6 +971,12 @@ class Trainer:
- Print progress (if verbose)
3. Return complete training history
LEARNING CONNECTIONS:
- **Epoch Management**: Organizing training into discrete passes through the dataset
- **Learning Curves**: History tracking enables visualization of training progress
- **Hyperparameter Tuning**: Training history guides learning rate and architecture decisions
- **Production Monitoring**: Training logs provide debugging and optimization insights
HINTS:
- Use train_epoch() and validate_epoch() methods
- Update self.history with results
@@ -1170,7 +1212,7 @@ class TrainingPipelineProfiler:
TODO: Implement comprehensive training step profiling.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Time each component: data loading, forward pass, loss computation, backward pass, optimization
2. Monitor memory usage throughout the pipeline
3. Calculate throughput metrics (samples/second, batches/second)
@@ -1180,6 +1222,12 @@ class TrainingPipelineProfiler:
EXAMPLE:
profiler = TrainingPipelineProfiler()
step_metrics = profiler.profile_complete_training_step(model, dataloader, optimizer, loss_fn)
print(f"Training throughput: {step_metrics['samples_per_second']:.1f} samples/sec")
LEARNING CONNECTIONS:
- **Performance Optimization**: Identifying bottlenecks in training pipeline
- **Resource Planning**: Understanding memory and compute requirements
- **Hardware Selection**: Data guides GPU vs CPU trade-offs
- **Production Scaling**: Optimizing training throughput for large models
HINTS:
@@ -1407,7 +1455,7 @@ class ProductionTrainingOptimizer:
TODO: Implement batch size optimization for production throughput.
APPROACH:
STEP-BY-STEP IMPLEMENTATION:
1. Test range of batch sizes from initial to maximum
2. For each batch size, measure:
- Training throughput (samples/second)
@@ -1421,6 +1469,12 @@ class ProductionTrainingOptimizer:
tuner = ProductionTrainingOptimizer()
optimal_config = tuner.optimize_batch_size_for_throughput(model, loss_fn, optimizer)
print(f"Optimal batch size: {optimal_config['batch_size']}")
print(f"Expected throughput: {optimal_config['throughput']:.1f} samples/sec")
LEARNING CONNECTIONS:
- **Memory vs Throughput**: Larger batches improve GPU utilization but use more memory
- **Hardware Optimization**: Optimal batch size depends on GPU memory and compute units
- **Training Dynamics**: Batch size affects gradient noise and convergence behavior
- **Production Cost**: Throughput optimization directly impacts cloud computing costs
HINTS:

View File

@@ -249,7 +249,7 @@ class BenchmarkScenarios:
TODO: Implement the three benchmark scenarios following MLPerf patterns.
UNDERSTANDING THE SCENARIOS:
STEP-BY-STEP IMPLEMENTATION:
1. Single-Stream: Send queries one at a time, measure latency
2. Server: Send queries following Poisson distribution, measure QPS
3. Offline: Send all queries at once, measure total throughput
@@ -260,6 +260,12 @@ class BenchmarkScenarios:
3. Calculate appropriate metrics for each scenario
4. Return BenchmarkResult with all measurements
LEARNING CONNECTIONS:
- **MLPerf Standards**: Industry-standard benchmarking methodology used by Google, NVIDIA, etc.
- **Performance Scenarios**: Different deployment patterns require different measurement approaches
- **Production Validation**: Benchmarking validates model performance before deployment
- **Resource Planning**: Results guide infrastructure scaling and capacity planning
EXAMPLE USAGE:
scenarios = BenchmarkScenarios()
result = scenarios.single_stream(model, dataset, num_queries=1000)
@@ -275,7 +281,7 @@ class BenchmarkScenarios:
TODO: Implement single-stream benchmarking.
STEP-BY-STEP:
STEP-BY-STEP IMPLEMENTATION:
1. Initialize empty list for latencies
2. For each query (up to num_queries):
a. Get next sample from dataset (cycle if needed)
@@ -288,6 +294,12 @@ class BenchmarkScenarios:
4. Calculate accuracy if possible
5. Return BenchmarkResult with SINGLE_STREAM scenario
LEARNING CONNECTIONS:
- **Mobile/Edge Deployment**: Single-stream simulates user-facing applications
- **Tail Latency**: 90th/95th percentiles matter more than averages for user experience
- **Interactive Systems**: Chatbots, recommendation engines use single-stream patterns
- **SLA Validation**: Ensures models meet response time requirements
HINTS:
- Use time.perf_counter() for precise timing
- Use dataset[i % len(dataset)] to cycle through samples
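A timing-loop sketch per the steps above; percentile reporting follows the tail-latency note (helper name and return fields are illustrative):
```python
import time
import numpy as np

def single_stream_latencies(model, dataset, num_queries: int = 1000) -> dict:
    latencies = []
    for i in range(num_queries):
        sample, _ = dataset[i % len(dataset)]          # cycle through samples
        start = time.perf_counter()
        model(sample)                                  # one query at a time
        latencies.append(time.perf_counter() - start)
    return {"p50": float(np.percentile(latencies, 50)),
            "p90": float(np.percentile(latencies, 90)),
            "p95": float(np.percentile(latencies, 95))}
```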
@@ -337,7 +349,7 @@ class BenchmarkScenarios:
TODO: Implement server benchmarking.
STEP-BY-STEP:
STEP-BY-STEP IMPLEMENTATION:
1. Calculate inter-arrival time = 1.0 / target_qps
2. Run for specified duration:
a. Wait for next query arrival (Poisson distribution)
@@ -348,6 +360,12 @@ class BenchmarkScenarios:
3. Calculate actual QPS = total_queries / duration
4. Return results
LEARNING CONNECTIONS:
- **Web Services**: Server scenario simulates API endpoints handling concurrent requests
- **Load Testing**: Validates system behavior under realistic traffic patterns
- **Scalability Analysis**: Tests how well models handle increasing load
- **Production Deployment**: Critical for microservices and web-scale applications
HINTS:
- Use np.random.exponential(inter_arrival_time) for Poisson
- Track both query arrival times and completion times
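A small sketch of the Poisson-arrival hint: exponential inter-arrival times with mean 1/target_qps produce a Poisson query stream:
```python
import numpy as np

target_qps = 100.0
inter_arrival = 1.0 / target_qps
waits = np.random.exponential(inter_arrival, size=5)   # random gaps between queries
arrival_times = np.cumsum(waits)                       # absolute arrival schedule
print(arrival_times)                                   # e.g. [0.004 0.017 0.029 ...]
```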
@@ -400,7 +418,7 @@ class BenchmarkScenarios:
TODO: Implement offline benchmarking.
STEP-BY-STEP:
STEP-BY-STEP IMPLEMENTATION:
1. Group dataset into batches of batch_size
2. For each batch:
a. Record start time
@@ -410,6 +428,12 @@ class BenchmarkScenarios:
3. Calculate total throughput = total_samples / total_time
4. Return results
LEARNING CONNECTIONS:
- **Batch Processing**: Offline scenario simulates data pipeline and ETL workloads
- **Throughput Optimization**: Maximizes processing efficiency for large datasets
- **Data Center Workloads**: Common in recommendation systems and analytics pipelines
- **Cost Optimization**: High throughput reduces compute costs per sample
HINTS:
- Process data in batches for efficiency
- Measure total time for all batches
@@ -521,7 +545,7 @@ class StatisticalValidator:
TODO: Implement statistical validation for benchmark results.
UNDERSTANDING STATISTICAL TESTING:
STEP-BY-STEP IMPLEMENTATION:
1. Null hypothesis: No difference between models
2. T-test: Compare means of two groups
3. P-value: Probability of seeing this difference by chance
@@ -534,6 +558,12 @@ class StatisticalValidator:
3. Calculate effect size (Cohen's d)
4. Calculate confidence interval
5. Provide clear recommendation
LEARNING CONNECTIONS:
- **Scientific Rigor**: Ensures performance claims are statistically valid
- **A/B Testing**: Foundation for production model comparison and rollout decisions
- **Research Validation**: Required for academic papers and technical reports
- **Business Decisions**: Statistical significance guides investment in new models
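A sketch of the validation steps using SciPy (Welch's t-test plus Cohen's d; the helper name and alpha default are illustrative):
```python
import numpy as np
from scipy import stats

def compare_latencies(a: np.ndarray, b: np.ndarray, alpha: float = 0.05) -> dict:
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
    pooled_std = np.sqrt((a.std(ddof=1) ** 2 + b.std(ddof=1) ** 2) / 2)
    cohens_d = (a.mean() - b.mean()) / pooled_std              # effect size
    return {"p_value": float(p_value),
            "cohens_d": float(cohens_d),
            "significant": bool(p_value < alpha)}
```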
"""
def __init__(self, confidence_level: float = 0.95):
@@ -733,7 +763,7 @@ class TinyTorchPerf:
TODO: Implement the complete benchmarking framework.
UNDERSTANDING THE FRAMEWORK:
STEP-BY-STEP IMPLEMENTATION:
1. Combines all benchmark scenarios
2. Integrates statistical validation
3. Provides easy-to-use API
@@ -744,6 +774,12 @@ class TinyTorchPerf:
2. Provide methods for each scenario
3. Include statistical validation
4. Generate comprehensive reports
LEARNING CONNECTIONS:
- **MLPerf Integration**: Follows industry-standard benchmarking patterns
- **Production Deployment**: Validates models before production rollout
- **Performance Engineering**: Identifies bottlenecks and optimization opportunities
- **Framework Design**: Demonstrates how to build reusable ML tools
"""
def __init__(self):
@@ -1376,13 +1412,19 @@ class ProductionBenchmarkingProfiler:
TODO: Implement production-grade profiling capabilities.
UNDERSTANDING PRODUCTION PROFILING:
STEP-BY-STEP IMPLEMENTATION:
1. End-to-end pipeline analysis (not just model inference)
2. Resource utilization monitoring (CPU, memory, bandwidth)
3. Statistical A/B testing frameworks
4. Production monitoring and alerting integration
5. Performance regression detection
6. Load testing and capacity planning
LEARNING CONNECTIONS:
- **Production ML Systems**: Real-world profiling for deployment optimization
- **Performance Engineering**: Systematic approach to identifying and fixing bottlenecks
- **A/B Testing**: Statistical frameworks for safe model rollouts
- **Cost Optimization**: Understanding resource usage for efficient cloud deployment
"""
def __init__(self, enable_monitoring: bool = True):