Mirror of https://github.com/MLSysBook/TinyTorch.git (synced 2026-04-28 15:22:39 -05:00)
Standardize all modules to follow NBGrader style guide
- Updated 7 non-compliant modules for consistency
- Module 01_setup: Added EXAMPLE USAGE sections with code examples
- Module 02_tensor: Added STEP-BY-STEP IMPLEMENTATION and LEARNING CONNECTIONS
- Module 05_dense: Added LEARNING CONNECTIONS to all functions
- Module 06_spatial: Added STEP-BY-STEP and LEARNING CONNECTIONS
- Module 08_dataloader: Added LEARNING CONNECTIONS sections
- Module 11_training: Added STEP-BY-STEP and LEARNING CONNECTIONS
- Module 14_benchmarking: Added STEP-BY-STEP and LEARNING CONNECTIONS
- All modules now follow consistent format per NBGRADER_STYLE_GUIDE.md
- Preserved all existing solution blocks and functionality
@@ -267,14 +267,22 @@ def personal_info() -> Dict[str, str]:
4. Make system_name unique and descriptive
5. Keep version as '1.0.0' for now

EXAMPLE OUTPUT:
{
    'developer': 'Student Name',
    'email': 'student@university.edu',
    'institution': 'University Name',
    'system_name': 'StudentName-TinyTorch-Dev',
    'version': '1.0.0'
}

EXAMPLE USAGE:
```python
# Get your personal configuration
info = personal_info()
print(info['developer'])    # Expected: "Your Name" (not placeholder)
print(info['email'])        # Expected: "you@domain.com" (valid email)
print(info['system_name'])  # Expected: "YourName-Dev" (unique identifier)
print(info)                 # Expected: Complete dict with 5 fields
# Output: {
#     'developer': 'Your Name',
#     'email': 'you@domain.com',
#     'institution': 'Your Institution',
#     'system_name': 'YourName-TinyTorch-Dev',
#     'version': '1.0.0'
# }
```

IMPLEMENTATION HINTS:
- Replace the example with your real information
@@ -486,14 +494,25 @@ def system_info() -> Dict[str, Any]:
6. Convert memory from bytes to GB (divide by 1024^3)
7. Return all information in a dictionary

EXAMPLE OUTPUT:
{
    'python_version': '3.9.7',
    'platform': 'Darwin',
    'architecture': 'arm64',
    'cpu_count': 8,
    'memory_gb': 16.0
}

EXAMPLE USAGE:
```python
# Query system information
sys_info = system_info()
print(f"Python: {sys_info['python_version']}")  # Expected: "3.x.x"
print(f"Platform: {sys_info['platform']}")      # Expected: "Darwin"/"Linux"/"Windows"
print(f"CPUs: {sys_info['cpu_count']}")         # Expected: 4, 8, 16, etc.
print(f"Memory: {sys_info['memory_gb']} GB")    # Expected: 8.0, 16.0, 32.0, etc.

# Full output example:
print(sys_info)
# Expected: {
#     'python_version': '3.9.7',
#     'platform': 'Darwin',
#     'architecture': 'arm64',
#     'cpu_count': 8,
#     'memory_gb': 16.0
# }
```

IMPLEMENTATION HINTS:
- Use f-string formatting for Python version: f"{major}.{minor}.{micro}"
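For reference, a minimal sketch of what the finished function might look like. The `psutil` dependency is an assumption (the hints above only mention the bytes-to-GB conversion); substitute whatever the module actually imports:

```python
import platform
import sys

import psutil  # assumed dependency for CPU/memory queries


def system_info():
    """Collect basic information about the development machine."""
    v = sys.version_info
    return {
        'python_version': f"{v.major}.{v.minor}.{v.micro}",
        'platform': platform.system(),       # 'Darwin', 'Linux', or 'Windows'
        'architecture': platform.machine(),  # e.g. 'arm64', 'x86_64'
        'cpu_count': psutil.cpu_count(),
        'memory_gb': psutil.virtual_memory().total / (1024 ** 3),
    }
```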
@@ -341,6 +341,18 @@ class Tensor:
TODO: Return the stored numpy array.

STEP-BY-STEP IMPLEMENTATION:
1. Access the internal _data attribute
2. Return the numpy array directly
3. This provides access to underlying data for NumPy operations

LEARNING CONNECTIONS:
Real-world relevance:
- PyTorch: tensor.numpy() converts to NumPy for visualization/analysis
- TensorFlow: tensor.numpy() enables integration with scientific Python
- Production: Data scientists need to access raw arrays for debugging
- Performance: Direct access avoids copying for read-only operations

HINT: Return self._data (the array you stored in __init__)
"""
### BEGIN SOLUTION
@@ -354,6 +366,18 @@ class Tensor:
TODO: Return the shape of the stored numpy array.

STEP-BY-STEP IMPLEMENTATION:
1. Access the _data attribute (the NumPy array)
2. Get the shape property from the NumPy array
3. Return the shape tuple directly

LEARNING CONNECTIONS:
Real-world relevance:
- Neural networks: Layer compatibility requires matching shapes
- Computer vision: Image shape (height, width, channels) determines architecture
- NLP: Sequence length and vocabulary size affect model design
- Debugging: Shape mismatches are the #1 cause of ML errors

HINT: Use .shape attribute of the numpy array
EXAMPLE: Tensor([1, 2, 3]).shape should return (3,)
"""
@@ -368,6 +392,18 @@ class Tensor:
TODO: Return the total number of elements in the tensor.

STEP-BY-STEP IMPLEMENTATION:
1. Access the _data attribute (the NumPy array)
2. Get the size property from the NumPy array
3. Return the total element count as an integer

LEARNING CONNECTIONS:
Real-world relevance:
- Memory planning: Calculate RAM requirements for large tensors
- Model architecture: Determine parameter counts for layers
- Performance optimization: Size affects computation time
- Batch processing: Total elements determines vectorization efficiency

HINT: Use .size attribute of the numpy array
EXAMPLE: Tensor([1, 2, 3]).size should return 3
"""
@@ -382,6 +418,18 @@ class Tensor:
TODO: Return the data type of the stored numpy array.

STEP-BY-STEP IMPLEMENTATION:
1. Access the _data attribute (the NumPy array)
2. Get the dtype property from the NumPy array
3. Return the NumPy dtype object directly

LEARNING CONNECTIONS:
Real-world relevance:
- Precision vs speed: float32 is faster, float64 more accurate
- Memory optimization: int8 uses 1/4 memory of int32
- GPU compatibility: Some operations only work with specific types
- Model deployment: Mobile/edge devices prefer smaller data types

HINT: Use .dtype attribute of the numpy array
EXAMPLE: Tensor([1, 2, 3]).dtype should return dtype('int32')
"""
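Taken together, the four accessors above are one-liners. A minimal sketch of the property bodies, assuming the constructor stored the array as `self._data` (the name used in the hints):

```python
import numpy as np


class Tensor:
    def __init__(self, data):
        self._data = np.array(data)  # store input as a NumPy array

    @property
    def data(self) -> np.ndarray:
        return self._data            # direct access, no copy

    @property
    def shape(self) -> tuple:
        return self._data.shape      # e.g. (3,) for Tensor([1, 2, 3])

    @property
    def size(self) -> int:
        return int(self._data.size)  # total element count

    @property
    def dtype(self) -> np.dtype:
        return self._data.dtype      # e.g. dtype('int64') on most platforms
```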
@@ -395,6 +443,19 @@ class Tensor:
TODO: Create a clear string representation of the tensor.

STEP-BY-STEP IMPLEMENTATION:
1. Convert the numpy array to a list using .tolist()
2. Get shape and dtype information from properties
3. Format as "Tensor([data], shape=shape, dtype=dtype)"
4. Return the formatted string

LEARNING CONNECTIONS:
Real-world relevance:
- Debugging: Clear tensor representation speeds debugging
- Jupyter notebooks: Good __repr__ improves data exploration
- Logging: Production systems log tensor info for monitoring
- Education: Students understand tensors better with clear output

APPROACH:
1. Convert the numpy array to a list for readable output
2. Include the shape and dtype information
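A minimal sketch of a `__repr__` that follows the format string above:

```python
def __repr__(self) -> str:
    # Matches the documented format: Tensor([data], shape=..., dtype=...)
    return (f"Tensor({self._data.tolist()}, "
            f"shape={self._data.shape}, dtype={self._data.dtype})")
```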
@@ -418,6 +479,19 @@ class Tensor:
TODO: Implement tensor addition.

STEP-BY-STEP IMPLEMENTATION:
1. Extract numpy arrays from both tensors
2. Use NumPy's + operator for element-wise addition
3. Create a new Tensor object with the result
4. Return the new tensor

LEARNING CONNECTIONS:
Real-world relevance:
- Neural networks: Adding bias terms to linear layer outputs
- Residual connections: skip connections in ResNet architectures
- Gradient updates: Adding computed gradients to parameters
- Ensemble methods: Combining predictions from multiple models

APPROACH:
1. Add the numpy arrays using +
2. Return a new Tensor with the result
@@ -442,6 +516,19 @@ class Tensor:
TODO: Implement tensor multiplication.

STEP-BY-STEP IMPLEMENTATION:
1. Extract numpy arrays from both tensors
2. Use NumPy's * operator for element-wise multiplication
3. Create a new Tensor object with the result
4. Return the new tensor

LEARNING CONNECTIONS:
Real-world relevance:
- Activation functions: Element-wise operations like ReLU masking
- Attention mechanisms: Element-wise scaling in transformer models
- Feature scaling: Multiplying features by learned scaling factors
- Gating: Element-wise gating in LSTM and GRU cells

APPROACH:
1. Multiply the numpy arrays using *
2. Return a new Tensor with the result
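Both methods reduce to a single NumPy expression wrapped in a new Tensor. A minimal sketch:

```python
def add(self, other: "Tensor") -> "Tensor":
    # Element-wise addition; NumPy broadcasts compatible shapes
    return Tensor(self._data + other._data)

def multiply(self, other: "Tensor") -> "Tensor":
    # Element-wise (Hadamard) product, not matrix multiplication
    return Tensor(self._data * other._data)
```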
@@ -466,6 +553,19 @@ class Tensor:
TODO: Implement + operator for tensors.

STEP-BY-STEP IMPLEMENTATION:
1. Check if other is a Tensor object
2. If Tensor, call the add() method directly
3. If scalar, convert to Tensor then call add()
4. Return the result from add() method

LEARNING CONNECTIONS:
Real-world relevance:
- Natural syntax: tensor + scalar enables intuitive code
- Broadcasting: Adding scalars to tensors is common in ML
- Operator overloading: Python's magic methods enable math-like syntax
- API design: Clean interfaces reduce cognitive load for researchers

APPROACH:
1. If other is a Tensor, use tensor addition
2. If other is a scalar, convert to Tensor first
@@ -488,6 +588,19 @@ class Tensor:
TODO: Implement * operator for tensors.

STEP-BY-STEP IMPLEMENTATION:
1. Check if other is a Tensor object
2. If Tensor, call the multiply() method directly
3. If scalar, convert to Tensor then call multiply()
4. Return the result from multiply() method

LEARNING CONNECTIONS:
Real-world relevance:
- Scaling features: tensor * learning_rate for gradient updates
- Masking: tensor * mask for attention mechanisms
- Regularization: tensor * dropout_mask during training
- Normalization: tensor * scale_factor in batch normalization

APPROACH:
1. If other is a Tensor, use tensor multiplication
2. If other is a scalar, convert to Tensor first
@@ -510,6 +623,19 @@ class Tensor:
TODO: Implement - operator for tensors.

STEP-BY-STEP IMPLEMENTATION:
1. Check if other is a Tensor object
2. If Tensor, subtract other._data from self._data
3. If scalar, subtract scalar directly from self._data
4. Create new Tensor with result and return

LEARNING CONNECTIONS:
Real-world relevance:
- Gradient computation: parameter - learning_rate * gradient
- Residual connections: output - skip_connection in some architectures
- Error calculation: predicted - actual for loss computation
- Centering data: tensor - mean for zero-centered inputs

APPROACH:
1. Convert other to Tensor if needed
2. Subtract using numpy arrays
@@ -533,6 +659,19 @@ class Tensor:
TODO: Implement / operator for tensors.

STEP-BY-STEP IMPLEMENTATION:
1. Check if other is a Tensor object
2. If Tensor, divide self._data by other._data
3. If scalar, divide self._data by scalar directly
4. Create new Tensor with result and return

LEARNING CONNECTIONS:
Real-world relevance:
- Normalization: tensor / std_deviation for standard scaling
- Learning rate decay: parameter / decay_factor over time
- Probability computation: counts / total_counts for frequencies
- Temperature scaling: logits / temperature in softmax functions

APPROACH:
1. Convert other to Tensor if needed
2. Divide using numpy arrays
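The four operator hooks share one pattern: coerce scalars to Tensor, then delegate. A minimal sketch (the `__sub__` and `__truediv__` bodies inline the NumPy expression, as the steps above describe):

```python
def __add__(self, other):
    other = other if isinstance(other, Tensor) else Tensor(other)
    return self.add(other)

def __mul__(self, other):
    other = other if isinstance(other, Tensor) else Tensor(other)
    return self.multiply(other)

def __sub__(self, other):
    other = other if isinstance(other, Tensor) else Tensor(other)
    return Tensor(self._data - other._data)

def __truediv__(self, other):
    other = other if isinstance(other, Tensor) else Tensor(other)
    return Tensor(self._data / other._data)
```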
@@ -560,6 +699,19 @@ class Tensor:
TODO: Implement matrix multiplication.

STEP-BY-STEP IMPLEMENTATION:
1. Extract numpy arrays from both tensors
2. Use np.matmul() for proper matrix multiplication
3. Create new Tensor object with the result
4. Return the new tensor

LEARNING CONNECTIONS:
Real-world relevance:
- Linear layers: input @ weight matrices in neural networks
- Transformer attention: Q @ K^T for attention scores
- CNN convolutions: Implemented as matrix multiplications
- Batch processing: Matrix ops enable parallel computation

APPROACH:
1. Use np.matmul() to perform matrix multiplication
2. Return a new Tensor with the result
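A minimal sketch; the `__matmul__` hook that enables the `a @ b` syntax is an extra convenience not mentioned in the diff:

```python
def matmul(self, other: "Tensor") -> "Tensor":
    # True matrix multiplication, not the element-wise product
    return Tensor(np.matmul(self._data, other._data))

def __matmul__(self, other: "Tensor") -> "Tensor":
    return self.matmul(other)
```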
@@ -206,6 +206,12 @@ class Sequential:
HINTS:
- Use self.layers to store the layers
- Handle empty initialization case

LEARNING CONNECTIONS:
- This is equivalent to torch.nn.Sequential in PyTorch
- Used in every neural network to chain layers together
- Foundation for models like VGG, ResNet, and transformers
- Enables modular network design and experimentation
"""
### BEGIN SOLUTION
self.layers = layers if layers is not None else []
@@ -241,6 +247,12 @@ class Sequential:
- Apply each layer: x = layer(x)
- The output of one layer becomes input to the next
- Return the final result

LEARNING CONNECTIONS:
- This is the core of feedforward neural networks
- Powers inference in every deployed model
- Critical for real-time predictions in production
- Foundation for gradient flow in backpropagation
"""
### BEGIN SOLUTION
# Apply each layer in sequence
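The forward pass is a simple fold over the stored layers. A minimal sketch:

```python
def forward(self, x):
    # Output of each layer becomes the input to the next
    for layer in self.layers:
        x = layer(x)
    return x
```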
@@ -394,6 +406,12 @@ def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int,
- For each hidden_size: add Dense(current_size, hidden_size), then activation
- Finally add Dense(last_hidden_size, output_size), then output_activation
- Return Sequential(layers)

LEARNING CONNECTIONS:
- This pattern is used in every feedforward network implementation
- Foundation for architectures like autoencoders and GANs
- Enables rapid prototyping of neural architectures
- Similar to tf.keras.Sequential with Dense layers
"""
layers = []
current_size = input_size
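A sketch of how the builder might continue from those two lines. The `activation` and `output_activation` names come from the docstring; their defaults here (`ReLU`, `None`) are assumptions, since the full signature is truncated in the hunk header:

```python
def create_mlp(input_size, hidden_sizes, output_size,
               activation=ReLU, output_activation=None):
    layers = []
    current_size = input_size
    # Dense -> activation for each hidden layer
    for hidden_size in hidden_sizes:
        layers.append(Dense(current_size, hidden_size))
        layers.append(activation())
        current_size = hidden_size
    # Final projection, with an optional output activation
    layers.append(Dense(current_size, output_size))
    if output_activation is not None:
        layers.append(output_activation())
    return Sequential(layers)
```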
@@ -1031,6 +1049,12 @@ class NetworkStabilityMonitor:
- Check: np.any(np.isinf(tensor.data))
- Check: np.any(np.abs(tensor.data) > self.warning_threshold)
- Return dict with analysis

LEARNING CONNECTIONS:
- Critical for debugging exploding/vanishing gradients
- Used in production monitoring systems at scale
- Foundation for automated model health checks
- Similar to TensorBoard's histogram monitoring
"""
### BEGIN SOLUTION
data = tensor.data
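A sketch of the analysis dict the hints describe. The method name is hypothetical, and `np.isnan` is added alongside the documented `np.isinf` check since NaNs are the other common failure mode:

```python
def check_activations(self, tensor):
    data = tensor.data
    return {
        'has_nan': bool(np.any(np.isnan(data))),
        'has_inf': bool(np.any(np.isinf(data))),
        'exceeds_threshold': bool(np.any(np.abs(data) > self.warning_threshold)),
        'max_abs_value': float(np.max(np.abs(data))),
    }
```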
@@ -1111,6 +1135,12 @@ class NetworkStabilityMonitor:
- Simple loss: 0.5 * np.sum((output.data - target_output.data)**2)
- Use small perturbations to estimate gradients
- Vanishing: gradients < 1e-6, Exploding: gradients > 1e3

LEARNING CONNECTIONS:
- Essential for training deep networks successfully
- Used in gradient clipping and batch normalization design
- Foundation for understanding network initialization strategies
- Similar to PyTorch's gradient debugging tools
"""
### BEGIN SOLUTION
# Forward pass
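One way to realize the perturbation idea is a finite-difference estimate of the loss gradient with respect to the input. This helper is hypothetical (name and signature are ours), shown only to make the hint concrete:

```python
def estimate_gradient_scale(network, x, target, eps=1e-5):
    def loss_for(array):
        out = network(Tensor(array))
        return 0.5 * np.sum((out.data - target.data) ** 2)

    base = x.data.astype(float)
    base_loss = loss_for(base)
    grads = np.zeros_like(base)
    # Perturb one input element at a time: dL/dx ~ (L(x+eps) - L(x)) / eps
    for idx in np.ndindex(base.shape):
        perturbed = base.copy()
        perturbed[idx] += eps
        grads[idx] = (loss_for(perturbed) - base_loss) / eps

    scale = np.max(np.abs(grads))
    if scale < 1e-6:
        return scale, 'vanishing'
    if scale > 1e3:
        return scale, 'exploding'
    return scale, 'healthy'
```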
@@ -145,7 +145,7 @@ def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
TODO: Implement the sliding window convolution using for-loops.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Get input dimensions: H, W = input.shape
2. Get kernel dimensions: kH, kW = kernel.shape
3. Calculate output dimensions: out_H = H - kH + 1, out_W = W - kW + 1
@@ -157,6 +157,12 @@ def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
- dj loop: kernel columns (0 to kW-1)
6. For each (i,j), compute: output[i,j] += input[i+di, j+dj] * kernel[di, dj]

LEARNING CONNECTIONS:
- **Computer Vision Foundation**: Convolution is the core operation in CNNs and image processing
- **Feature Detection**: Different kernels detect edges, textures, and patterns in images
- **Spatial Hierarchies**: Convolution preserves spatial relationships while extracting features
- **Production CNNs**: Understanding the basic operation helps optimize GPU implementations

EXAMPLE:
Input: [[1, 2, 3],    Kernel: [[1,  0],
        [4, 5, 6],             [0, -1]]
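Putting the six steps together gives a direct (and deliberately slow) implementation. A minimal sketch, assuming `numpy` is imported as `np`:

```python
def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # 'Valid' convolution: the kernel never slides past the input edges
    H, W = input.shape
    kH, kW = kernel.shape
    out_H, out_W = H - kH + 1, W - kW + 1
    output = np.zeros((out_H, out_W))
    for i in range(out_H):
        for j in range(out_W):
            for di in range(kH):
                for dj in range(kW):
                    output[i, j] += input[i + di, j + dj] * kernel[di, dj]
    return output
```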
@@ -467,10 +473,16 @@ def flatten(x):
TODO: Implement flattening operation.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Get the numpy array from the tensor
2. Use .flatten() to convert to 1D
3. Add batch dimension with [None, :]
4. Return Tensor wrapped around the result

LEARNING CONNECTIONS:
- **CNN to MLP Transition**: Flattening connects convolutional and dense layers
- **Spatial to Vector**: Converts 2D feature maps to vectors for classification
- **Memory Layout**: Understanding how tensors are stored and reshaped in memory
- **Framework Design**: All major frameworks (PyTorch, TensorFlow) use similar patterns

EXAMPLE:
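A minimal sketch of the four steps:

```python
def flatten(x):
    flat = x.data.flatten()       # e.g. (2, 3) -> (6,)
    return Tensor(flat[None, :])  # add batch dimension: (6,) -> (1, 6)
```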
@@ -955,6 +967,18 @@ class ConvolutionProfiler:
TODO: Implement convolution operation profiling.

STEP-BY-STEP IMPLEMENTATION:
1. Profile different kernel sizes and their computational costs
2. Measure memory usage patterns for spatial operations
3. Analyze cache efficiency and memory access patterns
4. Identify optimization opportunities for production systems

LEARNING CONNECTIONS:
- **Performance Optimization**: Understanding computational costs of different kernel sizes
- **Memory Efficiency**: Cache-friendly access patterns improve performance significantly
- **Production Scaling**: Profiling guides hardware selection and deployment strategies
- **GPU Optimization**: Spatial operations are ideal for parallel processing

APPROACH:
1. Time convolution operations with different kernel sizes
2. Analyze memory usage patterns for spatial operations
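A sketch of the kernel-size timing idea; the helper name and the box-filter kernels are our own choices, not from the source:

```python
import time

def profile_kernel_sizes(input_array, kernel_sizes=(3, 5, 7), repeats=10):
    results = {}
    for k in kernel_sizes:
        kernel = np.ones((k, k)) / (k * k)  # simple averaging kernel
        start = time.perf_counter()
        for _ in range(repeats):
            conv2d_naive(input_array, kernel)
        results[k] = (time.perf_counter() - start) / repeats  # avg seconds
    return results
```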
@@ -184,7 +184,7 @@ class Dataset:
TODO: Implement abstract method for getting samples.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. This is an abstract method - subclasses will implement it
2. Return a tuple of (data, label) tensors
3. Data should be the input features, label should be the target
@@ -192,6 +192,12 @@ class Dataset:
EXAMPLE:
dataset[0] should return (Tensor(image_data), Tensor(label))

LEARNING CONNECTIONS:
- **PyTorch Integration**: This follows the exact same pattern as torch.utils.data.Dataset
- **Production Data**: Real datasets like ImageNet, CIFAR-10 use this interface
- **Memory Efficiency**: On-demand loading prevents loading entire dataset into memory
- **Batching Foundation**: DataLoader uses __getitem__ to create batches efficiently

HINTS:
- This is an abstract method that subclasses must override
- Always return a tuple of (data, label) tensors
@@ -208,13 +214,19 @@ class Dataset:
TODO: Implement abstract method for getting dataset size.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. This is an abstract method - subclasses will implement it
2. Return the total number of samples in the dataset

EXAMPLE:
len(dataset) should return 50000 for CIFAR-10 training set

LEARNING CONNECTIONS:
- **Memory Planning**: DataLoader uses len() to calculate number of batches
- **Progress Tracking**: Training loops use len() for progress bars and epoch calculations
- **Distributed Training**: Multi-GPU systems need dataset size for work distribution
- **Statistical Sampling**: Some training strategies require knowing total dataset size

HINTS:
- This is an abstract method that subclasses must override
- Return an integer representing the total number of samples
@@ -230,7 +242,7 @@ class Dataset:
TODO: Implement method to get sample shape.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Get the first sample using self[0]
2. Extract the data part (first element of tuple)
3. Return the shape of the data tensor
@@ -238,6 +250,12 @@ class Dataset:
EXAMPLE:
For CIFAR-10: returns (3, 32, 32) for RGB images

LEARNING CONNECTIONS:
- **Model Architecture**: Neural networks need to know input shape for first layer
- **Batch Planning**: Systems use sample shape to calculate memory requirements
- **Preprocessing Validation**: Ensures all samples have consistent shape
- **Framework Integration**: Similar to PyTorch's dataset shape inspection

HINTS:
- Use self[0] to get the first sample
- Extract data from the (data, label) tuple
@@ -255,13 +273,19 @@ class Dataset:
TODO: Implement abstract method for getting number of classes.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. This is an abstract method - subclasses will implement it
2. Return the number of unique classes in the dataset

EXAMPLE:
For CIFAR-10: returns 10 (classes 0-9)

LEARNING CONNECTIONS:
- **Output Layer Design**: Neural networks need num_classes for final layer size
- **Loss Function Setup**: CrossEntropyLoss uses num_classes for proper computation
- **Evaluation Metrics**: Accuracy calculation depends on number of classes
- **Model Validation**: Ensures model predictions match expected class range

HINTS:
- This is an abstract method that subclasses must override
- Return the number of unique classes/categories
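To see the whole interface in one place, here is a hypothetical in-memory subclass. Names beyond `__getitem__` and `__len__` (e.g. `num_classes`, `sample_shape`) follow the docstrings, but whether they are methods or properties in the real module is not visible in this diff:

```python
class InMemoryDataset(Dataset):
    def __init__(self, data, labels, num_classes):
        self.data = data                  # array of input samples
        self.labels = labels              # array of integer labels
        self._num_classes = num_classes

    def __getitem__(self, index):
        # Always a (data, label) tuple of Tensors
        return Tensor(self.data[index]), Tensor(self.labels[index])

    def __len__(self):
        return len(self.data)

    def num_classes(self):
        return self._num_classes

    def sample_shape(self):
        data, _ = self[0]                 # inspect the first sample
        return data.shape
```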
@@ -432,7 +456,7 @@ class DataLoader:
TODO: Implement batching and shuffling logic.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Create indices list: list(range(len(dataset)))
2. Shuffle indices if self.shuffle is True
3. Loop through indices in batch_size chunks
@@ -443,6 +467,12 @@ class DataLoader:
# batch_data.shape: (batch_size, ...)
# batch_labels.shape: (batch_size,)

LEARNING CONNECTIONS:
- **GPU Efficiency**: Batching maximizes GPU utilization by processing multiple samples together
- **Training Stability**: Shuffling prevents overfitting to data order and improves generalization
- **Memory Management**: Batches fit in GPU memory while full dataset may not
- **Gradient Estimation**: Batch gradients provide better estimates than single-sample gradients

HINTS:
- Use list(range(len(self.dataset))) for indices
- Use np.random.shuffle() if self.shuffle is True
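A minimal sketch of `__iter__` following the hints; the stacking details assume each sample is a `(Tensor, Tensor)` pair as documented above:

```python
def __iter__(self):
    indices = list(range(len(self.dataset)))
    if self.shuffle:
        np.random.shuffle(indices)
    for start in range(0, len(indices), self.batch_size):
        batch = [self.dataset[i] for i in indices[start:start + self.batch_size]]
        # Stack per-sample arrays into (batch_size, ...) tensors
        batch_data = Tensor(np.stack([data.data for data, _ in batch]))
        batch_labels = Tensor(np.array([label.data for _, label in batch]))
        yield batch_data, batch_labels
```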
@@ -1172,6 +1202,12 @@ class DataPipelineProfiler:
print(f"Avg batch time: {timing['avg_batch_time']:.3f}s")
print(f"Bottleneck: {timing['is_bottleneck']}")

LEARNING CONNECTIONS:
- **Production Optimization**: Fast GPUs often wait for slow data loading
- **System Bottlenecks**: Data loading can limit training speed more than model complexity
- **Resource Planning**: Understanding I/O vs compute trade-offs for hardware selection
- **Pipeline Tuning**: Multi-worker data loading and prefetching strategies

HINTS:
- Use enumerate(dataloader) to get batches
- Time each batch: start = time.time(), batch = next(iter), end = time.time()
@@ -1245,6 +1281,12 @@ class DataPipelineProfiler:
analysis = profiler.analyze_batch_size_scaling(my_dataset, [16, 32, 64])
print(f"Optimal batch size: {analysis['optimal_batch_size']}")

LEARNING CONNECTIONS:
- **Memory vs Throughput**: Larger batches improve throughput but consume more memory
- **Hardware Optimization**: Optimal batch size depends on GPU memory and compute units
- **Training Dynamics**: Batch size affects gradient noise and convergence behavior
- **Production Scaling**: Understanding batch size impact on serving latency and cost

HINTS:
- Create DataLoader: DataLoader(dataset, batch_size=bs, shuffle=False)
- Time with self.time_dataloader_iteration()
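A sketch of the timing loop both hunks describe. The function name mirrors `time_dataloader_iteration` from the hints; the bottleneck threshold is an arbitrary placeholder:

```python
import time

def time_dataloader_iteration(dataloader, max_batches=50):
    batch_times = []
    start = time.perf_counter()
    for i, batch in enumerate(dataloader):
        now = time.perf_counter()
        batch_times.append(now - start)  # time to produce this batch
        start = now
        if i + 1 >= max_batches:
            break
    avg = sum(batch_times) / len(batch_times)
    return {
        'avg_batch_time': avg,
        'is_bottleneck': avg > 0.1,  # placeholder threshold, not from the source
    }
```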
@@ -155,7 +155,7 @@ class MeanSquaredError:
TODO: Implement Mean Squared Error loss computation.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Compute difference: diff = y_pred - y_true
2. Square the differences: squared_diff = diff²
3. Take mean over all elements: mean(squared_diff)
@@ -168,6 +168,12 @@ class MeanSquaredError:
# Should return: mean([(1.0-1.5)², (2.0-2.5)², (3.0-2.5)², (4.0-3.5)²])
# = mean([0.25, 0.25, 0.25, 0.25]) = 0.25

LEARNING CONNECTIONS:
- **Regression Optimization**: MSE loss guides models toward accurate numerical predictions
- **Gradient Properties**: MSE provides smooth gradients proportional to prediction error
- **Outlier Sensitivity**: Squared errors heavily penalize large mistakes
- **Production Usage**: Common in recommendation systems, time series, and financial modeling

HINTS:
- Use tensor subtraction: y_pred - y_true
- Use tensor power: diff ** 2
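A minimal sketch that reproduces the worked example above (result 0.25):

```python
class MeanSquaredError:
    def __call__(self, y_pred: Tensor, y_true: Tensor) -> float:
        diff = y_pred.data - y_true.data  # element-wise error
        return float(np.mean(diff ** 2))  # mean of squared errors
```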
@@ -261,7 +267,7 @@ class CrossEntropyLoss:
TODO: Implement Cross-Entropy loss computation.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Handle both class indices and one-hot encoded labels
2. Apply softmax to predictions for probability distribution
3. Compute log probabilities: log(softmax(y_pred))
@@ -274,6 +280,12 @@ class CrossEntropyLoss:
loss = crossentropy_loss(y_pred, y_true)
# Should apply softmax then compute -log(prob_of_correct_class)

LEARNING CONNECTIONS:
- **Classification Foundation**: CrossEntropy is the standard loss for multi-class problems
- **Probability Interpretation**: Measures difference between predicted and true distributions
- **Information Theory**: Based on entropy and KL divergence concepts
- **Production Systems**: Used in image classification, NLP, and recommendation systems

HINTS:
- Use softmax: exp(x) / sum(exp(x)) for probability distribution
- Add small epsilon (1e-15) to avoid log(0)
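A sketch following the hints, with the usual max-subtraction trick for numerical stability (an addition of ours; the hints only mention the epsilon):

```python
class CrossEntropyLoss:
    def __call__(self, y_pred: Tensor, y_true: Tensor) -> float:
        logits = y_pred.data
        # Numerically stable softmax over the class axis
        shifted = logits - np.max(logits, axis=1, keepdims=True)
        exp = np.exp(shifted)
        probs = exp / np.sum(exp, axis=1, keepdims=True)

        labels = y_true.data
        if labels.ndim > 1:  # one-hot labels -> class indices
            labels = np.argmax(labels, axis=1)

        # Average -log(probability of the correct class)
        correct = probs[np.arange(len(labels)), labels.astype(int)]
        return float(np.mean(-np.log(correct + 1e-15)))
```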
@@ -392,7 +404,7 @@ class BinaryCrossEntropyLoss:
TODO: Implement Binary Cross-Entropy loss computation.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Apply sigmoid to predictions for probability values
2. Clip probabilities to avoid log(0) and log(1)
3. Compute: -y_true * log(y_pred) - (1-y_true) * log(1-y_pred)
@@ -405,6 +417,12 @@ class BinaryCrossEntropyLoss:
loss = bce_loss(y_pred, y_true)
# Should apply sigmoid then compute binary cross-entropy

LEARNING CONNECTIONS:
- **Binary Classification**: Standard loss for yes/no, spam/ham, fraud detection
- **Sigmoid Output**: Maps any real number to probability range [0,1]
- **Medical Diagnosis**: Common in disease detection and medical screening
- **A/B Testing**: Used for conversion prediction and user behavior modeling

HINTS:
- Use sigmoid: 1 / (1 + exp(-x))
- Clip probabilities: np.clip(probs, epsilon, 1-epsilon)
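A minimal sketch of the three steps:

```python
class BinaryCrossEntropyLoss:
    def __call__(self, y_pred: Tensor, y_true: Tensor) -> float:
        probs = 1.0 / (1.0 + np.exp(-y_pred.data))  # sigmoid
        eps = 1e-15
        probs = np.clip(probs, eps, 1 - eps)        # avoid log(0) and log(1)
        t = y_true.data
        return float(np.mean(-(t * np.log(probs) + (1 - t) * np.log(1 - probs))))
```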
@@ -577,7 +595,7 @@ class Accuracy:
TODO: Implement accuracy computation.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Convert predictions to class indices (argmax for multi-class)
2. Convert true labels to class indices if needed
3. Count correct predictions
@@ -590,6 +608,12 @@ class Accuracy:
accuracy = accuracy_metric(y_pred, y_true)
# Should return: 2/3 = 0.667 (first and second predictions correct)

LEARNING CONNECTIONS:
- **Model Evaluation**: Primary metric for classification model performance
- **Business KPIs**: Often directly tied to business objectives and success metrics
- **Baseline Comparison**: Standard metric for comparing different models
- **Production Monitoring**: Real-time accuracy monitoring for model health

HINTS:
- Use np.argmax(axis=1) for multi-class predictions
- Handle both probability and class index inputs
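A minimal sketch handling both probability and class-index inputs:

```python
class Accuracy:
    def __call__(self, y_pred: Tensor, y_true: Tensor) -> float:
        preds = y_pred.data
        if preds.ndim > 1:                  # probabilities -> class indices
            preds = np.argmax(preds, axis=1)
        labels = y_true.data
        if labels.ndim > 1:                 # one-hot -> class indices
            labels = np.argmax(labels, axis=1)
        return float(np.mean(preds == labels))
```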
@@ -789,7 +813,7 @@ class Trainer:
TODO: Implement single epoch training logic.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Initialize epoch metrics tracking
2. Iterate through batches in dataloader
3. For each batch:
@@ -801,6 +825,12 @@ class Trainer:
   - Track metrics
4. Return averaged metrics for the epoch

LEARNING CONNECTIONS:
- **Training Loop Foundation**: Core pattern used in all deep learning frameworks
- **Gradient Accumulation**: Optimizer.zero_grad() prevents gradient accumulation bugs
- **Backpropagation**: loss.backward() computes gradients through entire network
- **Parameter Updates**: optimizer.step() applies computed gradients to model weights

HINTS:
- Use optimizer.zero_grad() before each batch
- Call loss.backward() for gradient computation
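A sketch of the loop skeleton; the attribute names (`self.model`, `self.optimizer`, `self.loss_fn`) are assumptions about the Trainer's constructor, which this diff does not show:

```python
def train_epoch(self, dataloader):
    epoch_loss, num_batches = 0.0, 0
    for batch_data, batch_labels in dataloader:
        self.optimizer.zero_grad()                  # clear stale gradients
        outputs = self.model(batch_data)            # forward pass
        loss = self.loss_fn(outputs, batch_labels)  # compute loss
        loss.backward()                             # backpropagate
        self.optimizer.step()                       # update parameters
        epoch_loss += float(loss.data)
        num_batches += 1
    return {'loss': epoch_loss / max(num_batches, 1)}
```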
@@ -863,7 +893,7 @@ class Trainer:
TODO: Implement single epoch validation logic.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Initialize epoch metrics tracking
2. Iterate through batches in dataloader
3. For each batch:
@@ -872,6 +902,12 @@ class Trainer:
   - Track metrics
4. Return averaged metrics for the epoch

LEARNING CONNECTIONS:
- **Model Evaluation**: Validation measures generalization to unseen data
- **Overfitting Detection**: Comparing train vs validation metrics reveals overfitting
- **Model Selection**: Validation metrics guide hyperparameter tuning and architecture choices
- **Early Stopping**: Validation loss plateaus indicate optimal training duration

HINTS:
- No gradient computation needed for validation
- No parameter updates during validation
@@ -926,7 +962,7 @@ class Trainer:
TODO: Implement complete training loop.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Loop through epochs
2. For each epoch:
   - Train on training data
@@ -935,6 +971,12 @@ class Trainer:
   - Print progress (if verbose)
3. Return complete training history

LEARNING CONNECTIONS:
- **Epoch Management**: Organizing training into discrete passes through the dataset
- **Learning Curves**: History tracking enables visualization of training progress
- **Hyperparameter Tuning**: Training history guides learning rate and architecture decisions
- **Production Monitoring**: Training logs provide debugging and optimization insights

HINTS:
- Use train_epoch() and validate_epoch() methods
- Update self.history with results
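A sketch of the outer loop; the `self.history` layout (a dict of lists) is an assumption:

```python
def fit(self, train_loader, val_loader, epochs=10, verbose=True):
    for epoch in range(epochs):
        train_metrics = self.train_epoch(train_loader)
        val_metrics = self.validate_epoch(val_loader)
        self.history['train_loss'].append(train_metrics['loss'])
        self.history['val_loss'].append(val_metrics['loss'])
        if verbose:
            print(f"Epoch {epoch + 1}/{epochs}: "
                  f"train_loss={train_metrics['loss']:.4f}, "
                  f"val_loss={val_metrics['loss']:.4f}")
    return self.history
```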
@@ -1170,7 +1212,7 @@ class TrainingPipelineProfiler:
TODO: Implement comprehensive training step profiling.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Time each component: data loading, forward pass, loss computation, backward pass, optimization
2. Monitor memory usage throughout the pipeline
3. Calculate throughput metrics (samples/second, batches/second)
@@ -1180,6 +1222,12 @@ class TrainingPipelineProfiler:
EXAMPLE:
profiler = TrainingPipelineProfiler()
step_metrics = profiler.profile_complete_training_step(model, dataloader, optimizer, loss_fn)
print(f"Training throughput: {step_metrics['samples_per_second']:.1f} samples/sec")

LEARNING CONNECTIONS:
- **Performance Optimization**: Identifying bottlenecks in training pipeline
- **Resource Planning**: Understanding memory and compute requirements
- **Hardware Selection**: Data guides GPU vs CPU trade-offs
- **Production Scaling**: Optimizing training throughput for large models

HINTS:
@@ -1407,7 +1455,7 @@ class ProductionTrainingOptimizer:
TODO: Implement batch size optimization for production throughput.

-APPROACH:
+STEP-BY-STEP IMPLEMENTATION:
1. Test range of batch sizes from initial to maximum
2. For each batch size, measure:
   - Training throughput (samples/second)
@@ -1421,6 +1469,12 @@ class ProductionTrainingOptimizer:
optimizer = ProductionTrainingOptimizer()
optimal_config = optimizer.optimize_batch_size_for_throughput(model, loss_fn, optimizer)
print(f"Optimal batch size: {optimal_config['batch_size']}")
print(f"Expected throughput: {optimal_config['throughput']:.1f} samples/sec")

LEARNING CONNECTIONS:
- **Memory vs Throughput**: Larger batches improve GPU utilization but use more memory
- **Hardware Optimization**: Optimal batch size depends on GPU memory and compute units
- **Training Dynamics**: Batch size affects gradient noise and convergence behavior
- **Production Cost**: Throughput optimization directly impacts cloud computing costs

HINTS:
@@ -249,7 +249,7 @@ class BenchmarkScenarios:
TODO: Implement the three benchmark scenarios following MLPerf patterns.

-UNDERSTANDING THE SCENARIOS:
+STEP-BY-STEP IMPLEMENTATION:
1. Single-Stream: Send queries one at a time, measure latency
2. Server: Send queries following Poisson distribution, measure QPS
3. Offline: Send all queries at once, measure total throughput
@@ -260,6 +260,12 @@ class BenchmarkScenarios:
3. Calculate appropriate metrics for each scenario
4. Return BenchmarkResult with all measurements

LEARNING CONNECTIONS:
- **MLPerf Standards**: Industry-standard benchmarking methodology used by Google, NVIDIA, etc.
- **Performance Scenarios**: Different deployment patterns require different measurement approaches
- **Production Validation**: Benchmarking validates model performance before deployment
- **Resource Planning**: Results guide infrastructure scaling and capacity planning

EXAMPLE USAGE:
scenarios = BenchmarkScenarios()
result = scenarios.single_stream(model, dataset, num_queries=1000)
@@ -275,7 +281,7 @@ class BenchmarkScenarios:
TODO: Implement single-stream benchmarking.

-STEP-BY-STEP:
+STEP-BY-STEP IMPLEMENTATION:
1. Initialize empty list for latencies
2. For each query (up to num_queries):
   a. Get next sample from dataset (cycle if needed)
@@ -288,6 +294,12 @@ class BenchmarkScenarios:
4. Calculate accuracy if possible
5. Return BenchmarkResult with SINGLE_STREAM scenario

LEARNING CONNECTIONS:
- **Mobile/Edge Deployment**: Single-stream simulates user-facing applications
- **Tail Latency**: 90th/95th percentiles matter more than averages for user experience
- **Interactive Systems**: Chatbots, recommendation engines use single-stream patterns
- **SLA Validation**: Ensures models meet response time requirements

HINTS:
- Use time.perf_counter() for precise timing
- Use dataset[i % len(dataset)] to cycle through samples
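A sketch of the measurement loop. It returns a plain dict of latency statistics rather than the module's BenchmarkResult, whose constructor is not visible in this diff:

```python
import time

def single_stream(model, dataset, num_queries=1000):
    latencies = []
    for i in range(num_queries):
        sample, _ = dataset[i % len(dataset)]   # cycle through samples
        start = time.perf_counter()
        model(sample)                           # one query at a time
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        'mean_latency': sum(latencies) / len(latencies),
        'p90_latency': latencies[int(0.90 * len(latencies))],
        'p95_latency': latencies[int(0.95 * len(latencies))],
    }
```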
@@ -337,7 +349,7 @@ class BenchmarkScenarios:
TODO: Implement server benchmarking.

-STEP-BY-STEP:
+STEP-BY-STEP IMPLEMENTATION:
1. Calculate inter-arrival time = 1.0 / target_qps
2. Run for specified duration:
   a. Wait for next query arrival (Poisson distribution)
@@ -348,6 +360,12 @@ class BenchmarkScenarios:
3. Calculate actual QPS = total_queries / duration
4. Return results

LEARNING CONNECTIONS:
- **Web Services**: Server scenario simulates API endpoints handling concurrent requests
- **Load Testing**: Validates system behavior under realistic traffic patterns
- **Scalability Analysis**: Tests how well models handle increasing load
- **Production Deployment**: Critical for microservices and web-scale applications

HINTS:
- Use np.random.exponential(inter_arrival_time) for Poisson
- Track both query arrival times and completion times
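A sketch of the Poisson arrival loop. Real implementations track arrival and completion times separately, as the hint notes; this sketch only reports achieved QPS:

```python
import time

import numpy as np

def server(model, dataset, target_qps=10.0, duration=10.0):
    inter_arrival = 1.0 / target_qps
    start = time.perf_counter()
    total_queries = 0
    while time.perf_counter() - start < duration:
        # Exponential gaps between arrivals give a Poisson process
        time.sleep(np.random.exponential(inter_arrival))
        sample, _ = dataset[total_queries % len(dataset)]
        model(sample)
        total_queries += 1
    elapsed = time.perf_counter() - start
    return {'achieved_qps': total_queries / elapsed}
```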
@@ -400,7 +418,7 @@ class BenchmarkScenarios:
TODO: Implement offline benchmarking.

-STEP-BY-STEP:
+STEP-BY-STEP IMPLEMENTATION:
1. Group dataset into batches of batch_size
2. For each batch:
   a. Record start time
@@ -410,6 +428,12 @@ class BenchmarkScenarios:
3. Calculate total throughput = total_samples / total_time
4. Return results

LEARNING CONNECTIONS:
- **Batch Processing**: Offline scenario simulates data pipeline and ETL workloads
- **Throughput Optimization**: Maximizes processing efficiency for large datasets
- **Data Center Workloads**: Common in recommendation systems and analytics pipelines
- **Cost Optimization**: High throughput reduces compute costs per sample

HINTS:
- Process data in batches for efficiency
- Measure total time for all batches
@@ -521,7 +545,7 @@ class StatisticalValidator:
TODO: Implement statistical validation for benchmark results.

-UNDERSTANDING STATISTICAL TESTING:
+STEP-BY-STEP IMPLEMENTATION:
1. Null hypothesis: No difference between models
2. T-test: Compare means of two groups
3. P-value: Probability of seeing this difference by chance
@@ -534,6 +558,12 @@ class StatisticalValidator:
3. Calculate effect size (Cohen's d)
4. Calculate confidence interval
5. Provide clear recommendation

LEARNING CONNECTIONS:
- **Scientific Rigor**: Ensures performance claims are statistically valid
- **A/B Testing**: Foundation for production model comparison and rollout decisions
- **Research Validation**: Required for academic papers and technical reports
- **Business Decisions**: Statistical significance guides investment in new models
"""

def __init__(self, confidence_level: float = 0.95):
|
TODO: Implement the complete benchmarking framework.

-UNDERSTANDING THE FRAMEWORK:
+STEP-BY-STEP IMPLEMENTATION:
1. Combines all benchmark scenarios
2. Integrates statistical validation
3. Provides easy-to-use API
@@ -744,6 +774,12 @@ class TinyTorchPerf:
2. Provide methods for each scenario
3. Include statistical validation
4. Generate comprehensive reports

LEARNING CONNECTIONS:
- **MLPerf Integration**: Follows industry-standard benchmarking patterns
- **Production Deployment**: Validates models before production rollout
- **Performance Engineering**: Identifies bottlenecks and optimization opportunities
- **Framework Design**: Demonstrates how to build reusable ML tools
"""

def __init__(self):
@@ -1376,13 +1412,19 @@ class ProductionBenchmarkingProfiler:
TODO: Implement production-grade profiling capabilities.

-UNDERSTANDING PRODUCTION PROFILING:
+STEP-BY-STEP IMPLEMENTATION:
1. End-to-end pipeline analysis (not just model inference)
2. Resource utilization monitoring (CPU, memory, bandwidth)
3. Statistical A/B testing frameworks
4. Production monitoring and alerting integration
5. Performance regression detection
6. Load testing and capacity planning

LEARNING CONNECTIONS:
- **Production ML Systems**: Real-world profiling for deployment optimization
- **Performance Engineering**: Systematic approach to identifying and fixing bottlenecks
- **A/B Testing**: Statistical frameworks for safe model rollouts
- **Cost Optimization**: Understanding resource usage for efficient cloud deployment
"""

def __init__(self, enable_monitoring: bool = True):