# --- # jupyter: # jupytext: # text_representation: # extension: .py # format_name: percent # format_version: '1.3' # jupytext_version: 1.17.1 # --- # %% [markdown] """ # Layers - Neural Network Building Blocks and Composition Patterns Welcome to the unified Layers module! You'll build all the fundamental components for neural networks: base classes, linear transformations, network composition, and tensor reshaping operations. ## Learning Goals - Systems understanding: How layer composition creates complex function approximators from simple building blocks - Core implementation skill: Build Module base class, Linear layers, Sequential networks, and Flatten operations - Pattern recognition: Understand how different layer types solve different computational problems and compose together - Framework connection: See how your implementations mirror PyTorch's nn.Module, nn.Linear, nn.Sequential, and nn.Flatten patterns - Performance insight: Learn why layer composition, memory layout, and tensor operations determine training speed ## Build โ†’ Use โ†’ Reflect 1. **Build**: Module system, matrix operations, Dense layers, Sequential networks, and tensor flattening 2. **Use**: Compose all components into complete neural networks and observe data flow patterns 3. **Reflect**: Why does proper abstraction enable complex architectures while maintaining clean interfaces? ## What You'll Achieve By the end of this module, you'll understand: - Deep technical understanding of neural network component architecture and composition patterns - Practical capability to build complete neural network systems from fundamental building blocks - Systems insight into why modular design is essential for scalable ML systems - Performance consideration of how tensor operations and memory layout affect computational efficiency - Connection to production ML systems and how major frameworks organize neural network components ## Systems Reality Check ๐Ÿ’ก **Production Context**: PyTorch's nn.Module system enables all modern neural networks through clean composition patterns โšก **Performance Note**: Tensor reshape operations and layer composition can create memory bottlenecks - understanding this is key to efficient neural network design """ # %% nbgrader={"grade": false, "grade_id": "layers-imports", "locked": false, "schema_version": 3, "solution": false, "task": false} #| default_exp core.layers #| export import numpy as np import sys import os # Import our building blocks - try package first, then local modules try: from tinytorch.core.tensor import Tensor, Parameter except ImportError: # For development, import from local modules sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_tensor')) from tensor_dev import Tensor, Parameter # %% nbgrader={"grade": false, "grade_id": "layers-setup", "locked": false, "schema_version": 3, "solution": false, "task": false} print("๐Ÿ”ฅ TinyTorch Layers Module") print(f"NumPy version: {np.__version__}") print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}") print("Ready to build neural network layers!") # %% [markdown] """ ## Module Base Class - Neural Network Foundation Before building specific layers like Dense and Conv2d, we need a base class that handles parameter management and provides a clean interface. This is the foundation that makes neural networks composable and easy to use. ### Why We Need a Module Base Class ๐Ÿ—๏ธ **Organization**: Automatic parameter collection across all layers ๐Ÿ”„ **Composition**: Modules can contain other modules (networks of networks) ๐ŸŽฏ **Clean API**: Enable `model(input)` instead of `model.forward(input)` ๐Ÿ“ฆ **PyTorch Compatibility**: Same patterns as `torch.nn.Module` Let's build the foundation that will make all our neural network code clean and powerful: """ # %% nbgrader={"grade": false, "grade_id": "module-base-class", "locked": false, "schema_version": 3, "solution": false, "task": false} #| export class Module: """ Base class for all neural network modules. Provides automatic parameter collection, forward pass management, and clean composition patterns. All layers (Dense, Conv2d, etc.) inherit from this class. Key Features: - Automatic parameter registration when you assign parameter Tensors (weights, bias) - Recursive parameter collection from sub-modules - Clean __call__ interface: model(x) instead of model.forward(x) - Extensible for custom layers Example Usage: class MLP(Module): def __init__(self): super().__init__() self.layer1 = Dense(784, 128) # Auto-registered! self.layer2 = Dense(128, 10) # Auto-registered! def forward(self, x): x = self.layer1(x) return self.layer2(x) model = MLP() params = model.parameters() # Gets all parameters automatically! output = model(input) # Clean interface! """ def __init__(self): """Initialize module with empty parameter and sub-module storage.""" self._parameters = [] self._modules = [] def __setattr__(self, name, value): """ Intercept attribute assignment to auto-register parameters and modules. When you do self.weight = Parameter(...), this automatically adds the parameter to our collection for easy optimization. """ # Check if it's a Tensor that looks like a parameter (has .data attribute) # Parameters are typically named 'weights', 'bias', 'weight', etc. if (hasattr(value, 'data') and hasattr(value, 'shape') and isinstance(value, Tensor) and name in ['weights', 'weight', 'bias']): self._parameters.append(value) # Check if it's another Module (sub-module) elif isinstance(value, Module): self._modules.append(value) # Always call parent to actually set the attribute super().__setattr__(name, value) def parameters(self): """ Recursively collect all parameters from this module and sub-modules. Returns: List of all parameters (Tensors containing weights and biases) This enables: optimizer = Adam(model.parameters()) (when optimizers are available) """ # Start with our own parameters params = list(self._parameters) # Add parameters from sub-modules recursively for module in self._modules: params.extend(module.parameters()) return params def __call__(self, *args, **kwargs): """ Makes modules callable: model(x) instead of model.forward(x). This is the magic that enables clean syntax like: output = model(input) instead of: output = model.forward(input) """ return self.forward(*args, **kwargs) def forward(self, *args, **kwargs): """ Forward pass - must be implemented by subclasses. This is where the actual computation happens. Every layer defines its own forward() method. """ raise NotImplementedError("Subclasses must implement forward()") # %% [markdown] """ ## Where This Code Lives in the Final Package **Learning Side:** You work in modules/04_layers/layers_dev.py **Building Side:** Code exports to tinytorch.core.layers ```python # Final package structure: from tinytorch.core.layers import Module, Linear, Dense, Sequential, Flatten, matmul # Complete layer system! from tinytorch.core.tensor import Tensor, Parameter # The foundation from tinytorch.core.activations import ReLU, Sigmoid # Nonlinearity ``` **Why this matters:** - **Learning:** Complete layer system in one focused module for deep understanding - **Production:** Proper organization like PyTorch's torch.nn with all core components together - **Consistency:** All layer types, network composition, and tensor operations in core.layers - **Integration:** Works seamlessly with tensors and activations for complete neural networks """ # %% [markdown] """ # Matrix Multiplication - The Heart of Neural Networks Every neural network operation ultimately reduces to matrix multiplication. Let's build the foundation that powers everything from simple perceptrons to transformers. ## Why Matrix Multiplication Matters ๐Ÿง  **Neural Network Core**: Every layer applies: output = input @ weights + bias โšก **Parallel Processing**: Matrix ops utilize vectorized CPU instructions and GPU parallelism ๐Ÿ—๏ธ **Scalable Architecture**: Stacking matrix operations creates arbitrarily complex function approximators ๐Ÿ“ˆ **Performance Critical**: 90%+ of neural network compute time is spent in matrix multiplication ## Learning Objectives By implementing matrix multiplication, you'll understand: - How neural networks transform data through linear algebra - Why matrix operations are the building blocks of all modern ML frameworks - How proper implementation affects performance by orders of magnitude - The connection between mathematical operations and computational efficiency """ # %% nbgrader={"grade": false, "grade_id": "matmul-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false} #| export def matmul(a: Tensor, b: Tensor) -> Tensor: """ Matrix multiplication for tensors using explicit loops. This implementation uses triple-nested loops for educational understanding of the fundamental operations. Module 15 will show the optimization progression from loops โ†’ blocking โ†’ vectorized operations. Args: a: Left tensor (shape: ..., m, k) b: Right tensor (shape: ..., k, n) Returns: Result tensor (shape: ..., m, n) TODO: Implement matrix multiplication using explicit loops. STEP-BY-STEP IMPLEMENTATION: 1. Extract numpy arrays from both tensors using .data 2. Check tensor shapes for compatibility 3. Use triple-nested loops to show every operation 4. Wrap result in a new Tensor and return LEARNING CONNECTIONS: - This is the core operation in Dense layers: output = input @ weights - Shows the fundamental computation before optimization - Module 15 will demonstrate the progression to high-performance implementations - Understanding loops helps appreciate vectorization and GPU parallelization EDUCATIONAL APPROACH: - Intentionally simple for understanding, not performance - Makes every multiply-add operation explicit - Sets up Module 15 to show optimization techniques EXAMPLE: ```python a = Tensor([[1, 2], [3, 4]]) # shape (2, 2) b = Tensor([[5, 6], [7, 8]]) # shape (2, 2) result = matmul(a, b) # result.data = [[19, 22], [43, 50]] ``` IMPLEMENTATION HINTS: - Use explicit loops to show every operation - This is educational, not optimized for performance - Module 15 will show the progression to fast implementations """ ### BEGIN SOLUTION # Extract numpy arrays from tensors a_data = a.data b_data = b.data # Get dimensions and validate compatibility if len(a_data.shape) != 2 or len(b_data.shape) != 2: raise ValueError("matmul requires 2D tensors") m, k = a_data.shape k2, n = b_data.shape if k != k2: raise ValueError(f"Inner dimensions must match: {k} != {k2}") # Initialize result matrix result = np.zeros((m, n), dtype=a_data.dtype) # Triple nested loops - educational, shows every operation # This is intentionally simple to understand the fundamental computation # Module 15 will show the optimization journey: # Step 1 (here): Educational loops - slow but clear # Step 2: Loop blocking for cache efficiency # Step 3: Vectorized operations with NumPy # Step 4: GPU acceleration and BLAS libraries for i in range(m): # For each row in result for j in range(n): # For each column in result for k_idx in range(k): # Dot product: sum over inner dimension result[i, j] += a_data[i, k_idx] * b_data[k_idx, j] # Return new Tensor with result return Tensor(result) ### END SOLUTION # %% [markdown] """ ## Testing Matrix Multiplication Let's verify our matrix multiplication works correctly with some test cases. """ # %% nbgrader={"grade": true, "grade_id": "test-matmul", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false} def test_matmul(): """Test matrix multiplication implementation.""" print("๐Ÿงช Testing Matrix Multiplication...") # Test case 1: Simple 2x2 matrices a = Tensor([[1, 2], [3, 4]]) b = Tensor([[5, 6], [7, 8]]) result = matmul(a, b) expected = np.array([[19, 22], [43, 50]]) assert np.allclose(result.data, expected), f"Expected {expected}, got {result.data}" print("โœ… 2x2 matrix multiplication") # Test case 2: Non-square matrices a = Tensor([[1, 2, 3], [4, 5, 6]]) # 2x3 b = Tensor([[7, 8], [9, 10], [11, 12]]) # 3x2 result = matmul(a, b) expected = np.array([[58, 64], [139, 154]]) assert np.allclose(result.data, expected), f"Expected {expected}, got {result.data}" print("โœ… Non-square matrix multiplication") # Test case 3: Vector-matrix multiplication a = Tensor([[1, 2, 3]]) # 1x3 (row vector) b = Tensor([[4], [5], [6]]) # 3x1 (column vector) result = matmul(a, b) expected = np.array([[32]]) # 1*4 + 2*5 + 3*6 = 32 assert np.allclose(result.data, expected), f"Expected {expected}, got {result.data}" print("โœ… Vector-matrix multiplication") print("๐ŸŽ‰ All matrix multiplication tests passed!") test_matmul() # %% [markdown] """ # Dense Layer - The Fundamental Neural Network Component Dense layers (also called Linear or Fully Connected layers) are the building blocks of neural networks. They apply the transformation: **output = input @ weights + bias** ## Why Dense Layers Matter ๐Ÿง  **Universal Function Approximators**: Dense layers can approximate any continuous function when stacked ๐Ÿ”ง **Parameter Learning**: Weights and biases are learned through backpropagation ๐Ÿ—๏ธ **Modular Design**: Dense layers compose into complex architectures (MLPs, transformers, etc.) โšก **Computational Efficiency**: Matrix operations leverage optimized linear algebra libraries ## Learning Objectives By implementing Dense layers, you'll understand: - How neural networks learn through adjustable parameters - The mathematical foundation underlying all neural network layers - Why proper parameter initialization is crucial for training success - How layer composition enables complex function approximation """ # %% nbgrader={"grade": false, "grade_id": "dense-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false} #| export class Linear(Module): """ Linear (Fully Connected) Layer implementation. Applies the transformation: output = input @ weights + bias Inherits from Module for automatic parameter management and clean API. This is PyTorch's nn.Linear equivalent with the same name for familiarity. Features: - Automatic parameter registration (weights and bias) - Clean call interface: layer(input) instead of layer.forward(input) - Works with optimizers via model.parameters() """ def __init__(self, input_size: int, output_size: int, use_bias: bool = True): """ Initialize Linear layer with random weights and optional bias. Args: input_size: Number of input features output_size: Number of output features use_bias: Whether to include bias term TODO: Implement Linear layer initialization. STEP-BY-STEP IMPLEMENTATION: 1. Store input_size and output_size as instance variables 2. Initialize weights as Tensor with shape (input_size, output_size) 3. Use small random values: np.random.randn(...) * 0.1 4. Initialize bias as Tensor with shape (output_size,) if use_bias is True 5. Set bias to None if use_bias is False LEARNING CONNECTIONS: - Small random initialization prevents symmetry breaking - Weight shape (input_size, output_size) enables matrix multiplication - Bias allows shifting the output (like y-intercept in linear regression) - PyTorch uses more sophisticated initialization (Xavier, Kaiming) IMPLEMENTATION HINTS: - Use np.random.randn() for Gaussian random numbers - Scale by 0.1 to keep initial values small - Remember to wrap numpy arrays in Tensor() - Store use_bias flag for forward pass logic """ ### BEGIN SOLUTION super().__init__() # Initialize Module base class self.input_size = input_size self.output_size = output_size self.use_bias = use_bias # Initialize weights with small random values using Parameter # Shape: (input_size, output_size) for matrix multiplication weight_data = np.random.randn(input_size, output_size) * 0.1 self.weights = Parameter(weight_data) # Auto-registers for optimization! # Initialize bias if requested if use_bias: bias_data = np.random.randn(output_size) * 0.1 self.bias = Parameter(bias_data) # Auto-registers for optimization! else: self.bias = None ### END SOLUTION def forward(self, x: Tensor) -> Tensor: """ Forward pass through the Linear layer. Args: x: Input tensor (shape: ..., input_size) Returns: Output tensor (shape: ..., output_size) TODO: Implement forward pass: output = input @ weights + bias STEP-BY-STEP IMPLEMENTATION: 1. Perform matrix multiplication: output = matmul(x, self.weights) 2. If bias exists, add it: output = output + self.bias 3. Return result as Tensor LEARNING CONNECTIONS: - This is the core neural network transformation - Matrix multiplication scales input features to output features - Bias provides offset (like y-intercept in linear equations) - Broadcasting handles different batch sizes automatically IMPLEMENTATION HINTS: - Use the matmul function you implemented above - Handle bias addition with simple + operator - Check if self.bias is not None before adding """ ### BEGIN SOLUTION # Matrix multiplication: input @ weights output = matmul(x, self.weights) # Add bias if it exists if self.bias is not None: output = output + self.bias return output ### END SOLUTION # Backward compatibility alias #| export Dense = Linear # %% [markdown] """ ## Testing Linear Layer Let's verify our Linear layer works correctly with comprehensive tests. The tests use Dense for backward compatibility, but Dense is now an alias for Linear. """ # %% nbgrader={"grade": true, "grade_id": "test-dense", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false} def test_dense_layer(): """Test Dense layer implementation.""" print("๐Ÿงช Testing Dense Layer...") # Test case 1: Basic functionality layer = Dense(input_size=3, output_size=2) input_tensor = Tensor([[1.0, 2.0, 3.0]]) # Shape: (1, 3) output = layer.forward(input_tensor) # Check output shape assert output.shape == (1, 2), f"Expected shape (1, 2), got {output.shape}" print("โœ… Output shape correct") # Test case 2: No bias layer_no_bias = Dense(input_size=2, output_size=3, use_bias=False) assert layer_no_bias.bias is None, "Bias should be None when use_bias=False" print("โœ… No bias option works") # Test case 3: Multiple samples (batch processing) batch_input = Tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]) # Shape: (3, 2) layer_batch = Dense(input_size=2, output_size=2) batch_output = layer_batch.forward(batch_input) assert batch_output.shape == (3, 2), f"Expected shape (3, 2), got {batch_output.shape}" print("โœ… Batch processing works") # Test case 4: Callable interface callable_output = layer_batch(batch_input) assert np.allclose(callable_output.data, batch_output.data), "Callable interface should match forward()" print("โœ… Callable interface works") # Test case 5: Parameter initialization layer_init = Dense(input_size=10, output_size=5) assert layer_init.weights.shape == (10, 5), f"Expected weights shape (10, 5), got {layer_init.weights.shape}" assert layer_init.bias.shape == (5,), f"Expected bias shape (5,), got {layer_init.bias.shape}" # Check that weights are reasonably small (good initialization) assert np.abs(layer_init.weights.data).mean() < 1.0, "Weights should be small for good initialization" print("โœ… Parameter initialization correct") print("๐ŸŽ‰ All Dense layer tests passed!") test_dense_layer() # %% [markdown] """ ## Testing Autograd Integration Now let's test that our Dense layer works correctly with parameter management and module composition. """ # %% nbgrader={"grade": true, "grade_id": "test-dense-parameter-management", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false} def test_dense_parameter_management(): """Test Dense layer parameter management and module composition.""" print("๐Ÿงช Testing Dense Layer Parameter Management...") # Test case 1: Parameter registration layer = Dense(input_size=3, output_size=2) params = layer.parameters() assert len(params) == 2, f"Expected 2 parameters (weights + bias), got {len(params)}" assert layer.weights in params, "Weights should be in parameters list" assert layer.bias in params, "Bias should be in parameters list" print("โœ… Parameter registration works") # Test case 2: Module composition class SimpleNetwork(Module): def __init__(self): super().__init__() self.layer1 = Dense(4, 3) self.layer2 = Dense(3, 2) def forward(self, x): x = self.layer1(x) return self.layer2(x) network = SimpleNetwork() all_params = network.parameters() # Should have 4 parameters: 2 from each layer (weights + bias) assert len(all_params) == 4, f"Expected 4 parameters from network, got {len(all_params)}" print("โœ… Module composition and parameter collection works") # Test case 3: Forward pass through composed network input_tensor = Tensor([[1.0, 2.0, 3.0, 4.0]]) output = network(input_tensor) assert output.shape == (1, 2), f"Expected output shape (1, 2), got {output.shape}" print("โœ… Network forward pass works") # Test case 4: Parameter shapes layer = Dense(input_size=10, output_size=5) assert layer.weights.shape == (10, 5), f"Expected weights shape (10, 5), got {layer.weights.shape}" assert layer.bias.shape == (5,), f"Expected bias shape (5,), got {layer.bias.shape}" print("โœ… Parameter shapes correct") # Test case 5: No bias option layer_no_bias = Dense(input_size=3, output_size=2, use_bias=False) params_no_bias = layer_no_bias.parameters() assert len(params_no_bias) == 1, f"Expected 1 parameter (weights only), got {len(params_no_bias)}" assert layer_no_bias.bias is None, "Bias should be None when use_bias=False" print("โœ… No bias option works") print("๐ŸŽ‰ All Dense layer parameter management tests passed!") test_dense_parameter_management() # %% [markdown] """ # Sequential Network Composition - Building Complete Architectures Now that we have solid layers, let's build the Sequential network that composes layers into complete neural network architectures. This is the foundation for all neural networks from MLPs to complex deep learning models. ## Why Sequential Networks Matter ๐Ÿ—๏ธ **Architecture Foundation**: Sequential is the building block for all neural network architectures ๐Ÿ”„ **Function Composition**: Chain simple functions to create complex behaviors ๐Ÿ“ฆ **Clean Interface**: Write networks as lists of layers - intuitive and maintainable โšก **Production Standard**: Every major framework uses this pattern for neural network construction ## Learning Objectives By implementing Sequential networks, you'll understand: - How function composition enables universal approximation in neural networks - The architectural patterns that power everything from MLPs to transformers - Why clean abstractions matter for building complex systems - How layer composition creates the foundation for all modern deep learning """ # %% nbgrader={"grade": false, "grade_id": "sequential-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false} #| export class Sequential(Module): """ Sequential Network: Composes layers in sequence. The most fundamental network architecture that applies layers in order: f(x) = layer_n(...layer_2(layer_1(x))) Inherits from Module for automatic parameter collection from all sub-layers. This enables optimizers to find all parameters automatically. Example Usage: # Create a 3-layer MLP model = Sequential([ Linear(784, 128), ReLU(), Linear(128, 64), ReLU(), Linear(64, 10) ]) # Use the model output = model(input_data) # Clean interface! params = model.parameters() # All parameters from all layers! """ def __init__(self, layers=None): """ Initialize Sequential network with layers. Args: layers: List of layers to compose in order (optional) """ super().__init__() # Initialize Module base class self.layers = layers if layers is not None else [] # Register all layers as sub-modules for parameter collection for i, layer in enumerate(self.layers): # This automatically adds each layer to self._modules setattr(self, f'layer_{i}', layer) def forward(self, x): """ Forward pass through all layers in sequence. Args: x: Input tensor Returns: Output tensor after passing through all layers """ for layer in self.layers: x = layer(x) return x def add(self, layer): """Add a layer to the network.""" self.layers.append(layer) # Register the new layer for parameter collection setattr(self, f'layer_{len(self.layers)-1}', layer) # %% [markdown] """ ## Testing Sequential Networks Let's verify our Sequential network works correctly with comprehensive tests. """ # %% nbgrader={"grade": true, "grade_id": "test-sequential", "locked": true, "points": 4, "schema_version": 3, "solution": false, "task": false} def test_sequential_network(): """Test Sequential network implementation.""" print("๐Ÿงช Testing Sequential Network...") # Test case 1: Create empty network empty_net = Sequential() assert len(empty_net.layers) == 0, "Empty Sequential should have no layers" print("โœ… Empty Sequential network creation") # Test case 2: Create network with layers layers = [Dense(3, 4), Dense(4, 2)] network = Sequential(layers) assert len(network.layers) == 2, "Network should have 2 layers" print("โœ… Sequential network with layers") # Test case 3: Forward pass through network input_tensor = Tensor([[1.0, 2.0, 3.0]]) output = network(input_tensor) assert output.shape == (1, 2), f"Expected output shape (1, 2), got {output.shape}" print("โœ… Forward pass through Sequential network") # Test case 4: Parameter collection from all layers all_params = network.parameters() # Should have 4 parameters: 2 weights + 2 biases from 2 Dense layers assert len(all_params) == 4, f"Expected 4 parameters from Sequential network, got {len(all_params)}" print("โœ… Parameter collection from all layers") # Test case 5: Adding layers dynamically network.add(Dense(2, 1)) assert len(network.layers) == 3, "Network should have 3 layers after adding one" # Test forward pass after adding layer final_output = network(input_tensor) assert final_output.shape == (1, 1), f"Expected final output shape (1, 1), got {final_output.shape}" print("โœ… Dynamic layer addition") print("๐ŸŽ‰ All Sequential network tests passed!") test_sequential_network() # %% [markdown] """ # Flatten Operation - Connecting Different Layer Types The Flatten operation is essential for connecting convolutional layers to dense layers, or reshaping tensors between different network components. This is a fundamental operation in neural networks. ## Why Flatten Matters ๐Ÿ”— **Interface Bridge**: Connects spatial layers (Conv2D) to dense layers (Linear) ๐Ÿ“ **Dimension Management**: Converts multi-dimensional tensors to vectors for different layer types ๐Ÿ—๏ธ **Architecture Flexibility**: Enables mixing different layer types in the same network โšก **Memory Efficiency**: Provides clean tensor reshaping without copying data ## Learning Objectives By implementing Flatten, you'll understand: - How neural networks handle tensors of different shapes between layer types - The critical role of tensor reshaping in network architecture design - How to preserve batch dimensions while flattening spatial dimensions - The connection between memory layout and computational efficiency """ # %% nbgrader={"grade": false, "grade_id": "flatten-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false} #| export class Flatten(Module): """ Flatten layer that reshapes tensors from multi-dimensional to 2D. Essential for connecting convolutional layers (which output 4D tensors) to linear layers (which expect 2D tensors). Preserves the batch dimension. Example Usage: # In a CNN architecture model = Sequential([ Conv2D(3, 16, kernel_size=3), # Output: (batch, 16, height, width) ReLU(), Flatten(), # Output: (batch, 16*height*width) Linear(16*height*width, 10) # Now compatible! ]) """ def __init__(self, start_dim=1): """ Initialize Flatten layer. Args: start_dim: Dimension to start flattening from (default: 1 to preserve batch) """ super().__init__() self.start_dim = start_dim def forward(self, x): """ Flatten tensor starting from start_dim. Args: x: Input tensor Returns: Flattened tensor with batch dimension preserved """ return flatten(x, start_dim=self.start_dim) # %% nbgrader={"grade": false, "grade_id": "flatten-function", "locked": false, "schema_version": 3, "solution": true, "task": false} #| export def flatten(x, start_dim=1): """ Flatten tensor starting from a given dimension. This is essential for transitioning from convolutional layers (which output 4D tensors) to linear layers (which expect 2D). Args: x: Input tensor (Tensor or any array-like) start_dim: Dimension to start flattening from (default: 1 to preserve batch) Returns: Flattened tensor preserving batch dimension Examples: # Flatten CNN output for Linear layer conv_output = Tensor(np.random.randn(32, 64, 8, 8)) # (batch, channels, height, width) flat = flatten(conv_output) # (32, 4096) - ready for Linear layer! # Flatten image for MLP images = Tensor(np.random.randn(32, 3, 28, 28)) # CIFAR-10 batch flat = flatten(images) # (32, 2352) - ready for MLP! """ # Get the data (handle both Tensor and numpy arrays) if hasattr(x, 'data'): data = x.data else: data = x # Calculate new shape batch_size = data.shape[0] if start_dim > 0 else 1 remaining_size = np.prod(data.shape[start_dim:]) new_shape = (batch_size, remaining_size) if start_dim > 0 else (remaining_size,) # Reshape preserving tensor type if hasattr(x, 'data'): # It's a Tensor - preserve type flattened_data = data.reshape(new_shape) return type(x)(flattened_data) else: # It's a numpy array return data.reshape(new_shape) # %% [markdown] """ ## Testing Flatten Operations Let's verify our Flatten implementation works correctly with various tensor shapes. """ # %% nbgrader={"grade": true, "grade_id": "test-flatten", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false} def test_flatten_operations(): """Test Flatten layer and function implementation.""" print("๐Ÿงช Testing Flatten Operations...") # Test case 1: Flatten function with 2D tensor x_2d = Tensor([[1, 2], [3, 4]]) flattened_func = flatten(x_2d) assert flattened_func.shape == (2, 2), f"Expected shape (2, 2), got {flattened_func.shape}" print("โœ… Flatten function with 2D tensor") # Test case 2: Flatten function with 4D tensor (simulating CNN output) x_4d = Tensor(np.random.randn(2, 3, 4, 4)) # (batch, channels, height, width) flattened_4d = flatten(x_4d) assert flattened_4d.shape == (2, 48), f"Expected shape (2, 48), got {flattened_4d.shape}" # 3*4*4 = 48 print("โœ… Flatten function with 4D tensor") # Test case 3: Flatten layer class flatten_layer = Flatten() layer_output = flatten_layer(x_4d) assert layer_output.shape == (2, 48), f"Expected shape (2, 48), got {layer_output.shape}" assert np.allclose(layer_output.data, flattened_4d.data), "Flatten layer should match flatten function" print("โœ… Flatten layer class") # Test case 4: Different start dimensions flatten_from_0 = Flatten(start_dim=0) full_flat = flatten_from_0(x_2d) assert len(full_flat.shape) <= 2, "Flattening from dim 0 should create vector" print("โœ… Different start dimensions") # Test case 5: Integration with Sequential network = Sequential([ Dense(8, 4), Flatten() ]) test_input = Tensor(np.random.randn(2, 8)) output = network(test_input) assert output.shape == (2, 4), f"Expected shape (2, 4), got {output.shape}" print("โœ… Flatten integration with Sequential") print("๐ŸŽ‰ All Flatten operations tests passed!") test_flatten_operations() # %% [markdown] """ # Systems Analysis: Memory and Performance Characteristics Let's analyze the memory usage and computational complexity of our layer implementations. ## Memory Analysis - **Dense Layer Storage**: input_size ร— output_size weights + output_size bias terms - **Forward Pass Memory**: Input tensor + weight tensor + output tensor (temporary storage) - **Scaling Behavior**: Memory grows quadratically with layer size ## Computational Complexity - **Matrix Multiplication**: O(batch_size ร— input_size ร— output_size) - **Bias Addition**: O(batch_size ร— output_size) - **Total**: Dominated by matrix multiplication for large layers ## Production Insights In production ML systems: - **Memory Management**: PyTorch uses memory pools to avoid frequent allocation/deallocation - **Compute Optimization**: BLAS libraries (MKL, OpenBLAS) optimize matrix operations for specific hardware - **GPU Acceleration**: CUDA kernels parallelize matrix operations across thousands of cores - **Mixed Precision**: Using float16 instead of float32 can halve memory usage with minimal accuracy loss """ # %% nbgrader={"grade": false, "grade_id": "memory-analysis", "locked": false, "schema_version": 3, "solution": false, "task": false} def analyze_layer_memory(): """Analyze memory usage of different layer sizes.""" print("๐Ÿ“Š Layer Memory Analysis") print("=" * 40) layer_sizes = [(10, 10), (100, 100), (1000, 1000), (784, 128), (128, 10)] for input_size, output_size in layer_sizes: # Calculate parameter count weight_params = input_size * output_size bias_params = output_size total_params = weight_params + bias_params # Calculate memory usage (assuming float32 = 4 bytes) memory_mb = total_params * 4 / (1024 * 1024) print(f" {input_size:4d} โ†’ {output_size:4d}: {total_params:,} params, {memory_mb:.3f} MB") print("\n๐Ÿ” Key Insights:") print(" โ€ข Memory grows quadratically with layer width") print(" โ€ข Large layers (1000ร—1000) use significant memory") print(" โ€ข Modern networks balance width vs depth for efficiency") analyze_layer_memory() # %% [markdown] """ # ML Systems Thinking: Interactive Questions Let's explore the deeper implications of our layer implementations. """ # %% nbgrader={"grade": false, "grade_id": "systems-thinking", "locked": false, "schema_version": 3, "solution": false, "task": false} def explore_layer_scaling(): """Explore how layer operations scale with size.""" print("๐Ÿค” Scaling Analysis: Matrix Multiplication Performance") print("=" * 55) sizes = [64, 128, 256, 512] for size in sizes: # Estimate FLOPs for square matrix multiplication flops = 2 * size * size * size # 2 operations per multiply-add # Estimate memory bandwidth (reading A, B, writing C) memory_ops = 3 * size * size # Elements read/written memory_mb = memory_ops * 4 / (1024 * 1024) # float32 = 4 bytes print(f" Size {size:3d}ร—{size:3d}: {flops/1e6:.1f} MFLOPS, {memory_mb:.2f} MB transfers") print("\n๐Ÿ’ก Performance Insights:") print(" โ€ข FLOPs grow cubically (O(nยณ)) with matrix size") print(" โ€ข Memory bandwidth grows quadratically (O(nยฒ))") print(" โ€ข Large matrices become memory-bound, not compute-bound") print(" โ€ข This is why GPUs excel: high memory bandwidth + parallel compute") explore_layer_scaling() # %% [markdown] """ # Complete Neural Network Demo - All Components Working Together Let's demonstrate how all our components work together to build complete neural networks. """ # %% nbgrader={"grade": false, "grade_id": "complete-network-demo", "locked": false, "schema_version": 3, "solution": false, "task": false} def demonstrate_complete_networks(): """Demonstrate complete neural networks using all implemented components.""" print("๐Ÿ”ฅ Complete Neural Network Demo") print("=" * 50) print("\n1. MLP for Classification (MNIST-style):") # Multi-layer perceptron for image classification mlp = Sequential([ Flatten(), # Flatten input images Linear(784, 256), # First hidden layer Linear(256, 128), # Second hidden layer Linear(128, 10) # Output layer (10 classes) ]) # Test with batch of "images" batch_images = Tensor(np.random.randn(32, 28, 28)) # 32 MNIST-like images mlp_output = mlp(batch_images) print(f" Input: {batch_images.shape} (batch of 28x28 images)") print(f" Output: {mlp_output.shape} (class logits for 32 images)") print(f" Parameters: {len(mlp.parameters())} tensors") print("\n2. CNN-style Architecture (with Flatten):") # Simulate CNN โ†’ Flatten โ†’ Dense pattern cnn_style = Sequential([ # Simulate Conv2D output with random "features" Flatten(), # Flatten spatial features Linear(512, 256), # Dense layer after convolution Linear(256, 10) # Classification head ]) # Test with simulated conv output conv_features = Tensor(np.random.randn(16, 8, 8, 8)) # Simulated (B,C,H,W) cnn_output = cnn_style(conv_features) print(f" Input: {conv_features.shape} (simulated conv features)") print(f" Output: {cnn_output.shape} (class predictions)") print("\n3. Deep Network with Many Layers:") # Demonstrate deep composition deep_net = Sequential() layer_sizes = [100, 80, 60, 40, 20, 10] for i in range(len(layer_sizes) - 1): deep_net.add(Linear(layer_sizes[i], layer_sizes[i+1])) print(f" Added layer: {layer_sizes[i]} โ†’ {layer_sizes[i+1]}") # Test deep network deep_input = Tensor(np.random.randn(8, 100)) deep_output = deep_net(deep_input) print(f" Deep network: {deep_input.shape} โ†’ {deep_output.shape}") print(f" Total parameters: {len(deep_net.parameters())} tensors") print("\n4. Parameter Management Across Networks:") networks = {'MLP': mlp, 'CNN-style': cnn_style, 'Deep': deep_net} for name, net in networks.items(): params = net.parameters() total_params = sum(p.data.size for p in params) memory_mb = total_params * 4 / (1024 * 1024) # float32 = 4 bytes print(f" {name}: {len(params)} param tensors, {total_params:,} total params, {memory_mb:.2f} MB") print("\n๐ŸŽ‰ All components work together seamlessly!") print(" โ€ข Module system enables automatic parameter collection") print(" โ€ข Linear layers handle matrix transformations") print(" โ€ข Sequential composes layers into complete architectures") print(" โ€ข Flatten connects different layer types") print(" โ€ข Everything integrates for production-ready neural networks!") demonstrate_complete_networks() # %% [markdown] """ ## ๐Ÿค” ML Systems Thinking: Interactive Questions Now that you've implemented all the core neural network components, let's think about their implications for ML systems: """ # %% nbgrader={"grade": false, "grade_id": "question-1", "locked": false, "schema_version": 3, "solution": false, "task": false} # Question 1: Memory vs Computation Trade-offs """ ๐Ÿค” **Question 1: Memory vs Computation Analysis** You're designing a neural network for deployment on a mobile device with limited memory (1GB RAM) but decent compute power. You have two architecture options: A) Wide network: 784 โ†’ 2048 โ†’ 2048 โ†’ 10 (3 layers, wide) B) Deep network: 784 โ†’ 256 โ†’ 256 โ†’ 256 โ†’ 256 โ†’ 10 (5 layers, narrow) Calculate the memory requirements for each option and explain which you'd choose for mobile deployment and why. Consider: - Parameter storage requirements - Intermediate activation storage during forward pass - Training vs inference memory requirements - How your choice affects model capacity and accuracy """ # %% nbgrader={"grade": false, "grade_id": "question-2", "locked": false, "schema_version": 3, "solution": false, "task": false} # Question 2: Performance Optimization """ ๐Ÿค” **Question 2: Production Performance Optimization** Your Dense layer implementation works correctly, but you notice it's slower than PyTorch's nn.Linear on the same hardware. Investigate and explain: 1. Why might our implementation be slower? (Hint: think about underlying linear algebra libraries) 2. What optimization techniques do production frameworks use? 3. How would you modify our implementation to approach production performance? 4. When might our simple implementation actually be preferable? Research areas to consider: - BLAS (Basic Linear Algebra Subprograms) libraries - Memory layout and cache efficiency - Vectorization and SIMD instructions - GPU kernel optimization """ # %% nbgrader={"grade": false, "grade_id": "question-3", "locked": false, "schema_version": 3, "solution": false, "task": false} # Question 3: Scaling and Architecture Design """ ๐Ÿค” **Question 3: Systems Architecture Scaling** Modern transformer models like GPT-3 have billions of parameters, primarily in Dense layers. Analyze the scaling challenges: 1. How does memory requirement scale with model size? Calculate the memory needed for a 175B parameter model. 2. What are the computational bottlenecks during training vs inference? 3. How do systems like distributed training address these scaling challenges? 4. Why do large models use techniques like gradient checkpointing and model parallelism? Systems considerations: - Memory hierarchy (L1/L2/L3 cache, RAM, storage) - Network bandwidth for distributed training - GPU memory constraints and model sharding - Inference optimization for production serving """ # %% [markdown] """ # Comprehensive Testing and Integration Let's run a comprehensive test suite to verify all our implementations work correctly together. """ # %% nbgrader={"grade": false, "grade_id": "comprehensive-tests", "locked": false, "schema_version": 3, "solution": false, "task": false} def run_comprehensive_tests(): """Run comprehensive tests of all layer functionality.""" print("๐Ÿ”ฌ Comprehensive Layer Testing Suite") print("=" * 45) # Test 1: Matrix multiplication edge cases print("\n1. Matrix Multiplication Edge Cases:") # Single element a = Tensor([[5]]) b = Tensor([[3]]) result = matmul(a, b) assert result.data[0, 0] == 15, "Single element multiplication failed" print(" โœ… Single element multiplication") # Identity matrix identity = Tensor([[1, 0], [0, 1]]) test_matrix = Tensor([[2, 3], [4, 5]]) result = matmul(test_matrix, identity) assert np.allclose(result.data, test_matrix.data), "Identity multiplication failed" print(" โœ… Identity matrix multiplication") # Test 2: Dense layer composition print("\n2. Dense Layer Composition:") # Create a simple 2-layer network layer1 = Dense(4, 3) layer2 = Dense(3, 2) # Test data flow input_data = Tensor([[1, 2, 3, 4]]) hidden = layer1(input_data) output = layer2(hidden) assert output.shape == (1, 2), f"Expected final output shape (1, 2), got {output.shape}" print(" โœ… Multi-layer composition") # Test 3: Batch processing print("\n3. Batch Processing:") batch_size = 10 batch_input = Tensor(np.random.randn(batch_size, 4)) batch_hidden = layer1(batch_input) batch_output = layer2(batch_hidden) assert batch_output.shape == (batch_size, 2), f"Expected batch output shape ({batch_size}, 2), got {batch_output.shape}" print(" โœ… Batch processing") # Test 4: Parameter access and modification print("\n4. Parameter Management:") layer = Dense(5, 3) original_weights = layer.weights.data.copy() # Simulate parameter update layer.weights = Tensor(original_weights + 0.1) assert not np.allclose(layer.weights.data, original_weights), "Parameter update failed" print(" โœ… Parameter modification") print("\n๐ŸŽ‰ All comprehensive tests passed!") print(" Your layer implementations are ready for neural network construction!") run_comprehensive_tests() # %% [markdown] """ ## Autograd Integration Demo Let's demonstrate how layers compose together to build neural networks. """ # %% nbgrader={"grade": false, "grade_id": "layer-composition-demo", "locked": false, "schema_version": 3, "solution": false, "task": false} def demonstrate_layer_composition(): """Demonstrate how layers compose into neural networks.""" print("๐Ÿ”ฅ Layer Composition Demo") print("=" * 50) print("\n1. Creating individual layers:") layer1 = Dense(input_size=4, output_size=3) layer2 = Dense(input_size=3, output_size=2) print(f" Layer 1: {layer1.input_size} โ†’ {layer1.output_size}") print(f" Layer 2: {layer2.input_size} โ†’ {layer2.output_size}") print("\n2. Manual layer composition:") input_data = Tensor([[1.0, 2.0, 3.0, 4.0]]) print(f" Input shape: {input_data.shape}") # Forward pass through each layer hidden = layer1(input_data) print(f" After layer 1: {hidden.shape}") output = layer2(hidden) print(f" Final output: {output.shape}") print(f" Output values: {output.data.tolist()}") print("\n3. Creating a composed network class:") class TwoLayerNetwork(Module): def __init__(self, input_size, hidden_size, output_size): super().__init__() self.layer1 = Dense(input_size, hidden_size) self.layer2 = Dense(hidden_size, output_size) def forward(self, x): x = self.layer1(x) return self.layer2(x) network = TwoLayerNetwork(4, 3, 2) network_output = network(input_data) print(f" Network output shape: {network_output.shape}") print(f" Network output values: {network_output.data.tolist()}") print("\n4. Parameter management:") all_params = network.parameters() print(f" Total parameters: {len(all_params)}") total_param_count = 0 for i, param in enumerate(all_params): total_param_count += param.data.size print(f" Parameter {i}: shape {param.shape}, size {param.data.size}") print(f" Total parameter count: {total_param_count:,}") print("\n5. Batch processing:") batch_input = Tensor([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]) batch_output = network(batch_input) print(f" Batch input shape: {batch_input.shape}") print(f" Batch output shape: {batch_output.shape}") print(" โœ… Batch processing works automatically!") print("\n๐ŸŽ‰ Layer composition enables building complex neural networks!") print(" โ€ข Individual layers are building blocks") print(" โ€ข Module class enables clean composition") print(" โ€ข Parameter management happens automatically") print(" โ€ข Batch processing scales to multiple samples") demonstrate_layer_composition() # %% [markdown] """ ## ๐ŸŽฏ MODULE SUMMARY: Layers - Complete Neural Network Foundation ## ๐ŸŽฏ What You've Accomplished You've successfully implemented the complete foundation for neural networks - all the essential components working together: ### โœ… **Complete Core System** - **Module Base Class**: Parameter management and composition patterns for all neural network components - **Matrix Multiplication**: The computational primitive underlying all neural network operations - **Linear (Dense) Layers**: Complete implementation with proper parameter initialization and forward propagation - **Sequential Networks**: Clean composition system for building complete neural network architectures - **Flatten Operations**: Tensor reshaping to connect different layer types (essential for CNNโ†’MLP transitions) ### โœ… **Systems Understanding** - **Architectural Patterns**: How modular design enables everything from MLPs to complex deep networks - **Memory Analysis**: How layer composition affects memory usage and computational efficiency - **Performance Characteristics**: Understanding how tensor operations and layer composition affect performance - **Production Context**: Connection to real-world ML frameworks and their component organization ### โœ… **ML Engineering Skills** - **Complete Parameter Management**: How neural networks automatically collect parameters from all components - **Network Composition**: Building complex architectures from simple, reusable components - **Tensor Operations**: Essential reshaping and transformation operations for different network types - **Clean Abstraction**: Professional software design patterns that scale to production systems ## ๐Ÿ”— **Connection to Production ML Systems** Your unified implementation mirrors the complete component systems used in: - **PyTorch's nn.Module system**: Same parameter management and composition patterns - **PyTorch's nn.Sequential**: Identical architecture composition approach - **All major frameworks**: The same modular design principles that power TensorFlow, JAX, and others - **Production ML systems**: Clean abstractions that enable complex models while maintaining manageable code ## ๐Ÿš€ **What's Next** With your complete layer foundation, you're ready to: - **Add nonlinear activations** to enable complex function approximation - **Implement loss functions** to define learning objectives - **Build training algorithms** to optimize networks on data - **Create specialized layers** like convolutions for computer vision ## ๐Ÿ’ก **Key Systems Insights** 1. **Modular composition is the key to scalable ML systems** - clean interfaces enable complex behaviors 2. **Parameter management must be automatic** - manual parameter tracking doesn't scale to deep networks 3. **Tensor operations like flattening are architectural requirements** - different layer types need different tensor shapes 4. **Clean abstractions enable innovation** - good foundational design supports unlimited architectural experimentation You now understand how to build complete, production-ready neural network foundations that can scale to any architecture! """ # %% nbgrader={"grade": false, "grade_id": "final-demo", "locked": false, "schema_version": 3, "solution": false, "task": false} if __name__ == "__main__": print("๐Ÿ”ฅ TinyTorch Layers Module - Complete Foundation Demo") print("=" * 60) # Test all core components print("\n๐Ÿงช Testing All Core Components:") test_matmul() test_dense_layer() test_dense_parameter_management() test_sequential_network() test_flatten_operations() print("\n" + "="*60) demonstrate_complete_networks() print("\n" + "="*60) demonstrate_layer_composition() print("\n๐ŸŽ‰ Complete neural network foundation ready!") print(" โœ… Module system for parameter management") print(" โœ… Linear layers for transformations") print(" โœ… Sequential networks for composition") print(" โœ… Flatten operations for tensor reshaping") print(" โœ… All components tested and integrated!")