# --- # jupyter: # jupytext: # text_representation: # extension: .py # format_name: percent # format_version: '1.3' # jupytext_version: 1.17.1 # --- # %% [markdown] """ # Module 2: Layers - Neural Network Building Blocks Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors. ## Learning Goals - Understand layers as functions that transform tensors: `y = f(x)` - Implement Dense layers with linear transformations: `y = Wx + b` - Use activation functions from the activations module for nonlinearity - See how neural networks are just function composition - Build intuition before diving into training ## Build → Use → Understand 1. **Build**: Dense layers using activation functions as building blocks 2. **Use**: Transform tensors and see immediate results 3. **Understand**: How neural networks transform information ## Module Dependencies This module builds on the **activations** module: - **activations** → **layers** → **networks** - Clean separation of concerns: math functions → layer building blocks → full networks ## Module → Package Structure **🎓 Teaching vs. 🔧 Building**: - **Learning side**: Work in `modules/layers/layers_dev.py` - **Building side**: Exports to `tinytorch/core/layers.py` This module builds the fundamental transformations that compose into neural networks. """ # %% #| default_exp core.layers # Setup and imports import numpy as np import sys from typing import Union, Optional, Callable import math # %% #| export import numpy as np import math import sys from typing import Union, Optional, Callable from tinytorch.core.tensor import Tensor # Import activation functions from the activations module from tinytorch.core.activations import ReLU, Sigmoid, Tanh # Import our Tensor class # sys.path.append('../../') # from modules.tensor.tensor_dev import Tensor # print("🔥 TinyTorch Layers Module") # print(f"NumPy version: {np.__version__}") # print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}") # print("Ready to build neural network layers!") # %% [markdown] """ ## Step 1: What is a Layer? A **layer** is a function that transforms tensors. Think of it as: - **Input**: Tensor with some shape - **Transformation**: Mathematical operation (linear, nonlinear, etc.) - **Output**: Tensor with possibly different shape **The fundamental insight**: Neural networks are just function composition! ``` x → Layer1 → Layer2 → Layer3 → y ``` **Why layers matter**: - They're the building blocks of all neural networks - Each layer learns a different transformation - Composing layers creates complex functions - Understanding layers = understanding neural networks Let's start with the most important layer: **Dense** (also called Linear or Fully Connected). """ # %% #| export class Dense: """ Dense (Linear) Layer: y = Wx + b The fundamental building block of neural networks. Performs linear transformation: matrix multiplication + bias addition. Args: input_size: Number of input features output_size: Number of output features use_bias: Whether to include bias term (default: True) TODO: Implement the Dense layer with weight initialization and forward pass. """ def __init__(self, input_size: int, output_size: int, use_bias: bool = True): """ Initialize Dense layer with random weights. TODO: 1. Store layer parameters (input_size, output_size, use_bias) 2. Initialize weights with small random values 3. Initialize bias to zeros (if use_bias=True) """ raise NotImplementedError("Student implementation required") def forward(self, x: Tensor) -> Tensor: """ Forward pass: y = Wx + b Args: x: Input tensor of shape (batch_size, input_size) Returns: Output tensor of shape (batch_size, output_size) TODO: Implement matrix multiplication and bias addition """ raise NotImplementedError("Student implementation required") def __call__(self, x: Tensor) -> Tensor: """Make layer callable: layer(x) same as layer.forward(x)""" return self.forward(x) # %% #| hide #| export class Dense: """ Dense (Linear) Layer: y = Wx + b The fundamental building block of neural networks. Performs linear transformation: matrix multiplication + bias addition. """ def __init__(self, input_size: int, output_size: int, use_bias: bool = True): """Initialize Dense layer with random weights.""" self.input_size = input_size self.output_size = output_size self.use_bias = use_bias # Initialize weights with Xavier/Glorot initialization # This helps with gradient flow during training limit = math.sqrt(6.0 / (input_size + output_size)) self.weights = Tensor( np.random.uniform(-limit, limit, (input_size, output_size)).astype(np.float32) ) # Initialize bias to zeros if use_bias: self.bias = Tensor(np.zeros(output_size, dtype=np.float32)) else: self.bias = None def forward(self, x: Tensor) -> Tensor: """Forward pass: y = Wx + b""" # Matrix multiplication: x @ weights # x shape: (batch_size, input_size) # weights shape: (input_size, output_size) # result shape: (batch_size, output_size) output = Tensor(x.data @ self.weights.data) # Add bias if present if self.bias is not None: output = Tensor(output.data + self.bias.data) return output def __call__(self, x: Tensor) -> Tensor: """Make layer callable: layer(x) same as layer.forward(x)""" return self.forward(x) # %% [markdown] """ ### 🧪 Test Your Dense Layer Once you implement the Dense layer above, run this cell to test it: """ # %% # Test the Dense layer try: print("=== Testing Dense Layer ===") # Create a simple Dense layer: 3 inputs → 2 outputs layer = Dense(input_size=3, output_size=2) print(f"Created Dense layer: {layer.input_size} → {layer.output_size}") print(f"Weights shape: {layer.weights.shape}") print(f"Bias shape: {layer.bias.shape if layer.bias else 'No bias'}") # Test with a single example x = Tensor([[1.0, 2.0, 3.0]]) # Shape: (1, 3) y = layer(x) print(f"Input shape: {x.shape}") print(f"Output shape: {y.shape}") print(f"Input: {x.data}") print(f"Output: {y.data}") # Test with batch x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) # Shape: (2, 3) y_batch = layer(x_batch) print(f"\nBatch input shape: {x_batch.shape}") print(f"Batch output shape: {y_batch.shape}") print("✅ Dense layer working!") except Exception as e: print(f"❌ Error: {e}") print("Make sure to implement the Dense layer above!") # %% [markdown] """ ## Step 2: Activation Functions - Adding Nonlinearity Now we'll use the activation functions from the **activations** module! **Clean Architecture**: We import the activation functions rather than redefining them: ```python from tinytorch.core.activations import ReLU, Sigmoid, Tanh ``` **Why this matters**: - **Separation of concerns**: Math functions vs. layer building blocks - **Reusability**: Activations can be used anywhere in the system - **Maintainability**: One place to update activation implementations - **Composability**: Clean imports make neural networks easier to build **Why nonlinearity matters**: Without it, stacking layers is pointless! ``` Linear → Linear → Linear = Just one big Linear transformation Linear → NonLinear → Linear = Can learn complex patterns ``` """ # %% [markdown] """ ### 🧪 Test Activation Functions from Activations Module Let's test that we can use the activation functions from the activations module: """ # %% # Test activation functions from activations module try: print("=== Testing Activation Functions from Activations Module ===") # Test data: mix of positive, negative, and zero x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]]) print(f"Input: {x.data}") # Test ReLU from activations module relu = ReLU() y_relu = relu(x) print(f"ReLU output: {y_relu.data}") # Test Sigmoid from activations module sigmoid = Sigmoid() y_sigmoid = sigmoid(x) print(f"Sigmoid output: {y_sigmoid.data}") # Test Tanh from activations module tanh = Tanh() y_tanh = tanh(x) print(f"Tanh output: {y_tanh.data}") print("✅ Activation functions from activations module working!") print("🎉 Clean architecture: layers module uses activations module!") except Exception as e: print(f"❌ Error: {e}") print("Make sure the activations module is properly exported!") # %% [markdown] """ ## Step 3: Layer Composition - Building Neural Networks Now comes the magic! We can **compose** layers to build neural networks: ``` Input → Dense → ReLU → Dense → Sigmoid → Output ``` This is a 2-layer neural network that can learn complex nonlinear patterns! **Notice the clean architecture**: - Dense layers handle linear transformations - Activation functions (from activations module) handle nonlinearity - Composition creates complex behaviors from simple building blocks """ # %% # Build a simple 2-layer neural network try: print("=== Building a 2-Layer Neural Network ===") # Network architecture: 3 → 4 → 2 # Input: 3 features # Hidden: 4 neurons with ReLU # Output: 2 neurons with Sigmoid layer1 = Dense(input_size=3, output_size=4) activation1 = ReLU() # From activations module layer2 = Dense(input_size=4, output_size=2) activation2 = Sigmoid() # From activations module print("Network architecture:") print(f" Input: 3 features") print(f" Hidden: {layer1.input_size} → {layer1.output_size} (Dense + ReLU)") print(f" Output: {layer2.input_size} → {layer2.output_size} (Dense + Sigmoid)") # Test with sample data x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) # 2 examples, 3 features each print(f"\nInput shape: {x.shape}") print(f"Input data: {x.data}") # Forward pass through the network h1 = layer1(x) # Dense layer 1 h1_activated = activation1(h1) # ReLU activation h2 = layer2(h1_activated) # Dense layer 2 output = activation2(h2) # Sigmoid activation print(f"\nAfter layer 1: {h1.shape}") print(f"After ReLU: {h1_activated.shape}") print(f"After layer 2: {h2.shape}") print(f"Final output: {output.shape}") print(f"Output values: {output.data}") print("\n🎉 Neural network working! You just built your first neural network!") print("🏗️ Clean architecture: Dense layers + Activations module = Neural Network") print("Notice how the network transforms 3D input into 2D output through learned transformations.") except Exception as e: print(f"❌ Error: {e}") print("Make sure to implement the layers and check activations module!") # %% [markdown] """ ## Step 4: Understanding What We Built Congratulations! You just implemented a clean, modular neural network architecture: ### 🧱 **What You Built** 1. **Dense Layer**: Linear transformation `y = Wx + b` 2. **Activation Functions**: Imported from activations module (ReLU, Sigmoid, Tanh) 3. **Layer Composition**: Chaining layers to build networks ### 🏗️ **Clean Architecture Benefits** - **Separation of concerns**: Math functions vs. layer building blocks - **Reusability**: Activations can be used across different modules - **Maintainability**: One place to update activation implementations - **Composability**: Clean imports make complex networks easier to build ### 🎯 **Key Insights** - **Layers are functions**: They transform tensors from one space to another - **Composition creates complexity**: Simple layers → complex networks - **Nonlinearity is crucial**: Without it, deep networks are just linear transformations - **Neural networks are function approximators**: They learn to map inputs to outputs - **Modular design**: Building blocks can be combined in many ways ### 🚀 **What's Next** In the next modules, you'll learn: - **Training**: How networks learn from data (backpropagation, optimizers) - **Architectures**: Specialized layers for different problems (CNNs, RNNs) - **Applications**: Using networks for real problems ### 🔧 **Export to Package** Run this to export your layers to the TinyTorch package: ```bash python bin/tito.py sync ``` Then test your implementation: ```bash python bin/tito.py test --module layers ``` **Great job! You've built a clean, modular foundation for neural networks!** 🎉 """ # %% # Final demonstration: A more complex example try: print("=== Final Demo: Image Classification Network ===") # Simulate a small image: 28x28 pixels flattened to 784 features # This is like a tiny MNIST digit image_size = 28 * 28 # 784 pixels num_classes = 10 # 10 digits (0-9) # Build a 3-layer network for digit classification # 784 → 128 → 64 → 10 layer1 = Dense(input_size=image_size, output_size=128) relu1 = ReLU() # From activations module layer2 = Dense(input_size=128, output_size=64) relu2 = ReLU() # From activations module layer3 = Dense(input_size=64, output_size=num_classes) softmax = Sigmoid() # Using Sigmoid as a simple "probability-like" output print(f"Image classification network:") print(f" Input: {image_size} pixels (28x28 image)") print(f" Hidden 1: {layer1.input_size} → {layer1.output_size} (Dense + ReLU)") print(f" Hidden 2: {layer2.input_size} → {layer2.output_size} (Dense + ReLU)") print(f" Output: {layer3.input_size} → {layer3.output_size} (Dense + Sigmoid)") # Simulate a batch of 5 images batch_size = 5 fake_images = Tensor(np.random.randn(batch_size, image_size).astype(np.float32)) # Forward pass h1 = relu1(layer1(fake_images)) h2 = relu2(layer2(h1)) predictions = softmax(layer3(h2)) print(f"\nBatch processing:") print(f" Input batch shape: {fake_images.shape}") print(f" Predictions shape: {predictions.shape}") print(f" Sample predictions: {predictions.data[0]}") # First image predictions print("\n🎉 You built a neural network that could classify images!") print("🏗️ Clean architecture: Dense layers + Activations module = Image Classifier") print("With training, this network could learn to recognize handwritten digits!") except Exception as e: print(f"❌ Error: {e}") print("Check your layer implementations and activations module!") # %% [markdown] """ ## 🎓 Module Summary ### What You Learned 1. **Layer Architecture**: Dense layers as linear transformations 2. **Clean Dependencies**: Layers module uses activations module 3. **Function Composition**: Simple building blocks → complex networks 4. **Modular Design**: Separation of concerns for maintainable code ### Key Architectural Insight ``` activations (math functions) → layers (building blocks) → networks (applications) ``` This clean dependency graph makes the system: - **Understandable**: Each module has a clear purpose - **Testable**: Each module can be tested independently - **Reusable**: Components can be used across different contexts - **Maintainable**: Changes are localized to appropriate modules ### Next Steps - **Training**: Learn how networks learn from data - **Advanced Architectures**: CNNs, RNNs, Transformers - **Applications**: Real-world machine learning problems **Congratulations on building a clean, modular neural network foundation!** 🚀 """