# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.17.1
# ---
# %% [markdown]
"""
# Module 2: Layers - Neural Network Building Blocks
Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.
## Learning Goals
- Understand layers as functions that transform tensors: `y = f(x)`
- Implement Dense layers with linear transformations: `y = Wx + b`
- Use activation functions from the activations module for nonlinearity
- See how neural networks are just function composition
- Build intuition before diving into training
## Build → Use → Understand
1. **Build**: Dense layers using activation functions as building blocks
2. **Use**: Transform tensors and see immediate results
3. **Understand**: How neural networks transform information
## Module Dependencies
This module builds on the **activations** module:
- **activations** → **layers** → **networks**
- Clean separation of concerns: math functions → layer building blocks → full networks
## Module → Package Structure
**🎓 Teaching vs. 🔧 Building**:
- **Learning side**: Work in `modules/layers/layers_dev.py`
- **Building side**: Exports to `tinytorch/core/layers.py`
This module builds the fundamental transformations that compose into neural networks.
"""
# %%
#| default_exp core.layers
# Setup and imports
import numpy as np
import sys
from typing import Union, Optional, Callable
import math
# %%
#| export
import numpy as np
import math
import sys
from typing import Union, Optional, Callable
from tinytorch.core.tensor import Tensor
# Import activation functions from the activations module
from tinytorch.core.activations import ReLU, Sigmoid, Tanh
# Import our Tensor class
# sys.path.append('../../')
# from modules.tensor.tensor_dev import Tensor
# print("🔥 TinyTorch Layers Module")
# print(f"NumPy version: {np.__version__}")
# print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
# print("Ready to build neural network layers!")
# %% [markdown]
"""
## Step 1: What is a Layer?
A **layer** is a function that transforms tensors. Think of it as:
- **Input**: Tensor with some shape
- **Transformation**: Mathematical operation (linear, nonlinear, etc.)
- **Output**: Tensor with possibly different shape
**The fundamental insight**: Neural networks are just function composition!
```
x → Layer1 → Layer2 → Layer3 → y
```
**Why layers matter**:
- They're the building blocks of all neural networks
- Each layer learns a different transformation
- Composing layers creates complex functions
- Understanding layers = understanding neural networks
Let's start with the most important layer: **Dense** (also called Linear or Fully Connected).
"""
# %%
#| export
class Dense:
    """
    Dense (Linear) Layer: y = Wx + b

    The fundamental building block of neural networks.
    Performs a linear transformation: matrix multiplication + bias addition.

    Args:
        input_size: Number of input features
        output_size: Number of output features
        use_bias: Whether to include a bias term (default: True)

    TODO: Implement the Dense layer with weight initialization and forward pass.
    """

    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
        """
        Initialize Dense layer with random weights.

        TODO:
        1. Store layer parameters (input_size, output_size, use_bias)
        2. Initialize weights with small random values
        3. Initialize bias to zeros (if use_bias=True)
        """
        raise NotImplementedError("Student implementation required")

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass: y = Wx + b

        Args:
            x: Input tensor of shape (batch_size, input_size)

        Returns:
            Output tensor of shape (batch_size, output_size)

        TODO: Implement matrix multiplication and bias addition
        """
        raise NotImplementedError("Student implementation required")

    def __call__(self, x: Tensor) -> Tensor:
        """Make layer callable: layer(x) same as layer.forward(x)"""
        return self.forward(x)
# %%
#| hide
#| export
class Dense:
    """
    Dense (Linear) Layer: y = Wx + b

    The fundamental building block of neural networks.
    Performs a linear transformation: matrix multiplication + bias addition.
    """

    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
        """Initialize Dense layer with random weights."""
        self.input_size = input_size
        self.output_size = output_size
        self.use_bias = use_bias

        # Initialize weights with Xavier/Glorot initialization
        # This helps with gradient flow during training
        limit = math.sqrt(6.0 / (input_size + output_size))
        self.weights = Tensor(
            np.random.uniform(-limit, limit, (input_size, output_size)).astype(np.float32)
        )

        # Initialize bias to zeros
        if use_bias:
            self.bias = Tensor(np.zeros(output_size, dtype=np.float32))
        else:
            self.bias = None

    def forward(self, x: Tensor) -> Tensor:
        """Forward pass: y = Wx + b"""
        # Matrix multiplication: x @ weights
        # x shape:       (batch_size, input_size)
        # weights shape: (input_size, output_size)
        # result shape:  (batch_size, output_size)
        output = Tensor(x.data @ self.weights.data)

        # Add bias if present
        if self.bias is not None:
            output = Tensor(output.data + self.bias.data)
        return output

    def __call__(self, x: Tensor) -> Tensor:
        """Make layer callable: layer(x) same as layer.forward(x)"""
        return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your Dense Layer
Once you implement the Dense layer above, run this cell to test it:
"""
# %%
# Test the Dense layer
try:
    print("=== Testing Dense Layer ===")

    # Create a simple Dense layer: 3 inputs → 2 outputs
    layer = Dense(input_size=3, output_size=2)
    print(f"Created Dense layer: {layer.input_size} → {layer.output_size}")
    print(f"Weights shape: {layer.weights.shape}")
    print(f"Bias shape: {layer.bias.shape if layer.bias is not None else 'No bias'}")

    # Test with a single example
    x = Tensor([[1.0, 2.0, 3.0]])  # Shape: (1, 3)
    y = layer(x)
    print(f"Input shape: {x.shape}")
    print(f"Output shape: {y.shape}")
    print(f"Input: {x.data}")
    print(f"Output: {y.data}")

    # Test with a batch
    x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # Shape: (2, 3)
    y_batch = layer(x_batch)
    print(f"\nBatch input shape: {x_batch.shape}")
    print(f"Batch output shape: {y_batch.shape}")
    print("✅ Dense layer working!")
except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure to implement the Dense layer above!")
# %% [markdown]
"""
## Step 2: Activation Functions - Adding Nonlinearity
Now we'll use the activation functions from the **activations** module!
**Clean Architecture**: We import the activation functions rather than redefining them:
```python
from tinytorch.core.activations import ReLU, Sigmoid, Tanh
```
**Why this matters**:
- **Separation of concerns**: Math functions vs. layer building blocks
- **Reusability**: Activations can be used anywhere in the system
- **Maintainability**: One place to update activation implementations
- **Composability**: Clean imports make neural networks easier to build
**Why nonlinearity matters**: Without it, stacking layers is pointless!
```
Linear → Linear → Linear = Just one big Linear transformation
Linear → NonLinear → Linear = Can learn complex patterns
```
"""
# %% [markdown]
"""
### 🧪 Test Activation Functions from Activations Module
Let's test that we can use the activation functions from the activations module:
"""
# %%
# Test activation functions from activations module
try:
    print("=== Testing Activation Functions from Activations Module ===")

    # Test data: mix of positive, negative, and zero
    x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])
    print(f"Input: {x.data}")

    # Test ReLU from activations module
    relu = ReLU()
    y_relu = relu(x)
    print(f"ReLU output: {y_relu.data}")

    # Test Sigmoid from activations module
    sigmoid = Sigmoid()
    y_sigmoid = sigmoid(x)
    print(f"Sigmoid output: {y_sigmoid.data}")

    # Test Tanh from activations module
    tanh = Tanh()
    y_tanh = tanh(x)
    print(f"Tanh output: {y_tanh.data}")

    print("✅ Activation functions from activations module working!")
    print("🎉 Clean architecture: layers module uses activations module!")
except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure the activations module is properly exported!")
# %% [markdown]
"""
## Step 3: Layer Composition - Building Neural Networks
Now comes the magic! We can **compose** layers to build neural networks:
```
Input → Dense → ReLU → Dense → Sigmoid → Output
```
This is a 2-layer neural network that can learn complex nonlinear patterns!
**Notice the clean architecture**:
- Dense layers handle linear transformations
- Activation functions (from activations module) handle nonlinearity
- Composition creates complex behaviors from simple building blocks
"""
# %%
# Build a simple 2-layer neural network
try:
    print("=== Building a 2-Layer Neural Network ===")

    # Network architecture: 3 → 4 → 2
    # Input:  3 features
    # Hidden: 4 neurons with ReLU
    # Output: 2 neurons with Sigmoid
    layer1 = Dense(input_size=3, output_size=4)
    activation1 = ReLU()      # From activations module
    layer2 = Dense(input_size=4, output_size=2)
    activation2 = Sigmoid()   # From activations module

    print("Network architecture:")
    print("  Input: 3 features")
    print(f"  Hidden: {layer1.input_size} → {layer1.output_size} (Dense + ReLU)")
    print(f"  Output: {layer2.input_size} → {layer2.output_size} (Dense + Sigmoid)")

    # Test with sample data
    x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # 2 examples, 3 features each
    print(f"\nInput shape: {x.shape}")
    print(f"Input data: {x.data}")

    # Forward pass through the network
    h1 = layer1(x)                  # Dense layer 1
    h1_activated = activation1(h1)  # ReLU activation
    h2 = layer2(h1_activated)       # Dense layer 2
    output = activation2(h2)        # Sigmoid activation

    print(f"\nAfter layer 1: {h1.shape}")
    print(f"After ReLU: {h1_activated.shape}")
    print(f"After layer 2: {h2.shape}")
    print(f"Final output: {output.shape}")
    print(f"Output values: {output.data}")

    print("\n🎉 Neural network working! You just built your first neural network!")
    print("🏗️ Clean architecture: Dense layers + Activations module = Neural Network")
    print("Notice how the network transforms 3 input features into 2 outputs through learned transformations.")
except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure to implement the layers and check the activations module!")
# %% [markdown]
"""
## Step 4: Understanding What We Built
Congratulations! You just implemented a clean, modular neural network architecture:
### 🧱 **What You Built**
1. **Dense Layer**: Linear transformation `y = Wx + b`
2. **Activation Functions**: Imported from activations module (ReLU, Sigmoid, Tanh)
3. **Layer Composition**: Chaining layers to build networks
### 🏗️ **Clean Architecture Benefits**
- **Separation of concerns**: Math functions vs. layer building blocks
- **Reusability**: Activations can be used across different modules
- **Maintainability**: One place to update activation implementations
- **Composability**: Clean imports make complex networks easier to build
### 🎯 **Key Insights**
- **Layers are functions**: They transform tensors from one space to another
- **Composition creates complexity**: Simple layers → complex networks
- **Nonlinearity is crucial**: Without it, deep networks are just linear transformations
- **Neural networks are function approximators**: They learn to map inputs to outputs
- **Modular design**: Building blocks can be combined in many ways
### 🚀 **What's Next**
In the next modules, you'll learn:
- **Training**: How networks learn from data (backpropagation, optimizers)
- **Architectures**: Specialized layers for different problems (CNNs, RNNs)
- **Applications**: Using networks for real problems
### 🔧 **Export to Package**
Run this to export your layers to the TinyTorch package:
```bash
python bin/tito.py sync
```
Then test your implementation:
```bash
python bin/tito.py test --module layers
```
**Great job! You've built a clean, modular foundation for neural networks!** 🎉
"""
# %%
# Final demonstration: A more complex example
try:
    print("=== Final Demo: Image Classification Network ===")

    # Simulate a small image: 28x28 pixels flattened to 784 features
    # This is like a tiny MNIST digit
    image_size = 28 * 28   # 784 pixels
    num_classes = 10       # 10 digits (0-9)

    # Build a 3-layer network for digit classification
    # 784 → 128 → 64 → 10
    layer1 = Dense(input_size=image_size, output_size=128)
    relu1 = ReLU()   # From activations module
    layer2 = Dense(input_size=128, output_size=64)
    relu2 = ReLU()   # From activations module
    layer3 = Dense(input_size=64, output_size=num_classes)
    output_activation = Sigmoid()  # Sigmoid as a simple "probability-like" output (a real classifier would use Softmax)

    print("Image classification network:")
    print(f"  Input: {image_size} pixels (28x28 image)")
    print(f"  Hidden 1: {layer1.input_size} → {layer1.output_size} (Dense + ReLU)")
    print(f"  Hidden 2: {layer2.input_size} → {layer2.output_size} (Dense + ReLU)")
    print(f"  Output: {layer3.input_size} → {layer3.output_size} (Dense + Sigmoid)")

    # Simulate a batch of 5 images
    batch_size = 5
    fake_images = Tensor(np.random.randn(batch_size, image_size).astype(np.float32))

    # Forward pass
    h1 = relu1(layer1(fake_images))
    h2 = relu2(layer2(h1))
    predictions = output_activation(layer3(h2))

    print("\nBatch processing:")
    print(f"  Input batch shape: {fake_images.shape}")
    print(f"  Predictions shape: {predictions.shape}")
    print(f"  Sample predictions: {predictions.data[0]}")  # Predictions for the first image

    print("\n🎉 You built a neural network that could classify images!")
    print("🏗️ Clean architecture: Dense layers + Activations module = Image Classifier")
    print("With training, this network could learn to recognize handwritten digits!")
except Exception as e:
    print(f"❌ Error: {e}")
    print("Check your layer implementations and activations module!")
# %% [markdown]
"""
## 🎓 Module Summary
### What You Learned
1. **Layer Architecture**: Dense layers as linear transformations
2. **Clean Dependencies**: Layers module uses activations module
3. **Function Composition**: Simple building blocks → complex networks
4. **Modular Design**: Separation of concerns for maintainable code
### Key Architectural Insight
```
activations (math functions) → layers (building blocks) → networks (applications)
```
This clean dependency graph makes the system:
- **Understandable**: Each module has a clear purpose
- **Testable**: Each module can be tested independently
- **Reusable**: Components can be used across different contexts
- **Maintainable**: Changes are localized to appropriate modules
### Next Steps
- **Training**: Learn how networks learn from data
- **Advanced Architectures**: CNNs, RNNs, Transformers
- **Applications**: Real-world machine learning problems
**Congratulations on building a clean, modular neural network foundation!** 🚀
"""