# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#     jupytext_version: 1.17.1
# ---

# %% [markdown]
"""
# Module 2: Layers - Neural Network Building Blocks

Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.

## Learning Goals
- Understand layers as functions that transform tensors: `y = f(x)`
- Implement Dense layers with linear transformations: `y = Wx + b`
- Use activation functions from the activations module for nonlinearity
- See how neural networks are just function composition
- Build intuition before diving into training

## Build → Use → Understand
1. **Build**: Dense layers using activation functions as building blocks
2. **Use**: Transform tensors and see immediate results
3. **Understand**: How neural networks transform information

## Module Dependencies
This module builds on the **activations** module:
- **activations** → **layers** → **networks**
- Clean separation of concerns: math functions → layer building blocks → full networks

## Module → Package Structure
**🎓 Teaching vs. 🔧 Building**:
- **Learning side**: Work in `modules/layers/layers_dev.py`
- **Building side**: Exports to `tinytorch/core/layers.py`

This module builds the fundamental transformations that compose into neural networks.
"""

# %%
#| default_exp core.layers

# Setup and imports
import numpy as np
import sys
from typing import Union, Optional, Callable
import math

# %%
#| export
import numpy as np
import math
import sys
from typing import Union, Optional, Callable
from tinytorch.core.tensor import Tensor

# Import activation functions from the activations module
from tinytorch.core.activations import ReLU, Sigmoid, Tanh

# Import our Tensor class
# sys.path.append('../../')
# from modules.tensor.tensor_dev import Tensor

# print("🔥 TinyTorch Layers Module")
# print(f"NumPy version: {np.__version__}")
# print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
# print("Ready to build neural network layers!")

# %% [markdown]
"""
## Step 1: What is a Layer?

A **layer** is a function that transforms tensors. Think of it as:
- **Input**: Tensor with some shape
- **Transformation**: Mathematical operation (linear, nonlinear, etc.)
- **Output**: Tensor with possibly different shape

**The fundamental insight**: Neural networks are just function composition!
```
x → Layer1 → Layer2 → Layer3 → y
```

**Why layers matter**:
- They're the building blocks of all neural networks
- Each layer learns a different transformation
- Composing layers creates complex functions
- Understanding layers = understanding neural networks

Let's start with the most important layer: **Dense** (also called Linear or Fully Connected).
"""

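# %% [markdown]
"""
Before implementing `Dense`, here is a minimal sketch of the "networks are just function
composition" idea using plain Python functions. The names `f1`, `f2`, `f3` and the numbers are
made up purely for illustration; they are not part of TinyTorch.
"""

# %%
# Sketch: a "network" is nothing more than functions applied one after another
def f1(x):
    return 2.0 * x + 1.0   # a toy linear transformation

def f2(x):
    return max(0.0, x)     # a toy nonlinearity (ReLU on a scalar)

def f3(x):
    return x - 3.0         # another toy linear transformation

x_demo = 1.5
y_demo = f3(f2(f1(x_demo)))            # x → f1 → f2 → f3 → y
print(f"x = {x_demo} → y = {y_demo}")  # 1.5 → 4.0 → 4.0 → 1.0
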
# %%
#| export
class Dense:
    """
    Dense (Linear) Layer: y = Wx + b

    The fundamental building block of neural networks.
    Performs linear transformation: matrix multiplication + bias addition.

    Args:
        input_size: Number of input features
        output_size: Number of output features
        use_bias: Whether to include bias term (default: True)

    TODO: Implement the Dense layer with weight initialization and forward pass.
    """

    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
        """
        Initialize Dense layer with random weights.

        TODO:
        1. Store layer parameters (input_size, output_size, use_bias)
        2. Initialize weights with small random values
        3. Initialize bias to zeros (if use_bias=True)
        """
        raise NotImplementedError("Student implementation required")

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass: y = Wx + b

        Args:
            x: Input tensor of shape (batch_size, input_size)

        Returns:
            Output tensor of shape (batch_size, output_size)

        TODO: Implement matrix multiplication and bias addition
        """
        raise NotImplementedError("Student implementation required")

    def __call__(self, x: Tensor) -> Tensor:
        """Make layer callable: layer(x) same as layer.forward(x)"""
        return self.forward(x)

# %%
#| hide
#| export
class Dense:
    """
    Dense (Linear) Layer: y = Wx + b

    The fundamental building block of neural networks.
    Performs linear transformation: matrix multiplication + bias addition.
    """

    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
        """Initialize Dense layer with random weights."""
        self.input_size = input_size
        self.output_size = output_size
        self.use_bias = use_bias

        # Initialize weights with Xavier/Glorot initialization
        # This helps with gradient flow during training
        limit = math.sqrt(6.0 / (input_size + output_size))
        self.weights = Tensor(
            np.random.uniform(-limit, limit, (input_size, output_size)).astype(np.float32)
        )

        # Initialize bias to zeros
        if use_bias:
            self.bias = Tensor(np.zeros(output_size, dtype=np.float32))
        else:
            self.bias = None

    def forward(self, x: Tensor) -> Tensor:
        """Forward pass: y = Wx + b"""
        # Matrix multiplication: x @ weights
        # x shape: (batch_size, input_size)
        # weights shape: (input_size, output_size)
        # result shape: (batch_size, output_size)
        output = Tensor(x.data @ self.weights.data)

        # Add bias if present
        if self.bias is not None:
            output = Tensor(output.data + self.bias.data)

        return output

    def __call__(self, x: Tensor) -> Tensor:
        """Make layer callable: layer(x) same as layer.forward(x)"""
        return self.forward(x)

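# %% [markdown]
"""
**A note on the initialization used above** (background, not part of the exercise):
Xavier/Glorot uniform initialization samples each weight from `U(-limit, limit)` with
`limit = sqrt(6 / (input_size + output_size))`. Scaling the weights by the layer's fan-in and
fan-out keeps the scale of the outputs roughly the same as the scale of the inputs, which is
what "helps with gradient flow" means in practice. The quick check below is a rough sketch of
that effect; the variable names (`probe`, `probe_layer`) are illustrative only and the exact
numbers vary from run to run.
"""

# %%
# Sketch: Xavier initialization keeps the output scale close to the input scale
try:
    probe = Tensor(np.random.randn(1000, 64).astype(np.float32))  # inputs with std ≈ 1
    probe_layer = Dense(input_size=64, output_size=64)
    probe_out = probe_layer(probe)
    print(f"Input std:  {probe.data.std():.3f}")
    print(f"Output std: {probe_out.data.std():.3f}  (stays near 1 instead of blowing up or vanishing)")
except Exception as e:
    print(f"❌ Error: {e}")
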
# %% [markdown]
"""
### 🧪 Test Your Dense Layer

Once you implement the Dense layer above, run this cell to test it:
"""

# %%
# Test the Dense layer
try:
    print("=== Testing Dense Layer ===")

    # Create a simple Dense layer: 3 inputs → 2 outputs
    layer = Dense(input_size=3, output_size=2)
    print(f"Created Dense layer: {layer.input_size} → {layer.output_size}")
    print(f"Weights shape: {layer.weights.shape}")
    print(f"Bias shape: {layer.bias.shape if layer.bias is not None else 'No bias'}")

    # Test with a single example
    x = Tensor([[1.0, 2.0, 3.0]])  # Shape: (1, 3)
    y = layer(x)
    print(f"Input shape: {x.shape}")
    print(f"Output shape: {y.shape}")
    print(f"Input: {x.data}")
    print(f"Output: {y.data}")

    # Test with a batch
    x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # Shape: (2, 3)
    y_batch = layer(x_batch)
    print(f"\nBatch input shape: {x_batch.shape}")
    print(f"Batch output shape: {y_batch.shape}")

    print("✅ Dense layer working!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure to implement the Dense layer above!")

# %% [markdown]
"""
## Step 2: Activation Functions - Adding Nonlinearity

Now we'll use the activation functions from the **activations** module!

**Clean Architecture**: We import the activation functions rather than redefining them:
```python
from tinytorch.core.activations import ReLU, Sigmoid, Tanh
```

**Why this matters**:
- **Separation of concerns**: Math functions vs. layer building blocks
- **Reusability**: Activations can be used anywhere in the system
- **Maintainability**: One place to update activation implementations
- **Composability**: Clean imports make neural networks easier to build

**Why nonlinearity matters**: Without it, stacking layers is pointless!
```
Linear → Linear → Linear = Just one big Linear transformation
Linear → NonLinear → Linear = Can learn complex patterns
```
"""

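# %% [markdown]
"""
The claim that stacked linear layers collapse into a single linear transformation is easy to
check numerically. The cell below is a small sketch in plain NumPy (independent of the Dense
class; the weight matrices `W1`, `W2` and biases are random and purely illustrative): it builds
two linear maps, applies them in sequence, and shows the result equals one combined linear map.
"""

# %%
# Sketch: composing two linear maps is itself just one linear map (no nonlinearity gained)
rng = np.random.default_rng(0)
x_np = rng.standard_normal((5, 3))   # batch of 5 examples, 3 features each

W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((4, 2)), rng.standard_normal(2)

# Two "layers" applied in sequence: (xW1 + b1)W2 + b2
two_layers = (x_np @ W1 + b1) @ W2 + b2

# One equivalent layer: W = W1 @ W2, b = b1 @ W2 + b2
one_layer = x_np @ (W1 @ W2) + (b1 @ W2 + b2)

print("Two linear layers collapse to one:", np.allclose(two_layers, one_layer))
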
# %% [markdown]
"""
### 🧪 Test Activation Functions from Activations Module

Let's test that we can use the activation functions from the activations module:
"""

# %%
# Test activation functions from activations module
try:
    print("=== Testing Activation Functions from Activations Module ===")

    # Test data: mix of positive, negative, and zero
    x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])
    print(f"Input: {x.data}")

    # Test ReLU from activations module
    relu = ReLU()
    y_relu = relu(x)
    print(f"ReLU output: {y_relu.data}")

    # Test Sigmoid from activations module
    sigmoid = Sigmoid()
    y_sigmoid = sigmoid(x)
    print(f"Sigmoid output: {y_sigmoid.data}")

    # Test Tanh from activations module
    tanh = Tanh()
    y_tanh = tanh(x)
    print(f"Tanh output: {y_tanh.data}")

    print("✅ Activation functions from activations module working!")
    print("🎉 Clean architecture: layers module uses activations module!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure the activations module is properly exported!")

# %% [markdown]
"""
## Step 3: Layer Composition - Building Neural Networks

Now comes the magic! We can **compose** layers to build neural networks:

```
Input → Dense → ReLU → Dense → Sigmoid → Output
```

This is a 2-layer neural network that can learn complex nonlinear patterns!

**Notice the clean architecture**:
- Dense layers handle linear transformations
- Activation functions (from the activations module) handle nonlinearity
- Composition creates complex behaviors from simple building blocks
"""

# %%
# Build a simple 2-layer neural network
try:
    print("=== Building a 2-Layer Neural Network ===")

    # Network architecture: 3 → 4 → 2
    # Input: 3 features
    # Hidden: 4 neurons with ReLU
    # Output: 2 neurons with Sigmoid

    layer1 = Dense(input_size=3, output_size=4)
    activation1 = ReLU()      # From activations module
    layer2 = Dense(input_size=4, output_size=2)
    activation2 = Sigmoid()   # From activations module

    print("Network architecture:")
    print(f"  Input: 3 features")
    print(f"  Hidden: {layer1.input_size} → {layer1.output_size} (Dense + ReLU)")
    print(f"  Output: {layer2.input_size} → {layer2.output_size} (Dense + Sigmoid)")

    # Test with sample data
    x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # 2 examples, 3 features each
    print(f"\nInput shape: {x.shape}")
    print(f"Input data: {x.data}")

    # Forward pass through the network
    h1 = layer1(x)                  # Dense layer 1
    h1_activated = activation1(h1)  # ReLU activation
    h2 = layer2(h1_activated)       # Dense layer 2
    output = activation2(h2)        # Sigmoid activation

    print(f"\nAfter layer 1: {h1.shape}")
    print(f"After ReLU: {h1_activated.shape}")
    print(f"After layer 2: {h2.shape}")
    print(f"Final output: {output.shape}")
    print(f"Output values: {output.data}")

    print("\n🎉 Neural network working! You just built your first neural network!")
    print("🏗️ Clean architecture: Dense layers + Activations module = Neural Network")
    print("Notice how the network transforms 3 input features into 2 output values through learned transformations.")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure to implement the layers and check the activations module!")

# %% [markdown]
"""
## Step 4: Understanding What We Built

Congratulations! You just implemented a clean, modular neural network architecture:

### 🧱 **What You Built**
1. **Dense Layer**: Linear transformation `y = Wx + b`
2. **Activation Functions**: Imported from the activations module (ReLU, Sigmoid, Tanh)
3. **Layer Composition**: Chaining layers to build networks

### 🏗️ **Clean Architecture Benefits**
- **Separation of concerns**: Math functions vs. layer building blocks
- **Reusability**: Activations can be used across different modules
- **Maintainability**: One place to update activation implementations
- **Composability**: Clean imports make complex networks easier to build

### 🎯 **Key Insights**
- **Layers are functions**: They transform tensors from one space to another
- **Composition creates complexity**: Simple layers → complex networks
- **Nonlinearity is crucial**: Without it, deep networks are just linear transformations
- **Neural networks are function approximators**: They learn to map inputs to outputs
- **Modular design**: Building blocks can be combined in many ways

### 🚀 **What's Next**
In the next modules, you'll learn:
- **Training**: How networks learn from data (backpropagation, optimizers)
- **Architectures**: Specialized layers for different problems (CNNs, RNNs)
- **Applications**: Using networks for real problems

### 🔧 **Export to Package**
Run this to export your layers to the TinyTorch package:
```bash
python bin/tito.py sync
```

Then test your implementation:
```bash
python bin/tito.py test --module layers
```

**Great job! You've built a clean, modular foundation for neural networks!** 🎉
"""

# %%
# Final demonstration: A more complex example
try:
    print("=== Final Demo: Image Classification Network ===")

    # Simulate a small image: 28x28 pixels flattened to 784 features
    # This is like a tiny MNIST digit
    image_size = 28 * 28   # 784 pixels
    num_classes = 10       # 10 digits (0-9)

    # Build a 3-layer network for digit classification
    # 784 → 128 → 64 → 10
    layer1 = Dense(input_size=image_size, output_size=128)
    relu1 = ReLU()    # From activations module
    layer2 = Dense(input_size=128, output_size=64)
    relu2 = ReLU()    # From activations module
    layer3 = Dense(input_size=64, output_size=num_classes)
    output_activation = Sigmoid()  # Using Sigmoid as a simple "probability-like" output (a real classifier would use softmax)

    print(f"Image classification network:")
    print(f"  Input: {image_size} pixels (28x28 image)")
    print(f"  Hidden 1: {layer1.input_size} → {layer1.output_size} (Dense + ReLU)")
    print(f"  Hidden 2: {layer2.input_size} → {layer2.output_size} (Dense + ReLU)")
    print(f"  Output: {layer3.input_size} → {layer3.output_size} (Dense + Sigmoid)")

    # Simulate a batch of 5 images
    batch_size = 5
    fake_images = Tensor(np.random.randn(batch_size, image_size).astype(np.float32))

    # Forward pass
    h1 = relu1(layer1(fake_images))
    h2 = relu2(layer2(h1))
    predictions = output_activation(layer3(h2))

    print(f"\nBatch processing:")
    print(f"  Input batch shape: {fake_images.shape}")
    print(f"  Predictions shape: {predictions.shape}")
    print(f"  Sample predictions: {predictions.data[0]}")  # First image's predictions

    print("\n🎉 You built a neural network that could classify images!")
    print("🏗️ Clean architecture: Dense layers + Activations module = Image Classifier")
    print("With training, this network could learn to recognize handwritten digits!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Check your layer implementations and the activations module!")

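# %% [markdown]
"""
A real classifier would normalize the 10 output scores with **softmax** so they form a
probability distribution over the digit classes. Softmax is not part of this module, so the
cell below is only a minimal NumPy sketch of the idea; it assumes the demo cell above ran
successfully so that `layer3` and `h2` are available.
"""

# %%
# Sketch: softmax turns raw scores (logits) into probabilities that sum to 1
try:
    logits = layer3(h2).data                              # raw scores, shape (5, 10)
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    print(f"Softmax probabilities (first image): {probs[0]}")
    print(f"Rows sum to 1: {np.allclose(probs.sum(axis=1), 1.0)}")
except Exception as e:
    print(f"❌ Error: {e} (run the demo cell above first)")
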
# %% [markdown]
"""
## 🎓 Module Summary

### What You Learned
1. **Layer Architecture**: Dense layers as linear transformations
2. **Clean Dependencies**: Layers module uses activations module
3. **Function Composition**: Simple building blocks → complex networks
4. **Modular Design**: Separation of concerns for maintainable code

### Key Architectural Insight
```
activations (math functions) → layers (building blocks) → networks (applications)
```

This clean dependency graph makes the system:
- **Understandable**: Each module has a clear purpose
- **Testable**: Each module can be tested independently
- **Reusable**: Components can be used across different contexts
- **Maintainable**: Changes are localized to appropriate modules

### Next Steps
- **Training**: Learn how networks learn from data
- **Advanced Architectures**: CNNs, RNNs, Transformers
- **Applications**: Real-world machine learning problems

**Congratulations on building a clean, modular neural network foundation!** 🚀
"""