mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-04-28 13:43:03 -05:00
This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles. ## Module Reordering Summary **Previous Order (Problems)**: - 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training - Issues: Autograd before optimizers, DataLoader before training, scattered dependencies **New Order (Beautiful Progression)**: - 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader - Benefits: Each module creates inevitable need for the next ## Pedagogical Flow Achieved **05_losses** → "Need systematic weight updates" → **06_optimizers** **06_optimizers** → "Need automatic gradients" → **07_autograd** **07_autograd** → "Need systematic training" → **08_training** **08_training** → "MLPs hit limits on images" → **09_spatial** **09_spatial** → "Training is too slow" → **10_dataloader** ## Technical Changes ### Module Directory Renaming - `06_autograd` → `07_autograd` - `07_dataloader` → `10_dataloader` - `08_optimizers` → `06_optimizers` - `10_training` → `08_training` - `09_spatial` → `09_spatial` (no change) ### System Integration Updates - **MODULE_TO_CHECKPOINT mapping**: Updated in tito/commands/export.py - **Test directories**: Renamed module_XX directories to match new numbers - **Documentation**: Updated all references in MD files and agent configurations - **CLI integration**: Updated next-steps suggestions for proper flow ### Agent Configuration Updates - **Quality Assurance**: Updated module audit status with new numbers - **Module Developer**: Updated work tracking with new sequence - **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with beautiful progression ## Educational Benefits 1. **Inevitable Discovery**: Each module naturally leads to the next 2. **Cognitive Load**: Concepts introduced exactly when needed 3. **Motivation**: Students understand WHY each tool is necessary 4. **Synthesis**: Everything flows toward complete ML systems understanding 5. **Professional Alignment**: Matches real ML engineering workflows ## Quality Assurance - ✅ All CLI commands still function - ✅ Checkpoint system mappings updated - ✅ Documentation consistency maintained - ✅ Test directory structure aligned - ✅ Agent configurations synchronized **Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.
1335 lines
53 KiB
Python
1335 lines
53 KiB
Python
# ---
|
||
# jupyter:
|
||
# jupytext:
|
||
# text_representation:
|
||
# extension: .py
|
||
# format_name: percent
|
||
# format_version: '1.3'
|
||
# jupytext_version: 1.17.1
|
||
# ---
|
||
|
||
# %% [markdown]
|
||
"""
|
||
# Layers - Neural Network Building Blocks and Composition Patterns
|
||
|
||
Welcome to the unified Layers module! You'll build all the fundamental components for neural networks: base classes, linear transformations, network composition, and tensor reshaping operations.
|
||
|
||
## Learning Goals
|
||
- Systems understanding: How layer composition creates complex function approximators from simple building blocks
|
||
- Core implementation skill: Build Module base class, Linear layers, Sequential networks, and Flatten operations
|
||
- Pattern recognition: Understand how different layer types solve different computational problems and compose together
|
||
- Framework connection: See how your implementations mirror PyTorch's nn.Module, nn.Linear, nn.Sequential, and nn.Flatten patterns
|
||
- Performance insight: Learn why layer composition, memory layout, and tensor operations determine training speed
|
||
|
||
## Build → Use → Reflect
|
||
1. **Build**: Module system, matrix operations, Dense layers, Sequential networks, and tensor flattening
|
||
2. **Use**: Compose all components into complete neural networks and observe data flow patterns
|
||
3. **Reflect**: Why does proper abstraction enable complex architectures while maintaining clean interfaces?
|
||
|
||
## What You'll Achieve
|
||
By the end of this module, you'll understand:
|
||
- Deep technical understanding of neural network component architecture and composition patterns
|
||
- Practical capability to build complete neural network systems from fundamental building blocks
|
||
- Systems insight into why modular design is essential for scalable ML systems
|
||
- Performance consideration of how tensor operations and memory layout affect computational efficiency
|
||
- Connection to production ML systems and how major frameworks organize neural network components
|
||
|
||
## Systems Reality Check
|
||
💡 **Production Context**: PyTorch's nn.Module system enables all modern neural networks through clean composition patterns
|
||
⚡ **Performance Note**: Tensor reshape operations and layer composition can create memory bottlenecks - understanding this is key to efficient neural network design
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "layers-imports", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
#| default_exp core.layers
|
||
|
||
#| export
|
||
import numpy as np
|
||
import sys
|
||
import os
|
||
|
||
# Import our building blocks - try package first, then local modules
|
||
try:
|
||
from tinytorch.core.tensor import Tensor, Parameter
|
||
except ImportError:
|
||
# For development, import from local modules
|
||
sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_tensor'))
|
||
from tensor_dev import Tensor, Parameter
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "layers-setup", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
print("🔥 TinyTorch Layers Module")
|
||
print(f"NumPy version: {np.__version__}")
|
||
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
|
||
print("Ready to build neural network layers!")
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Module Base Class - Neural Network Foundation
|
||
|
||
Before building specific layers like Dense and Conv2d, we need a base class that handles parameter management and provides a clean interface. This is the foundation that makes neural networks composable and easy to use.
|
||
|
||
### Why We Need a Module Base Class
|
||
|
||
🏗️ **Organization**: Automatic parameter collection across all layers
|
||
🔄 **Composition**: Modules can contain other modules (networks of networks)
|
||
🎯 **Clean API**: Enable `model(input)` instead of `model.forward(input)`
|
||
📦 **PyTorch Compatibility**: Same patterns as `torch.nn.Module`
|
||
|
||
Let's build the foundation that will make all our neural network code clean and powerful:
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "module-base-class", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
#| export
|
||
class Module:
|
||
"""
|
||
Base class for all neural network modules.
|
||
|
||
Provides automatic parameter collection, forward pass management,
|
||
and clean composition patterns. All layers (Dense, Conv2d, etc.)
|
||
inherit from this class.
|
||
|
||
Key Features:
|
||
- Automatic parameter registration when you assign parameter Tensors (weights, bias)
|
||
- Recursive parameter collection from sub-modules
|
||
- Clean __call__ interface: model(x) instead of model.forward(x)
|
||
- Extensible for custom layers
|
||
|
||
Example Usage:
|
||
class MLP(Module):
|
||
def __init__(self):
|
||
super().__init__()
|
||
self.layer1 = Dense(784, 128) # Auto-registered!
|
||
self.layer2 = Dense(128, 10) # Auto-registered!
|
||
|
||
def forward(self, x):
|
||
x = self.layer1(x)
|
||
return self.layer2(x)
|
||
|
||
model = MLP()
|
||
params = model.parameters() # Gets all parameters automatically!
|
||
output = model(input) # Clean interface!
|
||
"""
|
||
|
||
def __init__(self):
|
||
"""Initialize module with empty parameter and sub-module storage."""
|
||
self._parameters = []
|
||
self._modules = []
|
||
|
||
def __setattr__(self, name, value):
|
||
"""
|
||
Intercept attribute assignment to auto-register parameters and modules.
|
||
|
||
When you do self.weight = Parameter(...), this automatically adds
|
||
the parameter to our collection for easy optimization.
|
||
"""
|
||
# Check if it's a Tensor that looks like a parameter (has .data attribute)
|
||
# Parameters are typically named 'weights', 'bias', 'weight', etc.
|
||
if (hasattr(value, 'data') and hasattr(value, 'shape') and
|
||
isinstance(value, Tensor) and
|
||
name in ['weights', 'weight', 'bias']):
|
||
self._parameters.append(value)
|
||
# Check if it's another Module (sub-module)
|
||
elif isinstance(value, Module):
|
||
self._modules.append(value)
|
||
|
||
# Always call parent to actually set the attribute
|
||
super().__setattr__(name, value)
|
||
|
||
def parameters(self):
|
||
"""
|
||
Recursively collect all parameters from this module and sub-modules.
|
||
|
||
Returns:
|
||
List of all parameters (Tensors containing weights and biases)
|
||
|
||
This enables: optimizer = Adam(model.parameters()) (when optimizers are available)
|
||
"""
|
||
# Start with our own parameters
|
||
params = list(self._parameters)
|
||
|
||
# Add parameters from sub-modules recursively
|
||
for module in self._modules:
|
||
params.extend(module.parameters())
|
||
|
||
return params
|
||
|
||
def __call__(self, *args, **kwargs):
|
||
"""
|
||
Makes modules callable: model(x) instead of model.forward(x).
|
||
|
||
This is the magic that enables clean syntax like:
|
||
output = model(input)
|
||
instead of:
|
||
output = model.forward(input)
|
||
"""
|
||
return self.forward(*args, **kwargs)
|
||
|
||
def forward(self, *args, **kwargs):
|
||
"""
|
||
Forward pass - must be implemented by subclasses.
|
||
|
||
This is where the actual computation happens. Every layer
|
||
defines its own forward() method.
|
||
"""
|
||
raise NotImplementedError("Subclasses must implement forward()")
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Where This Code Lives in the Final Package
|
||
|
||
**Learning Side:** You work in modules/04_layers/layers_dev.py
|
||
**Building Side:** Code exports to tinytorch.core.layers
|
||
|
||
```python
|
||
# Final package structure:
|
||
from tinytorch.core.layers import Module, Linear, Dense, Sequential, Flatten, matmul # Complete layer system!
|
||
from tinytorch.core.tensor import Tensor, Parameter # The foundation
|
||
from tinytorch.core.activations import ReLU, Sigmoid # Nonlinearity
|
||
```
|
||
|
||
**Why this matters:**
|
||
- **Learning:** Complete layer system in one focused module for deep understanding
|
||
- **Production:** Proper organization like PyTorch's torch.nn with all core components together
|
||
- **Consistency:** All layer types, network composition, and tensor operations in core.layers
|
||
- **Integration:** Works seamlessly with tensors and activations for complete neural networks
|
||
"""
|
||
|
||
# %% [markdown]
|
||
"""
|
||
# Matrix Multiplication - The Heart of Neural Networks
|
||
|
||
Every neural network operation ultimately reduces to matrix multiplication. Let's build the foundation that powers everything from simple perceptrons to transformers.
|
||
|
||
## Why Matrix Multiplication Matters
|
||
|
||
🧠 **Neural Network Core**: Every layer applies: output = input @ weights + bias
|
||
⚡ **Parallel Processing**: Matrix ops utilize vectorized CPU instructions and GPU parallelism
|
||
🏗️ **Scalable Architecture**: Stacking matrix operations creates arbitrarily complex function approximators
|
||
📈 **Performance Critical**: 90%+ of neural network compute time is spent in matrix multiplication
|
||
|
||
## Learning Objectives
|
||
By implementing matrix multiplication, you'll understand:
|
||
- How neural networks transform data through linear algebra
|
||
- Why matrix operations are the building blocks of all modern ML frameworks
|
||
- How proper implementation affects performance by orders of magnitude
|
||
- The connection between mathematical operations and computational efficiency
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "matmul-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false}
|
||
#| export
|
||
def matmul(a: Tensor, b: Tensor) -> Tensor:
|
||
"""
|
||
Matrix multiplication for tensors using explicit loops.
|
||
|
||
This implementation uses triple-nested loops for educational understanding
|
||
of the fundamental operations. Module 15 will show the optimization progression
|
||
from loops → blocking → vectorized operations.
|
||
|
||
Args:
|
||
a: Left tensor (shape: ..., m, k)
|
||
b: Right tensor (shape: ..., k, n)
|
||
|
||
Returns:
|
||
Result tensor (shape: ..., m, n)
|
||
|
||
TODO: Implement matrix multiplication using explicit loops.
|
||
|
||
STEP-BY-STEP IMPLEMENTATION:
|
||
1. Extract numpy arrays from both tensors using .data
|
||
2. Check tensor shapes for compatibility
|
||
3. Use triple-nested loops to show every operation
|
||
4. Wrap result in a new Tensor and return
|
||
|
||
LEARNING CONNECTIONS:
|
||
- This is the core operation in Dense layers: output = input @ weights
|
||
- Shows the fundamental computation before optimization
|
||
- Module 15 will demonstrate the progression to high-performance implementations
|
||
- Understanding loops helps appreciate vectorization and GPU parallelization
|
||
|
||
EDUCATIONAL APPROACH:
|
||
- Intentionally simple for understanding, not performance
|
||
- Makes every multiply-add operation explicit
|
||
- Sets up Module 15 to show optimization techniques
|
||
|
||
EXAMPLE:
|
||
```python
|
||
a = Tensor([[1, 2], [3, 4]]) # shape (2, 2)
|
||
b = Tensor([[5, 6], [7, 8]]) # shape (2, 2)
|
||
result = matmul(a, b)
|
||
# result.data = [[19, 22], [43, 50]]
|
||
```
|
||
|
||
IMPLEMENTATION HINTS:
|
||
- Use explicit loops to show every operation
|
||
- This is educational, not optimized for performance
|
||
- Module 15 will show the progression to fast implementations
|
||
"""
|
||
### BEGIN SOLUTION
|
||
# Extract numpy arrays from tensors
|
||
a_data = a.data
|
||
b_data = b.data
|
||
|
||
# Get dimensions and validate compatibility
|
||
if len(a_data.shape) != 2 or len(b_data.shape) != 2:
|
||
raise ValueError("matmul requires 2D tensors")
|
||
|
||
m, k = a_data.shape
|
||
k2, n = b_data.shape
|
||
|
||
if k != k2:
|
||
raise ValueError(f"Inner dimensions must match: {k} != {k2}")
|
||
|
||
# Initialize result matrix
|
||
result = np.zeros((m, n), dtype=a_data.dtype)
|
||
|
||
# Triple nested loops - educational, shows every operation
|
||
# This is intentionally simple to understand the fundamental computation
|
||
# Module 15 will show the optimization journey:
|
||
# Step 1 (here): Educational loops - slow but clear
|
||
# Step 2: Loop blocking for cache efficiency
|
||
# Step 3: Vectorized operations with NumPy
|
||
# Step 4: GPU acceleration and BLAS libraries
|
||
for i in range(m): # For each row in result
|
||
for j in range(n): # For each column in result
|
||
for k_idx in range(k): # Dot product: sum over inner dimension
|
||
result[i, j] += a_data[i, k_idx] * b_data[k_idx, j]
|
||
|
||
# Return new Tensor with result
|
||
return Tensor(result)
|
||
### END SOLUTION
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Testing Matrix Multiplication
|
||
|
||
Let's verify our matrix multiplication works correctly with some test cases.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-matmul", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false}
|
||
def test_matmul():
|
||
"""Test matrix multiplication implementation."""
|
||
print("🧪 Testing Matrix Multiplication...")
|
||
|
||
# Test case 1: Simple 2x2 matrices
|
||
a = Tensor([[1, 2], [3, 4]])
|
||
b = Tensor([[5, 6], [7, 8]])
|
||
result = matmul(a, b)
|
||
expected = np.array([[19, 22], [43, 50]])
|
||
|
||
assert np.allclose(result.data, expected), f"Expected {expected}, got {result.data}"
|
||
print("✅ 2x2 matrix multiplication")
|
||
|
||
# Test case 2: Non-square matrices
|
||
a = Tensor([[1, 2, 3], [4, 5, 6]]) # 2x3
|
||
b = Tensor([[7, 8], [9, 10], [11, 12]]) # 3x2
|
||
result = matmul(a, b)
|
||
expected = np.array([[58, 64], [139, 154]])
|
||
|
||
assert np.allclose(result.data, expected), f"Expected {expected}, got {result.data}"
|
||
print("✅ Non-square matrix multiplication")
|
||
|
||
# Test case 3: Vector-matrix multiplication
|
||
a = Tensor([[1, 2, 3]]) # 1x3 (row vector)
|
||
b = Tensor([[4], [5], [6]]) # 3x1 (column vector)
|
||
result = matmul(a, b)
|
||
expected = np.array([[32]]) # 1*4 + 2*5 + 3*6 = 32
|
||
|
||
assert np.allclose(result.data, expected), f"Expected {expected}, got {result.data}"
|
||
print("✅ Vector-matrix multiplication")
|
||
|
||
print("🎉 All matrix multiplication tests passed!")
|
||
|
||
test_matmul()
|
||
|
||
# %% [markdown]
|
||
"""
|
||
# Dense Layer - The Fundamental Neural Network Component
|
||
|
||
Dense layers (also called Linear or Fully Connected layers) are the building blocks of neural networks. They apply the transformation: **output = input @ weights + bias**
|
||
|
||
## Why Dense Layers Matter
|
||
|
||
🧠 **Universal Function Approximators**: Dense layers can approximate any continuous function when stacked
|
||
🔧 **Parameter Learning**: Weights and biases are learned through backpropagation
|
||
🏗️ **Modular Design**: Dense layers compose into complex architectures (MLPs, transformers, etc.)
|
||
⚡ **Computational Efficiency**: Matrix operations leverage optimized linear algebra libraries
|
||
|
||
## Learning Objectives
|
||
By implementing Dense layers, you'll understand:
|
||
- How neural networks learn through adjustable parameters
|
||
- The mathematical foundation underlying all neural network layers
|
||
- Why proper parameter initialization is crucial for training success
|
||
- How layer composition enables complex function approximation
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "dense-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false}
|
||
#| export
|
||
class Linear(Module):
|
||
"""
|
||
Linear (Fully Connected) Layer implementation.
|
||
|
||
Applies the transformation: output = input @ weights + bias
|
||
|
||
Inherits from Module for automatic parameter management and clean API.
|
||
This is PyTorch's nn.Linear equivalent with the same name for familiarity.
|
||
|
||
Features:
|
||
- Automatic parameter registration (weights and bias)
|
||
- Clean call interface: layer(input) instead of layer.forward(input)
|
||
- Works with optimizers via model.parameters()
|
||
"""
|
||
|
||
def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
|
||
"""
|
||
Initialize Linear layer with random weights and optional bias.
|
||
|
||
Args:
|
||
input_size: Number of input features
|
||
output_size: Number of output features
|
||
use_bias: Whether to include bias term
|
||
|
||
TODO: Implement Linear layer initialization.
|
||
|
||
STEP-BY-STEP IMPLEMENTATION:
|
||
1. Store input_size and output_size as instance variables
|
||
2. Initialize weights as Tensor with shape (input_size, output_size)
|
||
3. Use small random values: np.random.randn(...) * 0.1
|
||
4. Initialize bias as Tensor with shape (output_size,) if use_bias is True
|
||
5. Set bias to None if use_bias is False
|
||
|
||
LEARNING CONNECTIONS:
|
||
- Small random initialization prevents symmetry breaking
|
||
- Weight shape (input_size, output_size) enables matrix multiplication
|
||
- Bias allows shifting the output (like y-intercept in linear regression)
|
||
- PyTorch uses more sophisticated initialization (Xavier, Kaiming)
|
||
|
||
IMPLEMENTATION HINTS:
|
||
- Use np.random.randn() for Gaussian random numbers
|
||
- Scale by 0.1 to keep initial values small
|
||
- Remember to wrap numpy arrays in Tensor()
|
||
- Store use_bias flag for forward pass logic
|
||
"""
|
||
### BEGIN SOLUTION
|
||
super().__init__() # Initialize Module base class
|
||
|
||
self.input_size = input_size
|
||
self.output_size = output_size
|
||
self.use_bias = use_bias
|
||
|
||
# Initialize weights with small random values using Parameter
|
||
# Shape: (input_size, output_size) for matrix multiplication
|
||
weight_data = np.random.randn(input_size, output_size) * 0.1
|
||
self.weights = Parameter(weight_data) # Auto-registers for optimization!
|
||
|
||
# Initialize bias if requested
|
||
if use_bias:
|
||
bias_data = np.random.randn(output_size) * 0.1
|
||
self.bias = Parameter(bias_data) # Auto-registers for optimization!
|
||
else:
|
||
self.bias = None
|
||
### END SOLUTION
|
||
|
||
def forward(self, x: Tensor) -> Tensor:
|
||
"""
|
||
Forward pass through the Linear layer.
|
||
|
||
Args:
|
||
x: Input tensor (shape: ..., input_size)
|
||
|
||
Returns:
|
||
Output tensor (shape: ..., output_size)
|
||
|
||
TODO: Implement forward pass: output = input @ weights + bias
|
||
|
||
STEP-BY-STEP IMPLEMENTATION:
|
||
1. Perform matrix multiplication: output = matmul(x, self.weights)
|
||
2. If bias exists, add it: output = output + self.bias
|
||
3. Return result as Tensor
|
||
|
||
LEARNING CONNECTIONS:
|
||
- This is the core neural network transformation
|
||
- Matrix multiplication scales input features to output features
|
||
- Bias provides offset (like y-intercept in linear equations)
|
||
- Broadcasting handles different batch sizes automatically
|
||
|
||
IMPLEMENTATION HINTS:
|
||
- Use the matmul function you implemented above
|
||
- Handle bias addition with simple + operator
|
||
- Check if self.bias is not None before adding
|
||
"""
|
||
### BEGIN SOLUTION
|
||
# Matrix multiplication: input @ weights
|
||
output = matmul(x, self.weights)
|
||
|
||
# Add bias if it exists
|
||
if self.bias is not None:
|
||
output = output + self.bias
|
||
|
||
return output
|
||
### END SOLUTION
|
||
|
||
# Backward compatibility alias
|
||
#| export
|
||
Dense = Linear
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Testing Linear Layer
|
||
|
||
Let's verify our Linear layer works correctly with comprehensive tests.
|
||
The tests use Dense for backward compatibility, but Dense is now an alias for Linear.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-dense", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false}
|
||
def test_dense_layer():
|
||
"""Test Dense layer implementation."""
|
||
print("🧪 Testing Dense Layer...")
|
||
|
||
# Test case 1: Basic functionality
|
||
layer = Dense(input_size=3, output_size=2)
|
||
input_tensor = Tensor([[1.0, 2.0, 3.0]]) # Shape: (1, 3)
|
||
output = layer.forward(input_tensor)
|
||
|
||
# Check output shape
|
||
assert output.shape == (1, 2), f"Expected shape (1, 2), got {output.shape}"
|
||
print("✅ Output shape correct")
|
||
|
||
# Test case 2: No bias
|
||
layer_no_bias = Dense(input_size=2, output_size=3, use_bias=False)
|
||
assert layer_no_bias.bias is None, "Bias should be None when use_bias=False"
|
||
print("✅ No bias option works")
|
||
|
||
# Test case 3: Multiple samples (batch processing)
|
||
batch_input = Tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]) # Shape: (3, 2)
|
||
layer_batch = Dense(input_size=2, output_size=2)
|
||
batch_output = layer_batch.forward(batch_input)
|
||
|
||
assert batch_output.shape == (3, 2), f"Expected shape (3, 2), got {batch_output.shape}"
|
||
print("✅ Batch processing works")
|
||
|
||
# Test case 4: Callable interface
|
||
callable_output = layer_batch(batch_input)
|
||
assert np.allclose(callable_output.data, batch_output.data), "Callable interface should match forward()"
|
||
print("✅ Callable interface works")
|
||
|
||
# Test case 5: Parameter initialization
|
||
layer_init = Dense(input_size=10, output_size=5)
|
||
assert layer_init.weights.shape == (10, 5), f"Expected weights shape (10, 5), got {layer_init.weights.shape}"
|
||
assert layer_init.bias.shape == (5,), f"Expected bias shape (5,), got {layer_init.bias.shape}"
|
||
|
||
# Check that weights are reasonably small (good initialization)
|
||
assert np.abs(layer_init.weights.data).mean() < 1.0, "Weights should be small for good initialization"
|
||
print("✅ Parameter initialization correct")
|
||
|
||
print("🎉 All Dense layer tests passed!")
|
||
|
||
test_dense_layer()
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Testing Autograd Integration
|
||
|
||
Now let's test that our Dense layer works correctly with parameter management and module composition.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-dense-parameter-management", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false}
|
||
def test_dense_parameter_management():
|
||
"""Test Dense layer parameter management and module composition."""
|
||
print("🧪 Testing Dense Layer Parameter Management...")
|
||
|
||
# Test case 1: Parameter registration
|
||
layer = Dense(input_size=3, output_size=2)
|
||
params = layer.parameters()
|
||
|
||
assert len(params) == 2, f"Expected 2 parameters (weights + bias), got {len(params)}"
|
||
assert layer.weights in params, "Weights should be in parameters list"
|
||
assert layer.bias in params, "Bias should be in parameters list"
|
||
print("✅ Parameter registration works")
|
||
|
||
# Test case 2: Module composition
|
||
class SimpleNetwork(Module):
|
||
def __init__(self):
|
||
super().__init__()
|
||
self.layer1 = Dense(4, 3)
|
||
self.layer2 = Dense(3, 2)
|
||
|
||
def forward(self, x):
|
||
x = self.layer1(x)
|
||
return self.layer2(x)
|
||
|
||
network = SimpleNetwork()
|
||
all_params = network.parameters()
|
||
|
||
# Should have 4 parameters: 2 from each layer (weights + bias)
|
||
assert len(all_params) == 4, f"Expected 4 parameters from network, got {len(all_params)}"
|
||
print("✅ Module composition and parameter collection works")
|
||
|
||
# Test case 3: Forward pass through composed network
|
||
input_tensor = Tensor([[1.0, 2.0, 3.0, 4.0]])
|
||
output = network(input_tensor)
|
||
|
||
assert output.shape == (1, 2), f"Expected output shape (1, 2), got {output.shape}"
|
||
print("✅ Network forward pass works")
|
||
|
||
# Test case 4: Parameter shapes
|
||
layer = Dense(input_size=10, output_size=5)
|
||
assert layer.weights.shape == (10, 5), f"Expected weights shape (10, 5), got {layer.weights.shape}"
|
||
assert layer.bias.shape == (5,), f"Expected bias shape (5,), got {layer.bias.shape}"
|
||
print("✅ Parameter shapes correct")
|
||
|
||
# Test case 5: No bias option
|
||
layer_no_bias = Dense(input_size=3, output_size=2, use_bias=False)
|
||
params_no_bias = layer_no_bias.parameters()
|
||
|
||
assert len(params_no_bias) == 1, f"Expected 1 parameter (weights only), got {len(params_no_bias)}"
|
||
assert layer_no_bias.bias is None, "Bias should be None when use_bias=False"
|
||
print("✅ No bias option works")
|
||
|
||
print("🎉 All Dense layer parameter management tests passed!")
|
||
|
||
test_dense_parameter_management()
|
||
|
||
# %% [markdown]
|
||
"""
|
||
# Sequential Network Composition - Building Complete Architectures
|
||
|
||
Now that we have solid layers, let's build the Sequential network that composes layers into complete neural network architectures. This is the foundation for all neural networks from MLPs to complex deep learning models.
|
||
|
||
## Why Sequential Networks Matter
|
||
|
||
🏗️ **Architecture Foundation**: Sequential is the building block for all neural network architectures
|
||
🔄 **Function Composition**: Chain simple functions to create complex behaviors
|
||
📦 **Clean Interface**: Write networks as lists of layers - intuitive and maintainable
|
||
⚡ **Production Standard**: Every major framework uses this pattern for neural network construction
|
||
|
||
## Learning Objectives
|
||
By implementing Sequential networks, you'll understand:
|
||
- How function composition enables universal approximation in neural networks
|
||
- The architectural patterns that power everything from MLPs to transformers
|
||
- Why clean abstractions matter for building complex systems
|
||
- How layer composition creates the foundation for all modern deep learning
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "sequential-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false}
|
||
#| export
|
||
class Sequential(Module):
|
||
"""
|
||
Sequential Network: Composes layers in sequence.
|
||
|
||
The most fundamental network architecture that applies layers in order:
|
||
f(x) = layer_n(...layer_2(layer_1(x)))
|
||
|
||
Inherits from Module for automatic parameter collection from all sub-layers.
|
||
This enables optimizers to find all parameters automatically.
|
||
|
||
Example Usage:
|
||
# Create a 3-layer MLP
|
||
model = Sequential([
|
||
Linear(784, 128),
|
||
ReLU(),
|
||
Linear(128, 64),
|
||
ReLU(),
|
||
Linear(64, 10)
|
||
])
|
||
|
||
# Use the model
|
||
output = model(input_data) # Clean interface!
|
||
params = model.parameters() # All parameters from all layers!
|
||
"""
|
||
|
||
def __init__(self, layers=None):
|
||
"""
|
||
Initialize Sequential network with layers.
|
||
|
||
Args:
|
||
layers: List of layers to compose in order (optional)
|
||
"""
|
||
super().__init__() # Initialize Module base class
|
||
self.layers = layers if layers is not None else []
|
||
|
||
# Register all layers as sub-modules for parameter collection
|
||
for i, layer in enumerate(self.layers):
|
||
# This automatically adds each layer to self._modules
|
||
setattr(self, f'layer_{i}', layer)
|
||
|
||
def forward(self, x):
|
||
"""
|
||
Forward pass through all layers in sequence.
|
||
|
||
Args:
|
||
x: Input tensor
|
||
|
||
Returns:
|
||
Output tensor after passing through all layers
|
||
"""
|
||
for layer in self.layers:
|
||
x = layer(x)
|
||
return x
|
||
|
||
def add(self, layer):
|
||
"""Add a layer to the network."""
|
||
self.layers.append(layer)
|
||
# Register the new layer for parameter collection
|
||
setattr(self, f'layer_{len(self.layers)-1}', layer)
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Testing Sequential Networks
|
||
|
||
Let's verify our Sequential network works correctly with comprehensive tests.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-sequential", "locked": true, "points": 4, "schema_version": 3, "solution": false, "task": false}
|
||
def test_sequential_network():
|
||
"""Test Sequential network implementation."""
|
||
print("🧪 Testing Sequential Network...")
|
||
|
||
# Test case 1: Create empty network
|
||
empty_net = Sequential()
|
||
assert len(empty_net.layers) == 0, "Empty Sequential should have no layers"
|
||
print("✅ Empty Sequential network creation")
|
||
|
||
# Test case 2: Create network with layers
|
||
layers = [Dense(3, 4), Dense(4, 2)]
|
||
network = Sequential(layers)
|
||
assert len(network.layers) == 2, "Network should have 2 layers"
|
||
print("✅ Sequential network with layers")
|
||
|
||
# Test case 3: Forward pass through network
|
||
input_tensor = Tensor([[1.0, 2.0, 3.0]])
|
||
output = network(input_tensor)
|
||
assert output.shape == (1, 2), f"Expected output shape (1, 2), got {output.shape}"
|
||
print("✅ Forward pass through Sequential network")
|
||
|
||
# Test case 4: Parameter collection from all layers
|
||
all_params = network.parameters()
|
||
# Should have 4 parameters: 2 weights + 2 biases from 2 Dense layers
|
||
assert len(all_params) == 4, f"Expected 4 parameters from Sequential network, got {len(all_params)}"
|
||
print("✅ Parameter collection from all layers")
|
||
|
||
# Test case 5: Adding layers dynamically
|
||
network.add(Dense(2, 1))
|
||
assert len(network.layers) == 3, "Network should have 3 layers after adding one"
|
||
|
||
# Test forward pass after adding layer
|
||
final_output = network(input_tensor)
|
||
assert final_output.shape == (1, 1), f"Expected final output shape (1, 1), got {final_output.shape}"
|
||
print("✅ Dynamic layer addition")
|
||
|
||
print("🎉 All Sequential network tests passed!")
|
||
|
||
test_sequential_network()
|
||
|
||
# %% [markdown]
|
||
"""
|
||
# Flatten Operation - Connecting Different Layer Types
|
||
|
||
The Flatten operation is essential for connecting convolutional layers to dense layers, or reshaping tensors between different network components. This is a fundamental operation in neural networks.
|
||
|
||
## Why Flatten Matters
|
||
|
||
🔗 **Interface Bridge**: Connects spatial layers (Conv2D) to dense layers (Linear)
|
||
📐 **Dimension Management**: Converts multi-dimensional tensors to vectors for different layer types
|
||
🏗️ **Architecture Flexibility**: Enables mixing different layer types in the same network
|
||
⚡ **Memory Efficiency**: Provides clean tensor reshaping without copying data
|
||
|
||
## Learning Objectives
|
||
By implementing Flatten, you'll understand:
|
||
- How neural networks handle tensors of different shapes between layer types
|
||
- The critical role of tensor reshaping in network architecture design
|
||
- How to preserve batch dimensions while flattening spatial dimensions
|
||
- The connection between memory layout and computational efficiency
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "flatten-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false}
|
||
#| export
|
||
class Flatten(Module):
|
||
"""
|
||
Flatten layer that reshapes tensors from multi-dimensional to 2D.
|
||
|
||
Essential for connecting convolutional layers (which output 4D tensors)
|
||
to linear layers (which expect 2D tensors). Preserves the batch dimension.
|
||
|
||
Example Usage:
|
||
# In a CNN architecture
|
||
model = Sequential([
|
||
Conv2D(3, 16, kernel_size=3), # Output: (batch, 16, height, width)
|
||
ReLU(),
|
||
Flatten(), # Output: (batch, 16*height*width)
|
||
Linear(16*height*width, 10) # Now compatible!
|
||
])
|
||
"""
|
||
|
||
def __init__(self, start_dim=1):
|
||
"""
|
||
Initialize Flatten layer.
|
||
|
||
Args:
|
||
start_dim: Dimension to start flattening from (default: 1 to preserve batch)
|
||
"""
|
||
super().__init__()
|
||
self.start_dim = start_dim
|
||
|
||
def forward(self, x):
|
||
"""
|
||
Flatten tensor starting from start_dim.
|
||
|
||
Args:
|
||
x: Input tensor
|
||
|
||
Returns:
|
||
Flattened tensor with batch dimension preserved
|
||
"""
|
||
return flatten(x, start_dim=self.start_dim)
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "flatten-function", "locked": false, "schema_version": 3, "solution": true, "task": false}
|
||
#| export
|
||
def flatten(x, start_dim=1):
|
||
"""
|
||
Flatten tensor starting from a given dimension.
|
||
|
||
This is essential for transitioning from convolutional layers
|
||
(which output 4D tensors) to linear layers (which expect 2D).
|
||
|
||
Args:
|
||
x: Input tensor (Tensor or any array-like)
|
||
start_dim: Dimension to start flattening from (default: 1 to preserve batch)
|
||
|
||
Returns:
|
||
Flattened tensor preserving batch dimension
|
||
|
||
Examples:
|
||
# Flatten CNN output for Linear layer
|
||
conv_output = Tensor(np.random.randn(32, 64, 8, 8)) # (batch, channels, height, width)
|
||
flat = flatten(conv_output) # (32, 4096) - ready for Linear layer!
|
||
|
||
# Flatten image for MLP
|
||
images = Tensor(np.random.randn(32, 3, 28, 28)) # CIFAR-10 batch
|
||
flat = flatten(images) # (32, 2352) - ready for MLP!
|
||
"""
|
||
# Get the data (handle both Tensor and numpy arrays)
|
||
if hasattr(x, 'data'):
|
||
data = x.data
|
||
else:
|
||
data = x
|
||
|
||
# Calculate new shape
|
||
batch_size = data.shape[0] if start_dim > 0 else 1
|
||
remaining_size = np.prod(data.shape[start_dim:])
|
||
new_shape = (batch_size, remaining_size) if start_dim > 0 else (remaining_size,)
|
||
|
||
# Reshape preserving tensor type
|
||
if hasattr(x, 'data'):
|
||
# It's a Tensor - preserve type
|
||
flattened_data = data.reshape(new_shape)
|
||
return type(x)(flattened_data)
|
||
else:
|
||
# It's a numpy array
|
||
return data.reshape(new_shape)
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Testing Flatten Operations
|
||
|
||
Let's verify our Flatten implementation works correctly with various tensor shapes.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-flatten", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false}
|
||
def test_flatten_operations():
|
||
"""Test Flatten layer and function implementation."""
|
||
print("🧪 Testing Flatten Operations...")
|
||
|
||
# Test case 1: Flatten function with 2D tensor
|
||
x_2d = Tensor([[1, 2], [3, 4]])
|
||
flattened_func = flatten(x_2d)
|
||
assert flattened_func.shape == (2, 2), f"Expected shape (2, 2), got {flattened_func.shape}"
|
||
print("✅ Flatten function with 2D tensor")
|
||
|
||
# Test case 2: Flatten function with 4D tensor (simulating CNN output)
|
||
x_4d = Tensor(np.random.randn(2, 3, 4, 4)) # (batch, channels, height, width)
|
||
flattened_4d = flatten(x_4d)
|
||
assert flattened_4d.shape == (2, 48), f"Expected shape (2, 48), got {flattened_4d.shape}" # 3*4*4 = 48
|
||
print("✅ Flatten function with 4D tensor")
|
||
|
||
# Test case 3: Flatten layer class
|
||
flatten_layer = Flatten()
|
||
layer_output = flatten_layer(x_4d)
|
||
assert layer_output.shape == (2, 48), f"Expected shape (2, 48), got {layer_output.shape}"
|
||
assert np.allclose(layer_output.data, flattened_4d.data), "Flatten layer should match flatten function"
|
||
print("✅ Flatten layer class")
|
||
|
||
# Test case 4: Different start dimensions
|
||
flatten_from_0 = Flatten(start_dim=0)
|
||
full_flat = flatten_from_0(x_2d)
|
||
assert len(full_flat.shape) <= 2, "Flattening from dim 0 should create vector"
|
||
print("✅ Different start dimensions")
|
||
|
||
# Test case 5: Integration with Sequential
|
||
network = Sequential([
|
||
Dense(8, 4),
|
||
Flatten()
|
||
])
|
||
test_input = Tensor(np.random.randn(2, 8))
|
||
output = network(test_input)
|
||
assert output.shape == (2, 4), f"Expected shape (2, 4), got {output.shape}"
|
||
print("✅ Flatten integration with Sequential")
|
||
|
||
print("🎉 All Flatten operations tests passed!")
|
||
|
||
test_flatten_operations()
|
||
|
||
# %% [markdown]
|
||
"""
|
||
# Systems Analysis: Memory and Performance Characteristics
|
||
|
||
Let's analyze the memory usage and computational complexity of our layer implementations.
|
||
|
||
## Memory Analysis
|
||
- **Dense Layer Storage**: input_size × output_size weights + output_size bias terms
|
||
- **Forward Pass Memory**: Input tensor + weight tensor + output tensor (temporary storage)
|
||
- **Scaling Behavior**: Memory grows quadratically with layer size
|
||
|
||
## Computational Complexity
|
||
- **Matrix Multiplication**: O(batch_size × input_size × output_size)
|
||
- **Bias Addition**: O(batch_size × output_size)
|
||
- **Total**: Dominated by matrix multiplication for large layers
|
||
|
||
## Production Insights
|
||
In production ML systems:
|
||
- **Memory Management**: PyTorch uses memory pools to avoid frequent allocation/deallocation
|
||
- **Compute Optimization**: BLAS libraries (MKL, OpenBLAS) optimize matrix operations for specific hardware
|
||
- **GPU Acceleration**: CUDA kernels parallelize matrix operations across thousands of cores
|
||
- **Mixed Precision**: Using float16 instead of float32 can halve memory usage with minimal accuracy loss
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "memory-analysis", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
def analyze_layer_memory():
|
||
"""Analyze memory usage of different layer sizes."""
|
||
print("📊 Layer Memory Analysis")
|
||
print("=" * 40)
|
||
|
||
layer_sizes = [(10, 10), (100, 100), (1000, 1000), (784, 128), (128, 10)]
|
||
|
||
for input_size, output_size in layer_sizes:
|
||
# Calculate parameter count
|
||
weight_params = input_size * output_size
|
||
bias_params = output_size
|
||
total_params = weight_params + bias_params
|
||
|
||
# Calculate memory usage (assuming float32 = 4 bytes)
|
||
memory_mb = total_params * 4 / (1024 * 1024)
|
||
|
||
print(f" {input_size:4d} → {output_size:4d}: {total_params:,} params, {memory_mb:.3f} MB")
|
||
|
||
print("\n🔍 Key Insights:")
|
||
print(" • Memory grows quadratically with layer width")
|
||
print(" • Large layers (1000×1000) use significant memory")
|
||
print(" • Modern networks balance width vs depth for efficiency")
|
||
|
||
analyze_layer_memory()
|
||
|
||
# %% [markdown]
|
||
"""
|
||
# ML Systems Thinking: Interactive Questions
|
||
|
||
Let's explore the deeper implications of our layer implementations.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "systems-thinking", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
def explore_layer_scaling():
|
||
"""Explore how layer operations scale with size."""
|
||
print("🤔 Scaling Analysis: Matrix Multiplication Performance")
|
||
print("=" * 55)
|
||
|
||
sizes = [64, 128, 256, 512]
|
||
|
||
for size in sizes:
|
||
# Estimate FLOPs for square matrix multiplication
|
||
flops = 2 * size * size * size # 2 operations per multiply-add
|
||
|
||
# Estimate memory bandwidth (reading A, B, writing C)
|
||
memory_ops = 3 * size * size # Elements read/written
|
||
memory_mb = memory_ops * 4 / (1024 * 1024) # float32 = 4 bytes
|
||
|
||
print(f" Size {size:3d}×{size:3d}: {flops/1e6:.1f} MFLOPS, {memory_mb:.2f} MB transfers")
|
||
|
||
print("\n💡 Performance Insights:")
|
||
print(" • FLOPs grow cubically (O(n³)) with matrix size")
|
||
print(" • Memory bandwidth grows quadratically (O(n²))")
|
||
print(" • Large matrices become memory-bound, not compute-bound")
|
||
print(" • This is why GPUs excel: high memory bandwidth + parallel compute")
|
||
|
||
explore_layer_scaling()
|
||
|
||
# %% [markdown]
|
||
"""
|
||
# Complete Neural Network Demo - All Components Working Together
|
||
|
||
Let's demonstrate how all our components work together to build complete neural networks.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "complete-network-demo", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
def demonstrate_complete_networks():
|
||
"""Demonstrate complete neural networks using all implemented components."""
|
||
print("🔥 Complete Neural Network Demo")
|
||
print("=" * 50)
|
||
|
||
print("\n1. MLP for Classification (MNIST-style):")
|
||
# Multi-layer perceptron for image classification
|
||
mlp = Sequential([
|
||
Flatten(), # Flatten input images
|
||
Linear(784, 256), # First hidden layer
|
||
Linear(256, 128), # Second hidden layer
|
||
Linear(128, 10) # Output layer (10 classes)
|
||
])
|
||
|
||
# Test with batch of "images"
|
||
batch_images = Tensor(np.random.randn(32, 28, 28)) # 32 MNIST-like images
|
||
mlp_output = mlp(batch_images)
|
||
print(f" Input: {batch_images.shape} (batch of 28x28 images)")
|
||
print(f" Output: {mlp_output.shape} (class logits for 32 images)")
|
||
print(f" Parameters: {len(mlp.parameters())} tensors")
|
||
|
||
print("\n2. CNN-style Architecture (with Flatten):")
|
||
# Simulate CNN → Flatten → Dense pattern
|
||
cnn_style = Sequential([
|
||
# Simulate Conv2D output with random "features"
|
||
Flatten(), # Flatten spatial features
|
||
Linear(512, 256), # Dense layer after convolution
|
||
Linear(256, 10) # Classification head
|
||
])
|
||
|
||
# Test with simulated conv output
|
||
conv_features = Tensor(np.random.randn(16, 8, 8, 8)) # Simulated (B,C,H,W)
|
||
cnn_output = cnn_style(conv_features)
|
||
print(f" Input: {conv_features.shape} (simulated conv features)")
|
||
print(f" Output: {cnn_output.shape} (class predictions)")
|
||
|
||
print("\n3. Deep Network with Many Layers:")
|
||
# Demonstrate deep composition
|
||
deep_net = Sequential()
|
||
layer_sizes = [100, 80, 60, 40, 20, 10]
|
||
|
||
for i in range(len(layer_sizes) - 1):
|
||
deep_net.add(Linear(layer_sizes[i], layer_sizes[i+1]))
|
||
print(f" Added layer: {layer_sizes[i]} → {layer_sizes[i+1]}")
|
||
|
||
# Test deep network
|
||
deep_input = Tensor(np.random.randn(8, 100))
|
||
deep_output = deep_net(deep_input)
|
||
print(f" Deep network: {deep_input.shape} → {deep_output.shape}")
|
||
print(f" Total parameters: {len(deep_net.parameters())} tensors")
|
||
|
||
print("\n4. Parameter Management Across Networks:")
|
||
networks = {'MLP': mlp, 'CNN-style': cnn_style, 'Deep': deep_net}
|
||
|
||
for name, net in networks.items():
|
||
params = net.parameters()
|
||
total_params = sum(p.data.size for p in params)
|
||
memory_mb = total_params * 4 / (1024 * 1024) # float32 = 4 bytes
|
||
print(f" {name}: {len(params)} param tensors, {total_params:,} total params, {memory_mb:.2f} MB")
|
||
|
||
print("\n🎉 All components work together seamlessly!")
|
||
print(" • Module system enables automatic parameter collection")
|
||
print(" • Linear layers handle matrix transformations")
|
||
print(" • Sequential composes layers into complete architectures")
|
||
print(" • Flatten connects different layer types")
|
||
print(" • Everything integrates for production-ready neural networks!")
|
||
|
||
demonstrate_complete_networks()
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## 🤔 ML Systems Thinking: Interactive Questions
|
||
|
||
Now that you've implemented all the core neural network components, let's think about their implications for ML systems:
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "question-1", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
# Question 1: Memory vs Computation Trade-offs
|
||
"""
|
||
🤔 **Question 1: Memory vs Computation Analysis**
|
||
|
||
You're designing a neural network for deployment on a mobile device with limited memory (1GB RAM) but decent compute power.
|
||
|
||
You have two architecture options:
|
||
A) Wide network: 784 → 2048 → 2048 → 10 (3 layers, wide)
|
||
B) Deep network: 784 → 256 → 256 → 256 → 256 → 10 (5 layers, narrow)
|
||
|
||
Calculate the memory requirements for each option and explain which you'd choose for mobile deployment and why.
|
||
|
||
Consider:
|
||
- Parameter storage requirements
|
||
- Intermediate activation storage during forward pass
|
||
- Training vs inference memory requirements
|
||
- How your choice affects model capacity and accuracy
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "question-2", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
# Question 2: Performance Optimization
|
||
"""
|
||
🤔 **Question 2: Production Performance Optimization**
|
||
|
||
Your Dense layer implementation works correctly, but you notice it's slower than PyTorch's nn.Linear on the same hardware.
|
||
|
||
Investigate and explain:
|
||
1. Why might our implementation be slower? (Hint: think about underlying linear algebra libraries)
|
||
2. What optimization techniques do production frameworks use?
|
||
3. How would you modify our implementation to approach production performance?
|
||
4. When might our simple implementation actually be preferable?
|
||
|
||
Research areas to consider:
|
||
- BLAS (Basic Linear Algebra Subprograms) libraries
|
||
- Memory layout and cache efficiency
|
||
- Vectorization and SIMD instructions
|
||
- GPU kernel optimization
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "question-3", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
# Question 3: Scaling and Architecture Design
|
||
"""
|
||
🤔 **Question 3: Systems Architecture Scaling**
|
||
|
||
Modern transformer models like GPT-3 have billions of parameters, primarily in Dense layers.
|
||
|
||
Analyze the scaling challenges:
|
||
1. How does memory requirement scale with model size? Calculate the memory needed for a 175B parameter model.
|
||
2. What are the computational bottlenecks during training vs inference?
|
||
3. How do systems like distributed training address these scaling challenges?
|
||
4. Why do large models use techniques like gradient checkpointing and model parallelism?
|
||
|
||
Systems considerations:
|
||
- Memory hierarchy (L1/L2/L3 cache, RAM, storage)
|
||
- Network bandwidth for distributed training
|
||
- GPU memory constraints and model sharding
|
||
- Inference optimization for production serving
|
||
"""
|
||
|
||
# %% [markdown]
|
||
"""
|
||
# Comprehensive Testing and Integration
|
||
|
||
Let's run a comprehensive test suite to verify all our implementations work correctly together.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "comprehensive-tests", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
def run_comprehensive_tests():
|
||
"""Run comprehensive tests of all layer functionality."""
|
||
print("🔬 Comprehensive Layer Testing Suite")
|
||
print("=" * 45)
|
||
|
||
# Test 1: Matrix multiplication edge cases
|
||
print("\n1. Matrix Multiplication Edge Cases:")
|
||
|
||
# Single element
|
||
a = Tensor([[5]])
|
||
b = Tensor([[3]])
|
||
result = matmul(a, b)
|
||
assert result.data[0, 0] == 15, "Single element multiplication failed"
|
||
print(" ✅ Single element multiplication")
|
||
|
||
# Identity matrix
|
||
identity = Tensor([[1, 0], [0, 1]])
|
||
test_matrix = Tensor([[2, 3], [4, 5]])
|
||
result = matmul(test_matrix, identity)
|
||
assert np.allclose(result.data, test_matrix.data), "Identity multiplication failed"
|
||
print(" ✅ Identity matrix multiplication")
|
||
|
||
# Test 2: Dense layer composition
|
||
print("\n2. Dense Layer Composition:")
|
||
|
||
# Create a simple 2-layer network
|
||
layer1 = Dense(4, 3)
|
||
layer2 = Dense(3, 2)
|
||
|
||
# Test data flow
|
||
input_data = Tensor([[1, 2, 3, 4]])
|
||
hidden = layer1(input_data)
|
||
output = layer2(hidden)
|
||
|
||
assert output.shape == (1, 2), f"Expected final output shape (1, 2), got {output.shape}"
|
||
print(" ✅ Multi-layer composition")
|
||
|
||
# Test 3: Batch processing
|
||
print("\n3. Batch Processing:")
|
||
|
||
batch_size = 10
|
||
batch_input = Tensor(np.random.randn(batch_size, 4))
|
||
batch_hidden = layer1(batch_input)
|
||
batch_output = layer2(batch_hidden)
|
||
|
||
assert batch_output.shape == (batch_size, 2), f"Expected batch output shape ({batch_size}, 2), got {batch_output.shape}"
|
||
print(" ✅ Batch processing")
|
||
|
||
# Test 4: Parameter access and modification
|
||
print("\n4. Parameter Management:")
|
||
|
||
layer = Dense(5, 3)
|
||
original_weights = layer.weights.data.copy()
|
||
|
||
# Simulate parameter update
|
||
layer.weights = Tensor(original_weights + 0.1)
|
||
|
||
assert not np.allclose(layer.weights.data, original_weights), "Parameter update failed"
|
||
print(" ✅ Parameter modification")
|
||
|
||
print("\n🎉 All comprehensive tests passed!")
|
||
print(" Your layer implementations are ready for neural network construction!")
|
||
|
||
run_comprehensive_tests()
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Autograd Integration Demo
|
||
|
||
Let's demonstrate how layers compose together to build neural networks.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "layer-composition-demo", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
def demonstrate_layer_composition():
|
||
"""Demonstrate how layers compose into neural networks."""
|
||
print("🔥 Layer Composition Demo")
|
||
print("=" * 50)
|
||
|
||
print("\n1. Creating individual layers:")
|
||
layer1 = Dense(input_size=4, output_size=3)
|
||
layer2 = Dense(input_size=3, output_size=2)
|
||
|
||
print(f" Layer 1: {layer1.input_size} → {layer1.output_size}")
|
||
print(f" Layer 2: {layer2.input_size} → {layer2.output_size}")
|
||
|
||
print("\n2. Manual layer composition:")
|
||
input_data = Tensor([[1.0, 2.0, 3.0, 4.0]])
|
||
print(f" Input shape: {input_data.shape}")
|
||
|
||
# Forward pass through each layer
|
||
hidden = layer1(input_data)
|
||
print(f" After layer 1: {hidden.shape}")
|
||
|
||
output = layer2(hidden)
|
||
print(f" Final output: {output.shape}")
|
||
print(f" Output values: {output.data.tolist()}")
|
||
|
||
print("\n3. Creating a composed network class:")
|
||
class TwoLayerNetwork(Module):
|
||
def __init__(self, input_size, hidden_size, output_size):
|
||
super().__init__()
|
||
self.layer1 = Dense(input_size, hidden_size)
|
||
self.layer2 = Dense(hidden_size, output_size)
|
||
|
||
def forward(self, x):
|
||
x = self.layer1(x)
|
||
return self.layer2(x)
|
||
|
||
network = TwoLayerNetwork(4, 3, 2)
|
||
network_output = network(input_data)
|
||
|
||
print(f" Network output shape: {network_output.shape}")
|
||
print(f" Network output values: {network_output.data.tolist()}")
|
||
|
||
print("\n4. Parameter management:")
|
||
all_params = network.parameters()
|
||
print(f" Total parameters: {len(all_params)}")
|
||
|
||
total_param_count = 0
|
||
for i, param in enumerate(all_params):
|
||
total_param_count += param.data.size
|
||
print(f" Parameter {i}: shape {param.shape}, size {param.data.size}")
|
||
|
||
print(f" Total parameter count: {total_param_count:,}")
|
||
|
||
print("\n5. Batch processing:")
|
||
batch_input = Tensor([[1.0, 2.0, 3.0, 4.0],
|
||
[5.0, 6.0, 7.0, 8.0],
|
||
[9.0, 10.0, 11.0, 12.0]])
|
||
batch_output = network(batch_input)
|
||
|
||
print(f" Batch input shape: {batch_input.shape}")
|
||
print(f" Batch output shape: {batch_output.shape}")
|
||
print(" ✅ Batch processing works automatically!")
|
||
|
||
print("\n🎉 Layer composition enables building complex neural networks!")
|
||
print(" • Individual layers are building blocks")
|
||
print(" • Module class enables clean composition")
|
||
print(" • Parameter management happens automatically")
|
||
print(" • Batch processing scales to multiple samples")
|
||
|
||
demonstrate_layer_composition()
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## 🎯 MODULE SUMMARY: Layers - Complete Neural Network Foundation
|
||
|
||
## 🎯 What You've Accomplished
|
||
|
||
You've successfully implemented the complete foundation for neural networks - all the essential components working together:
|
||
|
||
### ✅ **Complete Core System**
|
||
- **Module Base Class**: Parameter management and composition patterns for all neural network components
|
||
- **Matrix Multiplication**: The computational primitive underlying all neural network operations
|
||
- **Linear (Dense) Layers**: Complete implementation with proper parameter initialization and forward propagation
|
||
- **Sequential Networks**: Clean composition system for building complete neural network architectures
|
||
- **Flatten Operations**: Tensor reshaping to connect different layer types (essential for CNN→MLP transitions)
|
||
|
||
### ✅ **Systems Understanding**
|
||
- **Architectural Patterns**: How modular design enables everything from MLPs to complex deep networks
|
||
- **Memory Analysis**: How layer composition affects memory usage and computational efficiency
|
||
- **Performance Characteristics**: Understanding how tensor operations and layer composition affect performance
|
||
- **Production Context**: Connection to real-world ML frameworks and their component organization
|
||
|
||
### ✅ **ML Engineering Skills**
|
||
- **Complete Parameter Management**: How neural networks automatically collect parameters from all components
|
||
- **Network Composition**: Building complex architectures from simple, reusable components
|
||
- **Tensor Operations**: Essential reshaping and transformation operations for different network types
|
||
- **Clean Abstraction**: Professional software design patterns that scale to production systems
|
||
|
||
## 🔗 **Connection to Production ML Systems**
|
||
|
||
Your unified implementation mirrors the complete component systems used in:
|
||
- **PyTorch's nn.Module system**: Same parameter management and composition patterns
|
||
- **PyTorch's nn.Sequential**: Identical architecture composition approach
|
||
- **All major frameworks**: The same modular design principles that power TensorFlow, JAX, and others
|
||
- **Production ML systems**: Clean abstractions that enable complex models while maintaining manageable code
|
||
|
||
## 🚀 **What's Next**
|
||
|
||
With your complete layer foundation, you're ready to:
|
||
- **Add nonlinear activations** to enable complex function approximation
|
||
- **Implement loss functions** to define learning objectives
|
||
- **Build training algorithms** to optimize networks on data
|
||
- **Create specialized layers** like convolutions for computer vision
|
||
|
||
## 💡 **Key Systems Insights**
|
||
|
||
1. **Modular composition is the key to scalable ML systems** - clean interfaces enable complex behaviors
|
||
2. **Parameter management must be automatic** - manual parameter tracking doesn't scale to deep networks
|
||
3. **Tensor operations like flattening are architectural requirements** - different layer types need different tensor shapes
|
||
4. **Clean abstractions enable innovation** - good foundational design supports unlimited architectural experimentation
|
||
|
||
You now understand how to build complete, production-ready neural network foundations that can scale to any architecture!
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "final-demo", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
if __name__ == "__main__":
|
||
print("🔥 TinyTorch Layers Module - Complete Foundation Demo")
|
||
print("=" * 60)
|
||
|
||
# Test all core components
|
||
print("\n🧪 Testing All Core Components:")
|
||
test_matmul()
|
||
test_dense_layer()
|
||
test_dense_parameter_management()
|
||
test_sequential_network()
|
||
test_flatten_operations()
|
||
|
||
print("\n" + "="*60)
|
||
demonstrate_complete_networks()
|
||
|
||
print("\n" + "="*60)
|
||
demonstrate_layer_composition()
|
||
|
||
print("\n🎉 Complete neural network foundation ready!")
|
||
print(" ✅ Module system for parameter management")
|
||
print(" ✅ Linear layers for transformations")
|
||
print(" ✅ Sequential networks for composition")
|
||
print(" ✅ Flatten operations for tensor reshaping")
|
||
print(" ✅ All components tested and integrated!") |