# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.17.1
# ---

# %% [markdown]
"""
|
||
# Layers - Building Neural Network Architectures
|
||
|
||
Welcome to Layers! You'll implement the essential building blocks that compose into complete neural network architectures.
|
||
|
||
## LINK Building on Previous Learning
|
||
**What You Built Before**:
|
||
- Module 02 (Tensor): N-dimensional arrays with shape management and broadcasting
|
||
- Module 03 (Activations): ReLU and Softmax functions providing nonlinear intelligence
|
||
|
||
**What's Working**: You can create tensors and apply nonlinear transformations for complex pattern learning!
|
||
|
||
**The Gap**: You have data structures and nonlinear functions, but no way to combine them into trainable neural network architectures.
|
||
|
||
**This Module's Solution**: Implement Linear layers, Module composition patterns, and Sequential networks - the architectural foundations enabling everything from MLPs to transformers.
|
||
|
||
**Connection Map**:
|
||
```
|
||
Activations -> Layers -> Training
|
||
(intelligence) (architecture) (learning)
|
||
```
|
||
|
||
## Learning Objectives
|
||
|
||
By completing this module, you will:
|
||
|
||
1. **Build layer abstractions** - Create the building blocks that compose into neural networks
|
||
2. **Implement Linear layers** - The fundamental operation that transforms data between dimensions
|
||
3. **Create Sequential networks** - Chain layers together to build complete neural networks
|
||
4. **Manage parameters** - Handle weights and biases in an organized way
|
||
5. **Foundation for architectures** - Enable building everything from simple MLPs to complex models
|
||
|
||
## Build -> Use -> Reflect
|
||
1. **Build**: Module base class, Linear layers, and Sequential composition
|
||
2. **Use**: Combine layers into complete neural networks with real data
|
||
3. **Reflect**: Understand how simple building blocks enable complex architectures
|
||
"""
|
||
|
||
# In[ ]:

#| default_exp core.layers

#| export
import numpy as np
import sys
import os

# Smart import system: works both during development and in production
# This pattern allows the same code to work in two scenarios:
# 1. During development: imports from local module files (tensor_dev.py)
# 2. In production: imports from installed tinytorch package
# This flexibility is essential for educational development workflows

if 'tinytorch' in sys.modules:
    # Production: Import from installed package
    # When tinytorch is installed as a package, use the packaged version
    from tinytorch.core.tensor import Tensor
else:
    # Development: Import from local module files
    # During development, we need to import directly from the source files
    # This allows us to work with modules before they're packaged
    tensor_module_path = os.path.join(os.path.dirname(__file__), '..', '01_tensor')
    sys.path.insert(0, tensor_module_path)
    try:
        from tensor_dev import Tensor
    finally:
        sys.path.pop(0)  # Always clean up path to avoid side effects

# Parameter wraps a Tensor with requires_grad=True so it is always tracked for gradients
class Parameter:
    """
    A trainable parameter that supports automatic differentiation.

    This wraps a Tensor with requires_grad=True for use as a neural network parameter.
    Essential for gradient-based optimization of weights and biases.

    IMPORTANT: Parameters must participate in autograd for training to work.
    """
    def __init__(self, data):
        # Uses the module-level Tensor (pure Tensor system - no separate Variable class)
        if isinstance(data, Tensor):
            self._tensor = data
            if not data.requires_grad:
                # Ensure parameters always require gradients
                data.requires_grad = True
        else:
            # Convert data to Tensor with gradient tracking
            self._tensor = Tensor(data, requires_grad=True)

    def __getattr__(self, name):
        """Delegate all attribute access to the underlying Tensor."""
        return getattr(self._tensor, name)

    def __setattr__(self, name, value):
        """Handle setting attributes."""
        if name == '_tensor':
            super().__setattr__(name, value)
        else:
            # Delegate to underlying Tensor
            setattr(self._tensor, name, value)

    @property
    def data(self):
        """Access to underlying data."""
        return self._tensor.data

    @property
    def grad(self):
        """Access to gradient."""
        return self._tensor.grad

    @grad.setter
    def grad(self, value):
        """Set gradient."""
        self._tensor.grad = value

    @property
    def requires_grad(self):
        """Whether this parameter requires gradients."""
        return self._tensor.requires_grad

    def backward(self, gradient=None):
        """Backpropagate gradients."""
        return self._tensor.backward(gradient)

    def __repr__(self):
        return f"Parameter({self._tensor})"
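
# A quick, illustrative sanity check of the Parameter wrapper (example values only):
_example_param = Parameter([[0.5, -0.2], [0.1, 0.3]])
print(_example_param.shape, _example_param.requires_grad)  # -> (2, 2) True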
# In[ ]:

print("🔥 TinyTorch Layers Module")
print(f"NumPy version: {np.__version__}")
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
print("Ready to build neural network layers!")

# %% [markdown]
"""
## Visual Guide: Understanding Neural Network Architecture Through Diagrams

### Neural Network Layers: From Components to Systems

```
Individual Neuron:                     Neural Network Layer:

 x₁ --○ w₁                             +---------------------+
          \\                           |    Input Vector     |
 x₂ --○ w₂ --> Sum --> f() --> y       |    [x₁, x₂, x₃]     |
          /                            +---------------------+
 x₃ --○ w₃                                        v
         + bias                        +---------------------+
                                       |   Weight Matrix W   |
 One computation unit                  |   +w₁₁ w₁₂ w₁₃+     |
                                       |   |w₂₁ w₂₂ w₂₃|     |
                                       |   +w₃₁ w₃₂ w₃₃+     |
                                       +---------------------+
                                                  v
                                        Matrix multiplication
                                            Y = X @ W + b
                                                  v
                                       +---------------------+
                                       |    Output Vector    |
                                       |    [y₁, y₂, y₃]     |
                                       +---------------------+

Parallel processing of many neurons!
```

### Layer Composition: Building Complex Architectures

```
Multi-Layer Perceptron (MLP) Architecture:

Input          Hidden Layer 1    Hidden Layer 2    Output
(784 dims)     (256 neurons)     (128 neurons)     (10 classes)
+---------+    +-------------+   +-------------+   +---------+
|  Image  |---▶|    ReLU     |--▶|    ReLU     |--▶| Softmax |
| 28*28px |    | Activations |   | Activations |   |  Probs  |
+---------+    +-------------+   +-------------+   +---------+
                      v                 v               v
                200,960 params    32,896 params    1,290 params    Total: 235,146

Parameter calculation for Linear(input_size, output_size):
• Weights: input_size * output_size matrix
• Biases:  output_size vector
• Total:   (input_size * output_size) + output_size

Memory scaling pattern:
Layer width doubles -> Parameters quadruple -> Memory quadruples
```

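As a quick check of the parameter totals above, here is a small illustrative snippet
(example code, not part of the module's exported API):

```python
def linear_param_count(n_in, n_out):
    return n_in * n_out + n_out   # weights + biases

sizes = [(784, 256), (256, 128), (128, 10)]
counts = [linear_param_count(i, o) for i, o in sizes]
print(counts, sum(counts))        # [200960, 32896, 1290] 235146
```
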
### Module System: Automatic Parameter Management

```
Parameter Collection Hierarchy:

Model (Sequential)
+-- Layer1 (Linear)
|     +-- weights [784 * 256] --+
|     +-- bias    [256]        --┤
+-- Layer2 (Linear)              +--▶ model.parameters()
|     +-- weights [256 * 128]  --┤    Automatically collects
|     +-- bias    [128]        --┤    all parameters for
+-- Layer3 (Linear)              +--▶ optimizer.step()
      +-- weights [128 * 10]   --┤
      +-- bias    [10]         --+

Before Module system:          With Module system:
manually track params    ->    automatic collection
params = [w1, b1, w2,...]      params = model.parameters()

Enables: optimizer = Adam(model.parameters())
```

### Memory Layout and Performance Implications

```
Tensor Memory Access Patterns:

Matrix Multiplication: A @ B = C

Efficient (Row-major access):      Inefficient (Column-major access):

A: --------------▶                 A: | | | | |
   Cache-friendly                     | | | | |
   Sequential reads                   v v v v v
                                      Cache misses
B: |                               B: --------------▶
   |
   v

Performance impact:
• Good memory layout: 100% cache hit ratio
• Poor memory layout: 10-50% cache hit ratio
• 10-100x performance difference in practice

Why contiguous tensors matter in production!
```
"""

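# %%
# A small, illustrative sketch tied to the memory discussion above (example sizes only):
# parameter memory for a square Linear(width, width) grows with width squared, while
# activation memory for one batch grows only linearly with width.
for width in [256, 512, 1024]:
    params = width * width + width                  # weights + biases
    param_mb = params * 4 / 1024 / 1024             # float32 = 4 bytes
    act_mb = 32 * width * 4 / 1024 / 1024           # one activation tensor, batch of 32
    print(f"width {width}: {param_mb:.2f} MB parameters, {act_mb:.3f} MB activations")
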
# %% [markdown]
"""
## Part 1: Module Base Class - The Foundation of Neural Network Architecture
"""

# %% nbgrader={"grade": false, "grade_id": "module-base", "solution": true}

# Before building specific layers, we need a base class that enables clean composition and automatic parameter management.

#| export
class Module:
    """
    Base class for all neural network modules.

    Provides automatic parameter collection, forward pass management,
    and clean composition patterns. All layers (Dense, Conv2d, etc.)
    inherit from this class.

    Key Features:
    - Automatic parameter registration when you assign parameter Tensors (weights, bias)
    - Recursive parameter collection from sub-modules
    - Clean __call__ interface: model(x) instead of model.forward(x)
    - Extensible for custom layers

    Example Usage:
        class MLP(Module):
            def __init__(self):
                super().__init__()
                self.layer1 = Linear(784, 128)  # Auto-registered!
                self.layer2 = Linear(128, 10)   # Auto-registered!

            def forward(self, x):
                x = self.layer1(x)
                return self.layer2(x)

        model = MLP()
        params = model.parameters()  # Gets all parameters automatically!
        output = model(input)        # Clean interface!
    """

    def __init__(self):
        """Initialize module with empty parameter and sub-module storage."""
        self._parameters = []
        self._modules = []

    def __setattr__(self, name, value):
        """
        Intercept attribute assignment to auto-register parameters and modules.

        When you do self.weight = Parameter(...), this automatically adds
        the parameter to our collection for easy optimization.
        """
        # Step 1: Check if this looks like a parameter (Tensor with data and specific name)
        # Break down the complex boolean logic for clarity:
        is_tensor_like = hasattr(value, 'data') and hasattr(value, 'shape')
        is_tensor_type = isinstance(value, Tensor)
        is_parameter_type = isinstance(value, Parameter)
        is_parameter_name = name in ['weights', 'weight', 'bias']

        if is_tensor_like and (is_tensor_type or is_parameter_type) and is_parameter_name:
            # Step 2: Add to our parameter list for optimization
            self._parameters.append(value)

        # Step 3: Check if it's a sub-module (another neural network layer)
        elif isinstance(value, Module):
            # Step 4: Add to module list for recursive parameter collection
            self._modules.append(value)

        # Step 5: Always set the actual attribute (this is essential!)
        super().__setattr__(name, value)

    def parameters(self):
        """
        Recursively collect all parameters from this module and sub-modules.

        Returns:
            List of all parameters (Tensors containing weights and biases)

        This enables: optimizer = Adam(model.parameters()) (when optimizers are available)
        """
        # Start with our own parameters
        params = list(self._parameters)

        # Add parameters from sub-modules recursively
        for module in self._modules:
            params.extend(module.parameters())

        return params

    def __call__(self, *args, **kwargs):
        """
        Makes modules callable: model(x) instead of model.forward(x).

        This is the magic that enables clean syntax like:
            output = model(input)
        instead of:
            output = model.forward(input)
        """
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        """
        Forward pass - must be implemented by subclasses.

        This is where the actual computation happens. Every layer
        defines its own forward() method.
        """
        raise NotImplementedError("Subclasses must implement forward()")
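
# %%
# A tiny illustrative check of automatic registration (names and sizes are arbitrary
# examples): assigning `weights`/`bias` Parameters and nested Modules is enough for
# parameters() to find everything recursively.
class _ToyBlock(Module):
    def __init__(self):
        super().__init__()
        self.weights = Parameter(np.random.randn(3, 2) * 0.1)  # auto-registered by name
        self.bias = Parameter(np.random.randn(2) * 0.1)        # auto-registered by name

    def forward(self, x):
        return x

class _ToyModel(Module):
    def __init__(self):
        super().__init__()
        self.block = _ToyBlock()  # sub-module: its parameters are collected recursively

    def forward(self, x):
        return self.block(x)

print(len(_ToyModel().parameters()))  # -> 2 (weights + bias from the nested block)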
# In[ ]:

# ✅ IMPLEMENTATION CHECKPOINT: Basic Module class complete

# 🤔 PREDICTION: How many parameters would a simple 3-layer network have?
# Write your guess here: _______

# 🔍 SYSTEMS ANALYSIS: Layer Performance and Scaling
def analyze_layer_performance():
    """Analyze layer performance and scaling characteristics."""
    print("📊 LAYER SYSTEMS ANALYSIS")
    print("Understanding how neural network layers scale and perform...")

    try:
        # Parameter scaling analysis
        print("\n1. Parameter Scaling:")
        layer_sizes = [(784, 256), (256, 128), (128, 10)]
        total_params = 0

        for i, (input_size, output_size) in enumerate(layer_sizes):
            weights = input_size * output_size
            biases = output_size
            layer_params = weights + biases
            total_params += layer_params
            print(f"   Layer {i+1} ({input_size}→{output_size}): {layer_params:,} params")

        print(f"   Total network: {total_params:,} parameters")
        print(f"   Memory usage: {total_params * 4 / 1024 / 1024:.2f} MB (float32)")

        # Computational complexity
        print("\n2. Computational Complexity:")
        batch_size = 32
        total_flops = 0

        for i, (input_size, output_size) in enumerate(layer_sizes):
            matmul_flops = 2 * batch_size * input_size * output_size
            bias_flops = batch_size * output_size
            layer_flops = matmul_flops + bias_flops
            total_flops += layer_flops
            print(f"   Layer {i+1}: {layer_flops:,} FLOPs ({matmul_flops:,} matmul + {bias_flops:,} bias)")

        print(f"   Total forward pass: {total_flops:,} FLOPs")

        # Scaling patterns
        print("\n3. Scaling Insights:")
        print("   • Parameter growth: O(input_size × output_size) - quadratic")
        print("   • Computation: O(batch × input × output) - linear in each dimension")
        print("   • Memory: Parameters + activations scale differently")
        print("   • Bottlenecks: Large layers dominate both memory and compute")

        print("\n💡 KEY INSIGHT: Parameters scale with input_size × output_size; total compute adds a factor of batch size on top of that")

    except Exception as e:
        print(f"⚠️ Analysis error: {e}")

# In[ ]:

# %% [markdown]
"""
### ✅ IMPLEMENTATION CHECKPOINT: Module Base Class Complete

You've built the foundation that enables automatic parameter management across all neural network components!

🤔 **PREDICTION**: How many parameters would a simple 3-layer network have?
Network: 784 → 256 → 128 → 10
Your guess: _______
"""

# %% [markdown]
"""
## Part 2: Linear Layer - The Fundamental Neural Network Component

Linear layers (also called Dense or Fully Connected layers) are the building blocks of neural networks.
"""

# %% nbgrader={"grade": false, "grade_id": "linear-layer", "solution": true}

#| export
class Linear(Module):
    """
    Linear (Fully Connected) Layer implementation.

    Applies the transformation: output = input @ weights + bias

    Inherits from Module for automatic parameter management and clean API.
    This is PyTorch's nn.Linear equivalent with the same name for familiarity.

    Features:
    - Automatic parameter registration (weights and bias)
    - Clean call interface: layer(input) instead of layer.forward(input)
    - Works with optimizers via model.parameters()
    """

    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
        """
        Initialize Linear layer with random weights and optional bias.

        Args:
            input_size: Number of input features
            output_size: Number of output features
            use_bias: Whether to include bias term

        TODO: Implement Linear layer initialization.

        STEP-BY-STEP IMPLEMENTATION:
        1. Store input_size and output_size as instance variables
        2. Initialize weights as Tensor with shape (input_size, output_size)
        3. Use small random values: np.random.randn(...) * 0.1
        4. Initialize bias as Tensor with shape (output_size,) if use_bias is True
        5. Set bias to None if use_bias is False

        LEARNING CONNECTIONS:
        - Small random initialization breaks symmetry (identical weights would learn identical features)
        - Weight shape (input_size, output_size) enables matrix multiplication
        - Bias allows shifting the output (like y-intercept in linear regression)
        - PyTorch uses more sophisticated initialization (Xavier, Kaiming)

        IMPLEMENTATION HINTS:
        - Use np.random.randn() for Gaussian random numbers
        - Scale by 0.1 to keep initial values small
        - Remember to wrap numpy arrays in Tensor()
        - Store use_bias flag for forward pass logic
        """
        ### BEGIN SOLUTION
        super().__init__()  # Initialize Module base class

        self.input_size = input_size
        self.output_size = output_size
        self.use_bias = use_bias

        # Initialize weights with small random values using Parameter
        # Shape: (input_size, output_size) for matrix multiplication
        #
        # 🔍 WEIGHT INITIALIZATION CONTEXT:
        # Weight initialization is critical for training deep networks successfully.
        # Our simple approach (small random * 0.1) works for shallow networks, but
        # deeper networks require more sophisticated initialization strategies:
        #
        # • Xavier/Glorot: scale = sqrt(1/fan_in) - good for tanh/sigmoid activations
        # • Kaiming/He: scale = sqrt(2/fan_in) - optimized for ReLU activations
        # • Our approach: scale = 0.1 - simple but effective for basic networks
        #
        # Why proper initialization matters:
        # - Prevents vanishing gradients (weights too small -> signals disappear)
        # - Prevents exploding gradients (weights too large -> signals blow up)
        # - Enables stable training in deeper architectures (Module 11 training)
        # - Affects convergence speed and final model performance
        #
        # Production frameworks automatically choose initialization based on layer type!
        weight_data = np.random.randn(input_size, output_size) * 0.1
        self.weights = Parameter(weight_data)  # Auto-registers for optimization!

        # Initialize bias if requested
        if use_bias:
            # 🔍 GRADIENT FLOW PREPARATION:
            # Clean parameter management is essential for backpropagation (Module 09).
            # When we implement autograd, the optimizer needs to find ALL trainable
            # parameters automatically. Our Module base class ensures that:
            #
            # • Parameters are automatically registered when assigned
            # • Recursive parameter collection works through network hierarchies
            # • Gradient updates can flow to all learnable weights and biases
            # • Memory management handles parameter lifecycle correctly
            #
            # This design enables the autograd system to:
            # - Track computational graphs through all layers
            # - Accumulate gradients for each parameter during backpropagation
            # - Support optimizers that update parameters based on gradients
            # - Scale to arbitrarily deep and complex network architectures
            #
            # Bias also uses small random initialization (could be zeros, but small random works well)
            bias_data = np.random.randn(output_size) * 0.1
            self.bias = Parameter(bias_data)  # Auto-registers for optimization!
        else:
            self.bias = None
        ### END SOLUTION

    def forward(self, x):
        """
        Forward pass through the Linear layer with automatic differentiation.

        Args:
            x: Input Tensor (shape: ..., input_size)

        Returns:
            Output Tensor (shape: ..., output_size) with gradient tracking

        This method uses Tensor operations throughout so that gradients can flow
        through the weights and bias during backpropagation.

        TODO: Implement the linear transformation using Tensor operations

        STEP-BY-STEP IMPLEMENTATION:
        1. Convert the input to a Tensor if needed (with gradient tracking)
        2. Apply matrix multiplication: x.matmul(self.weights)
        3. Add the bias if it exists: result + self.bias
        4. Return the resulting Tensor with gradient tracking enabled

        LEARNING CONNECTIONS:
        - Uses Tensor operations instead of raw NumPy so gradient tracking is preserved
        - Parameters (weights/bias) are Tensors with requires_grad=True
        - Matrix multiplication and addition maintain the computational graph
        - This enables backpropagation through all parameters

        IMPLEMENTATION HINTS:
        - Use the Tensor methods (matmul, +) rather than operating on raw .data
        - Ensure the result Tensor keeps gradient tracking
        - Handle non-Tensor inputs gracefully by wrapping them in a Tensor
        """
        ### BEGIN SOLUTION
        # Use pure Tensor operations (the module-level Tensor imported above)

        # Ensure input is a Tensor
        if not isinstance(x, Tensor):
            x = Tensor(x.data if hasattr(x, 'data') else x)

        # Matrix multiplication: x @ weights
        # Tensor's matmul tracks gradients when autograd is enabled
        result = x.matmul(self.weights)

        # Add bias if it exists
        if self.bias is not None:
            result = result + self.bias

        # Return pure Tensor with gradient tracking preserved
        return result
        ### END SOLUTION
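
# %%
# Optional illustration of the initialization scales mentioned in the comments above
# (a sketch only - the Linear layer above deliberately sticks with the simpler 0.1 scale):
def _example_init_scales(fan_in):
    xavier_scale = np.sqrt(1.0 / fan_in)   # suits tanh/sigmoid activations
    kaiming_scale = np.sqrt(2.0 / fan_in)  # suits ReLU activations
    return xavier_scale, kaiming_scale

print(_example_init_scales(784))  # fan_in=784 -> (~0.036, ~0.051)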
# In[ ]:

# 🧪 Unit Test: Linear Layer
def test_unit_linear():
    """Test Linear layer implementation."""
    print("🧪 Testing Linear Layer...")

    # Test case 1: Basic functionality
    layer = Linear(input_size=3, output_size=2)
    input_tensor = Tensor([[1.0, 2.0, 3.0]])  # Shape: (1, 3)
    output = layer.forward(input_tensor)

    # Check output shape
    assert output.shape == (1, 2), f"Expected shape (1, 2), got {output.shape}"
    print("✅ Output shape correct")

    # Test case 2: No bias
    layer_no_bias = Linear(input_size=2, output_size=3, use_bias=False)
    assert layer_no_bias.bias is None, "Bias should be None when use_bias=False"
    print("✅ No bias option works")

    # Test case 3: Multiple samples (batch processing)
    batch_input = Tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # Shape: (3, 2)
    layer_batch = Linear(input_size=2, output_size=2)
    batch_output = layer_batch.forward(batch_input)

    assert batch_output.shape == (3, 2), f"Expected shape (3, 2), got {batch_output.shape}"
    print("✅ Batch processing works")

    # Test case 4: Callable interface
    callable_output = layer_batch(batch_input)
    assert np.allclose(callable_output.data, batch_output.data), "Callable interface should match forward()"
    print("✅ Callable interface works")

    # Test case 5: Parameter initialization
    layer_init = Linear(input_size=10, output_size=5)
    assert layer_init.weights.shape == (10, 5), f"Expected weights shape (10, 5), got {layer_init.weights.shape}"
    assert layer_init.bias.shape == (5,), f"Expected bias shape (5,), got {layer_init.bias.shape}"

    # Check that weights are reasonably small (good initialization)
    mean_val = np.abs(layer_init.weights.data).mean()
    # Convert to float if it's a Tensor
    if hasattr(mean_val, 'item'):
        mean_val = mean_val.item()
    elif hasattr(mean_val, 'data'):
        mean_val = float(mean_val.data)
    assert mean_val < 1.0, "Weights should be small for good initialization"
    print("✅ Parameter initialization correct")

    print("🎉 All Linear layer tests passed!")

test_unit_linear()
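
# %%
# To tie the layer back to the math, a small illustrative check that layer(x) matches
# x @ W + b computed directly with NumPy (example values; assumes Tensor wraps a NumPy array):
_check_layer = Linear(input_size=3, output_size=2)
_x_np = np.array([[1.0, 2.0, 3.0]])
_manual = _x_np @ _check_layer.weights.data + _check_layer.bias.data
print(np.allclose(_check_layer(Tensor(_x_np)).data, _manual))  # expected: True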
# In[ ]:

# 🧪 Unit Test: Parameter Management
def test_unit_parameter_management():
    """Test Linear layer parameter management and module composition."""
    print("🧪 Testing Parameter Management...")

    # Test case 1: Parameter registration
    layer = Linear(input_size=3, output_size=2)
    params = layer.parameters()

    assert len(params) == 2, f"Expected 2 parameters (weights + bias), got {len(params)}"
    assert layer.weights in params, "Weights should be in parameters list"
    assert layer.bias in params, "Bias should be in parameters list"
    print("✅ Parameter registration works")

    # Test case 2: Module composition
    class SimpleNetwork(Module):
        def __init__(self):
            super().__init__()
            self.layer1 = Linear(4, 3)
            self.layer2 = Linear(3, 2)

        def forward(self, x):
            x = self.layer1(x)
            return self.layer2(x)

    network = SimpleNetwork()
    all_params = network.parameters()

    # Should have 4 parameters: 2 from each layer (weights + bias)
    assert len(all_params) == 4, f"Expected 4 parameters from network, got {len(all_params)}"
    print("✅ Module composition and parameter collection works")

    # Test case 3: Forward pass through composed network
    input_tensor = Tensor([[1.0, 2.0, 3.0, 4.0]])
    output = network(input_tensor)

    assert output.shape == (1, 2), f"Expected output shape (1, 2), got {output.shape}"
    print("✅ Network forward pass works")

    # Test case 4: No bias option
    layer_no_bias = Linear(input_size=3, output_size=2, use_bias=False)
    params_no_bias = layer_no_bias.parameters()

    assert len(params_no_bias) == 1, f"Expected 1 parameter (weights only), got {len(params_no_bias)}"
    assert layer_no_bias.bias is None, "Bias should be None when use_bias=False"
    print("✅ No bias option works")

    print("🎉 All parameter management tests passed!")

test_unit_parameter_management()

# In[ ]:

# ✅ IMPLEMENTATION CHECKPOINT: Linear layer complete

# 🤔 PREDICTION: How does memory usage scale with network depth vs width?
# Deeper network (more layers): _______
# Wider network (more neurons per layer): _______

# 🔍 SYSTEMS INSIGHT #3: Architecture Memory Analysis
# Architecture analysis consolidated into analyze_layer_performance() above

# %% [markdown]
"""
## Part 4: Sequential Network Composition
"""

# %% nbgrader={"grade": false, "grade_id": "sequential-composition", "solution": true}

#| export
class Sequential(Module):
    """
    Sequential Network: Composes layers in sequence.

    The most fundamental network architecture that applies layers in order:
    f(x) = layer_n(...layer_2(layer_1(x)))

    Inherits from Module for automatic parameter collection from all sub-layers.
    This enables optimizers to find all parameters automatically.

    Example Usage:
        # Create a 3-layer MLP
        model = Sequential([
            Linear(784, 128),
            ReLU(),
            Linear(128, 64),
            ReLU(),
            Linear(64, 10)
        ])

        # Use the model
        output = model(input_data)   # Clean interface!
        params = model.parameters()  # All parameters from all layers!
    """

    def __init__(self, layers=None):
        """
        Initialize Sequential network with layers.

        Args:
            layers: List of layers to compose in order (optional)
        """
        super().__init__()  # Initialize Module base class
        self.layers = layers if layers is not None else []

        # Register all layers as sub-modules for parameter collection
        for i, layer in enumerate(self.layers):
            # This automatically adds each layer to self._modules
            setattr(self, f'layer_{i}', layer)

    def forward(self, x):
        """
        Forward pass through all layers in sequence.

        Args:
            x: Input tensor

        Returns:
            Output tensor after passing through all layers
        """
        for layer in self.layers:
            x = layer(x)
        return x

    def add(self, layer):
        """Add a layer to the network."""
        self.layers.append(layer)
        # Register the new layer for parameter collection
        setattr(self, f'layer_{len(self.layers)-1}', layer)
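
# %%
# A minimal illustrative composition using only this module's classes (the ReLU in the
# docstring example comes from the activations module and is omitted here):
_tiny_mlp = Sequential([Linear(4, 3), Linear(3, 2)])
print(len(_tiny_mlp.parameters()))                      # -> 4 (two weights + two biases)
print(_tiny_mlp(Tensor([[1.0, 2.0, 3.0, 4.0]])).shape)  # -> (1, 2)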
# In[ ]:

# 🧪 Unit Test: Sequential Networks
def test_unit_sequential():
    """Test Sequential network implementation."""
    print("🧪 Testing Sequential Network...")

    # Test case 1: Create empty network
    empty_net = Sequential()
    assert len(empty_net.layers) == 0, "Empty Sequential should have no layers"
    print("✅ Empty Sequential network creation")

    # Test case 2: Create network with layers
    layers = [Linear(3, 4), Linear(4, 2)]
    network = Sequential(layers)
    assert len(network.layers) == 2, "Network should have 2 layers"
    print("✅ Sequential network with layers")

    # Test case 3: Forward pass through network
    input_tensor = Tensor([[1.0, 2.0, 3.0]])
    output = network(input_tensor)
    assert output.shape == (1, 2), f"Expected output shape (1, 2), got {output.shape}"
    print("✅ Forward pass through Sequential network")

    # Test case 4: Parameter collection from all layers
    all_params = network.parameters()
    # Should have 4 parameters: 2 weights + 2 biases from 2 Linear layers
    assert len(all_params) == 4, f"Expected 4 parameters from Sequential network, got {len(all_params)}"
    print("✅ Parameter collection from all layers")

    # Test case 5: Adding layers dynamically
    network.add(Linear(2, 1))
    assert len(network.layers) == 3, "Network should have 3 layers after adding one"

    # Test forward pass after adding layer
    final_output = network(input_tensor)
    assert final_output.shape == (1, 1), f"Expected final output shape (1, 1), got {final_output.shape}"
    print("✅ Dynamic layer addition")

    print("🎉 All Sequential network tests passed!")

test_unit_sequential()

# %% [markdown]
"""
## Part 5: Flatten Operation - Connecting Different Layer Types
"""

# %% nbgrader={"grade": false, "grade_id": "flatten-operations", "solution": true}

#| export
def flatten(x, start_dim=1):
    """
    Flatten tensor starting from a given dimension.

    This is essential for transitioning from convolutional layers
    (which output 4D tensors) to linear layers (which expect 2D).

    Args:
        x: Input tensor (Tensor or any array-like)
        start_dim: Dimension to start flattening from (default: 1 to preserve batch)

    Returns:
        Flattened tensor preserving batch dimension

    Examples:
        # Flatten CNN output for Linear layer
        conv_output = Tensor(np.random.randn(32, 64, 8, 8))  # (batch, channels, height, width)
        flat = flatten(conv_output)  # (32, 4096) - ready for Linear layer!

        # Flatten image for MLP
        images = Tensor(np.random.randn(32, 3, 28, 28))  # batch of RGB images
        flat = flatten(images)  # (32, 2352) - ready for MLP!
    """
    # Get the data (handle both Tensor and numpy arrays)
    if hasattr(x, 'data'):
        data = x.data
    else:
        data = x

    # Calculate new shape
    # Note: this simple implementation assumes start_dim is 0 or 1
    batch_size = data.shape[0] if start_dim > 0 else 1
    remaining_size = np.prod(data.shape[start_dim:])
    new_shape = (batch_size, remaining_size) if start_dim > 0 else (remaining_size,)

    # Reshape while preserving the original tensor type
    if hasattr(x, 'data'):
        # It's a Tensor - create a new Tensor with flattened data
        flattened_data = data.reshape(new_shape)
        # Use type(x) to preserve the exact Tensor type (Parameter vs regular Tensor)
        # This ensures that if input was a Parameter, output is also a Parameter
        return type(x)(flattened_data)
    else:
        # It's a numpy array - just reshape and return
        return data.reshape(new_shape)

#| export
class Flatten(Module):
    """
    Flatten layer that reshapes tensors from multi-dimensional to 2D.

    Essential for connecting convolutional layers (which output 4D tensors)
    to linear layers (which expect 2D tensors). Preserves the batch dimension.

    Example Usage:
        # In a CNN architecture
        model = Sequential([
            Conv2D(3, 16, kernel_size=3),  # Output: (batch, 16, height, width)
            ReLU(),
            Flatten(),                     # Output: (batch, 16*height*width)
            Linear(16*height*width, 10)    # Now compatible!
        ])
    """

    def __init__(self, start_dim=1):
        """
        Initialize Flatten layer.

        Args:
            start_dim: Dimension to start flattening from (default: 1 to preserve batch)
        """
        super().__init__()
        self.start_dim = start_dim

    def forward(self, x):
        """
        Flatten tensor starting from start_dim.

        Args:
            x: Input tensor

        Returns:
            Flattened tensor with batch dimension preserved
        """
        return flatten(x, start_dim=self.start_dim)
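
# %%
# Quick illustrative shape checks for flatten/Flatten (example shapes only):
_demo = Tensor(np.random.randn(2, 3, 4, 4))   # pretend (batch, channels, height, width)
print(flatten(_demo).shape)                    # -> (2, 48): batch dimension preserved
print(Flatten(start_dim=0)(_demo).shape)       # -> (96,): fully flattened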
# In[ ]:

# 🧪 Unit Test: Flatten Operations
def test_unit_flatten():
    """Test Flatten layer and function implementation."""
    print("🧪 Testing Flatten Operations...")

    # Test case 1: Flatten function with 2D tensor
    x_2d = Tensor([[1, 2], [3, 4]])
    flattened_func = flatten(x_2d)
    assert flattened_func.shape == (2, 2), f"Expected shape (2, 2), got {flattened_func.shape}"
    print("✅ Flatten function with 2D tensor")

    # Test case 2: Flatten function with 4D tensor (simulating CNN output)
    x_4d = Tensor(np.random.randn(2, 3, 4, 4))  # (batch, channels, height, width)
    flattened_4d = flatten(x_4d)
    assert flattened_4d.shape == (2, 48), f"Expected shape (2, 48), got {flattened_4d.shape}"  # 3*4*4 = 48
    print("✅ Flatten function with 4D tensor")

    # Test case 3: Flatten layer class
    flatten_layer = Flatten()
    layer_output = flatten_layer(x_4d)
    assert layer_output.shape == (2, 48), f"Expected shape (2, 48), got {layer_output.shape}"
    assert np.allclose(layer_output.data, flattened_4d.data), "Flatten layer should match flatten function"
    print("✅ Flatten layer class")

    # Test case 4: Different start dimensions
    flatten_from_0 = Flatten(start_dim=0)
    full_flat = flatten_from_0(x_2d)
    assert len(full_flat.shape) <= 2, "Flattening from dim 0 should create vector"
    print("✅ Different start dimensions")

    # Test case 5: Integration with Sequential
    network = Sequential([
        Linear(8, 4),
        Flatten()
    ])
    test_input = Tensor(np.random.randn(2, 8))
    output = network(test_input)
    assert output.shape == (2, 4), f"Expected shape (2, 4), got {output.shape}"
    print("✅ Flatten integration with Sequential")

    print("🎉 All Flatten operations tests passed!")

test_unit_flatten()

# In[ ]:

# %% [markdown]
"""
## 📦 Where This Code Lives in the Final Package

**Learning Side:** You work in modules/03_layers/layers_dev.py
**Building Side:** Code exports to tinytorch.core.layers

```python
# Final package structure:
from tinytorch.core.layers import Module, Linear, Sequential, Flatten  # This module
from tinytorch.core.tensor import Tensor, Parameter  # Foundation (always needed)
```

**Why this matters:**
- **Learning:** Complete layer system in one focused module for deep understanding
- **Production:** Proper organization like PyTorch's torch.nn with all core components together
- **Consistency:** All layer operations and parameter management in core.layers
- **Integration:** Works seamlessly with tensors for complete neural network building
"""

# %%

# %% [markdown]
"""
## Complete Neural Network Demo
"""

# %%
def demonstrate_complete_networks():
    """Demonstrate complete neural networks using all implemented components."""
    print("🔥 Complete Neural Network Demo")
    print("=" * 50)

    print("\n1. MLP for Classification (MNIST-style):")
    # Multi-layer perceptron for image classification
    mlp = Sequential([
        Flatten(),          # Flatten input images
        Linear(784, 256),   # First hidden layer
        Linear(256, 128),   # Second hidden layer
        Linear(128, 10)     # Output layer (10 classes)
    ])

    # Test with batch of "images"
    batch_images = Tensor(np.random.randn(32, 28, 28))  # 32 MNIST-like images
    mlp_output = mlp(batch_images)
    print(f"   Input: {batch_images.shape} (batch of 28x28 images)")
    print(f"   Output: {mlp_output.shape} (class logits for 32 images)")
    print(f"   Parameters: {len(mlp.parameters())} tensors")

    print("\n2. CNN-style Architecture (with Flatten):")
    # Simulate CNN -> Flatten -> Dense pattern
    cnn_style = Sequential([
        # Simulate Conv2D output with random "features"
        Flatten(),          # Flatten spatial features
        Linear(512, 256),   # Dense layer after convolution
        Linear(256, 10)     # Classification head
    ])

    # Test with simulated conv output
    conv_features = Tensor(np.random.randn(16, 8, 8, 8))  # Simulated (B, C, H, W)
    cnn_output = cnn_style(conv_features)
    print(f"   Input: {conv_features.shape} (simulated conv features)")
    print(f"   Output: {cnn_output.shape} (class predictions)")

    print("\n3. Deep Network with Many Layers:")
    # Demonstrate deep composition
    deep_net = Sequential()
    layer_sizes = [100, 80, 60, 40, 20, 10]

    for i in range(len(layer_sizes) - 1):
        deep_net.add(Linear(layer_sizes[i], layer_sizes[i+1]))
        print(f"   Added layer: {layer_sizes[i]} -> {layer_sizes[i+1]}")

    # Test deep network
    deep_input = Tensor(np.random.randn(8, 100))
    deep_output = deep_net(deep_input)
    print(f"   Deep network: {deep_input.shape} -> {deep_output.shape}")
    print(f"   Total parameters: {len(deep_net.parameters())} tensors")

    print("\n4. Parameter Management Across Networks:")
    networks = {'MLP': mlp, 'CNN-style': cnn_style, 'Deep': deep_net}

    for name, net in networks.items():
        params = net.parameters()
        total_params = sum(p.data.size for p in params)
        memory_mb = total_params * 4 / (1024 * 1024)  # float32 = 4 bytes
        print(f"   {name}: {len(params)} param tensors, {total_params:,} total params, {memory_mb:.2f} MB")

    print("\n🎉 All components work together seamlessly!")
    print("   • Module system enables automatic parameter collection")
    print("   • Linear layers handle matrix transformations")
    print("   • Sequential composes layers into complete architectures")
    print("   • Flatten connects different layer types")
    print("   • Everything integrates for production-ready neural networks!")

demonstrate_complete_networks()

# In[ ]:

# %% [markdown]
"""
## Testing Framework
"""

# %%
def test_module():
    """Run complete module validation."""
    print("🧪 TESTING ALL LAYER COMPONENTS")
    print("=" * 40)

    # Call every individual test function
    test_unit_linear()
    test_unit_parameter_management()
    test_unit_sequential()
    test_unit_flatten()

    print("\n✅ ALL TESTS PASSED! Layer module ready for integration.")

# In[ ]:

if __name__ == "__main__":
    print("🚀 TINYTORCH LAYERS MODULE")
    print("=" * 50)

    # Test all components
    test_module()

    # Systems analysis
    print("\n" + "=" * 50)
    analyze_layer_performance()

    # Complete demo
    print("\n" + "=" * 50)
    demonstrate_complete_networks()

    print("\n🎉 LAYERS MODULE COMPLETE!")
    print("✅ Ready for advanced architectures and training!")

# %% [markdown]
"""
## 🤔 ML Systems Thinking: Interactive Questions

Now that you've implemented all the core neural network components, let's think about their implications for ML systems:

⭐ **Question 1: Memory vs Computation Analysis**

You're designing a neural network for deployment on a mobile device with limited memory (1GB RAM) but decent compute power.

You have two architecture options:
A) Wide network: 784 -> 2048 -> 2048 -> 10 (3 layers, wide)
B) Deep network: 784 -> 256 -> 256 -> 256 -> 256 -> 10 (5 layers, narrow)

Calculate the memory requirements for each option and explain which you'd choose for mobile deployment and why (a small counting sketch follows the list below).

Consider:
- Parameter storage requirements
- Intermediate activation storage during forward pass
- Training vs inference memory requirements
- How your choice affects model capacity and accuracy
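
A small helper sketch for getting started (illustrative only - it counts parameters and
leaves the activation and training-memory analysis to you):

```python
def mlp_param_count(sizes):
    # total weights + biases for a chain of Linear layers
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

wide = mlp_param_count([784, 2048, 2048, 10])
deep = mlp_param_count([784, 256, 256, 256, 256, 10])
print(wide, deep)                      # compare the two parameter counts
print(wide * 4 / 1e6, deep * 4 / 1e6)  # approximate float32 storage in MB
```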

⭐ **Question 2: Production Performance Optimization**

Your Linear layer implementation works correctly, but you notice it's slower than PyTorch's nn.Linear on the same hardware.

Investigate and explain:
1. Why might our implementation be slower? (Hint: think about underlying linear algebra libraries)
2. What optimization techniques do production frameworks use?
3. How would you modify our implementation to approach production performance?
4. When might our simple implementation actually be preferable?

Research areas to consider:
- BLAS (Basic Linear Algebra Subprograms) libraries
- Memory layout and cache efficiency
- Vectorization and SIMD instructions
- GPU kernel optimization

⭐ **Question 3: Systems Architecture Scaling**

Modern transformer models like GPT-3 have billions of parameters, primarily in Linear layers.

Analyze the scaling challenges:
1. How does memory requirement scale with model size? Calculate the memory needed for a 175B parameter model.
2. What are the computational bottlenecks during training vs inference?
3. How do systems like distributed training address these scaling challenges?
4. Why do large models use techniques like gradient checkpointing and model parallelism?

Systems considerations:
- Memory hierarchy (L1/L2/L3 cache, RAM, storage)
- Network bandwidth for distributed training
- GPU memory constraints and model sharding
- Inference optimization for production serving
"""

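# %%
# A rough rule-of-thumb sketch for Question 3 (illustrative arithmetic only): parameter
# storage alone, ignoring activations, gradients, and optimizer state, which add
# substantially more during training.
_params = 175e9
for _name, _bytes in [("float32", 4), ("float16", 2)]:
    print(f"{_name}: {_params * _bytes / 1e9:.0f} GB just to store the weights")
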
# %% [markdown]
"""
## 🎯 MODULE SUMMARY: Layers - Complete Neural Network Foundation

### What You've Accomplished

You've successfully implemented the complete foundation for neural networks - all the essential components working together:

### ✅ **Complete Core System**
- **Module Base Class**: Parameter management and composition patterns for all neural network components
- **Matrix Multiplication**: The computational primitive underlying all neural network operations
- **Linear (Dense) Layers**: Complete implementation with proper parameter initialization and forward propagation
- **Sequential Networks**: Clean composition system for building complete neural network architectures
- **Flatten Operations**: Tensor reshaping to connect different layer types (essential for CNN->MLP transitions)

### ✅ **Systems Understanding**
- **Architectural Patterns**: How modular design enables everything from MLPs to complex deep networks
- **Memory Analysis**: How layer composition affects memory usage and computational efficiency
- **Performance Characteristics**: Understanding how tensor operations and layer composition affect performance
- **Production Context**: Connection to real-world ML frameworks and their component organization

### ✅ **ML Engineering Skills**
- **Complete Parameter Management**: How neural networks automatically collect parameters from all components
- **Network Composition**: Building complex architectures from simple, reusable components
- **Tensor Operations**: Essential reshaping and transformation operations for different network types
- **Clean Abstraction**: Professional software design patterns that scale to production systems

### 🔗 **Connection to Production ML Systems**

Your unified implementation mirrors the complete component systems used in:
- **PyTorch's nn.Module system**: Same parameter management and composition patterns
- **PyTorch's nn.Sequential**: Identical architecture composition approach
- **All major frameworks**: The same modular design principles that power TensorFlow, JAX, and others
- **Production ML systems**: Clean abstractions that enable complex models while maintaining manageable code

### 🚀 **What's Next**

With your complete layer foundation, you're ready to:
- **Module 05 (Dense)**: Build complete dense networks for classification tasks
- **Module 06 (Spatial)**: Add convolutional layers for computer vision
- **Module 09 (Autograd)**: Enable automatic differentiation for learning
- **Module 10 (Optimizers)**: Implement sophisticated optimization algorithms

### 💡 **Key Systems Insights**

1. **Modular composition is the key to scalable ML systems** - clean interfaces enable complex behaviors
2. **Parameter management must be automatic** - manual parameter tracking doesn't scale to deep networks
3. **Tensor operations like flattening are architectural requirements** - different layer types need different tensor shapes
4. **Clean abstractions enable innovation** - good foundational design supports unlimited architectural experimentation

You now understand how to build complete, production-ready neural network foundations that can scale to any architecture!
"""