# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.17.1
#   kernelspec:
#     display_name: Python 3 (ipykernel)
#     language: python
#     name: python3
# ---

# %% [markdown]
"""
# Networks - Building Intelligence Through Layer Composition

Welcome to Networks! You'll learn how to combine individual layers into complete neural networks that can solve complex problems.

## 🔗 Building on Previous Learning

**What You Built Before**:
- Module 01 (Tensor): Multi-dimensional data structures for inputs and outputs
- Module 02 (Activations): Nonlinear functions that create intelligence
- Module 03 (Layers): Linear layers that transform data with learnable parameters

**What's Working**: You can transform data with individual layers and activations!

**The Gap**: Individual layers only solve simple problems; real intelligence emerges when layers compose into networks.

**This Module's Solution**: Learn to manually compose layers into multi-layer networks with different architectures.

**Connection Map**:
```
Layers  →  Manual Composition  →  Complete Networks
(transforms)   (architecture)       (intelligence)
```

## Learning Objectives
1. **Manual Network Architecture**: Build networks by composing layers step-by-step
2. **Parameter Management**: Count and track parameters across multiple layers
3. **Forward Pass Logic**: Understand data flow through network architectures
4. **Network Architectures**: Create different network shapes (wide, deep, custom)
5. **Systems Understanding**: Analyze memory usage and computational complexity

## Build → Test → Use
1. **Build**: Manual network composition functions and parameter counting
2. **Test**: Validate networks with different architectures and input sizes
3. **Use**: Apply networks to solve problems requiring multiple transformations
"""

# %%
# Essential imports for network composition
import numpy as np
import sys
import os
from typing import List, Tuple, Union, Optional

# Import building blocks from previous modules - ONLY use concepts we've learned
try:
    from tinytorch.core.tensor import Tensor
    from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax
    from tinytorch.core.layers import Linear, Module
except ImportError:
    # Development fallback
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_layers'))
    from tensor_dev import Tensor
    from activations_dev import ReLU, Sigmoid, Tanh, Softmax
    from layers_dev import Linear, Module

# %% [markdown]
"""
## Part 1: Understanding Network Architecture

### What Makes a Neural Network?

A neural network is simply **multiple layers composed together** where each layer's output becomes the next layer's input.

```
Input → Layer1 → Activation → Layer2 → Activation → Output
 (4)      (8)        (8)        (3)        (3)        (3)
```

**Key Insights**:
- **Composition**: Networks = layers + activations in sequence
- **Data Flow**: Output shape of layer N must match input shape of layer N+1
- **Intelligence**: Nonlinearity from activations enables complex pattern learning
- **Architecture**: Layer sizes and arrangements determine network capability
"""

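# %% [markdown]
"""
### Seeing Composition in Action

Before building reusable composition functions, here is a minimal sketch of the diagram above, chaining the components you built in Modules 01–03 by hand. (The layer sizes 4 → 8 → 3 and the variable names are just illustrative.)
"""

# %%
# Manually chain: Input (4) → Linear → ReLU → Linear → Output (3)
layer_a = Linear(4, 8)      # transforms 4 features into 8
relu = ReLU()               # nonlinearity between the layers
layer_b = Linear(8, 3)      # transforms 8 features into 3

x_demo = Tensor(np.random.randn(1, 4))   # one sample with 4 features
y_demo = layer_b(relu(layer_a(x_demo)))  # each step's output feeds the next

print(f"Input shape: {x_demo.shape} → Output shape: {y_demo.shape}")  # (1, 4) → (1, 3)
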
# %% [markdown]
"""
## Part 2: Manual Network Composition

Let's start by learning to compose networks manually before automation.
"""

# %% nbgrader={"grade": false, "grade_id": "network-composition", "solution": true}
def compose_two_layer_network(input_size: int, hidden_size: int, output_size: int,
                              activation=ReLU) -> Tuple[Linear, object, Linear]:
    """
    Create a 2-layer network manually: Input → Linear → Activation → Linear → Output

    Args:
        input_size: Number of input features
        hidden_size: Number of hidden layer neurons
        output_size: Number of output features
        activation: Activation function class (default: ReLU)

    Returns:
        Tuple of (layer1, activation_instance, layer2)

    TODO: Create two Linear layers and one activation function

    APPROACH:
    1. Create first Linear layer: input_size → hidden_size
    2. Create activation function instance
    3. Create second Linear layer: hidden_size → output_size
    4. Return all three components as tuple

    EXAMPLE:
    >>> layer1, act, layer2 = compose_two_layer_network(4, 8, 3)
    >>> x = Tensor([[1, 2, 3, 4]])
    >>> h = layer1(x)        # (1, 4) → (1, 8)
    >>> h_act = act(h)       # (1, 8) → (1, 8)
    >>> y = layer2(h_act)    # (1, 8) → (1, 3)
    >>> print(y.shape)       # (1, 3)

    HINTS:
    - Use Linear(input_size, hidden_size) for first layer
    - Create activation instance with activation()
    - Use Linear(hidden_size, output_size) for second layer
    - Return as (layer1, activation_instance, layer2)
    """
    ### BEGIN SOLUTION
    # Create first layer: input → hidden
    layer1 = Linear(input_size, hidden_size)

    # Create activation function instance
    activation_instance = activation()

    # Create second layer: hidden → output
    layer2 = Linear(hidden_size, output_size)

    return layer1, activation_instance, layer2
    ### END SOLUTION

# %% [markdown]
"""
### 🧪 Unit Test: Two-Layer Network Composition
Test that we can manually compose a simple 2-layer network
"""

# %%
def test_unit_two_layer_composition():
    """Test two-layer network composition with different configurations"""
    print("🔬 Unit Test: Two-Layer Network Composition...")

    # Test 1: Basic composition
    layer1, activation, layer2 = compose_two_layer_network(4, 8, 3)

    assert isinstance(layer1, Linear), "First component should be Linear layer"
    assert isinstance(activation, ReLU), "Second component should be activation function"
    assert isinstance(layer2, Linear), "Third component should be Linear layer"

    assert layer1.input_size == 4, "First layer should have correct input size"
    assert layer1.output_size == 8, "First layer should have correct output size"
    assert layer2.input_size == 8, "Second layer should have correct input size"
    assert layer2.output_size == 3, "Second layer should have correct output size"

    # Test 2: Forward pass compatibility
    x = Tensor(np.random.randn(2, 4))
    h = layer1(x)
    h_activated = activation(h)
    y = layer2(h_activated)

    assert h.shape == (2, 8), "Hidden layer output should have correct shape"
    assert h_activated.shape == (2, 8), "Activated hidden should preserve shape"
    assert y.shape == (2, 3), "Final output should have correct shape"

    # Test 3: Different activation functions
    layer1_sig, sig_act, layer2_sig = compose_two_layer_network(3, 5, 2, Sigmoid)
    assert isinstance(sig_act, Sigmoid), "Should create Sigmoid activation when specified"

    print("✅ Two-layer network composition works correctly!")

test_unit_two_layer_composition()

# %% [markdown]
"""
## Part 3: Forward Pass Through Networks

Now let's implement the logic for running data through composed networks.
"""

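# %% [markdown]
"""
### The Math Behind the Forward Pass

Written out, the two-layer forward pass below is just function composition. With weight matrices W1, W2, bias vectors b1, b2, and activation σ (ReLU by default):

    hidden = σ(x · W1 + b1)      # (batch, input)  → (batch, hidden)
    output = hidden · W2 + b2    # (batch, hidden) → (batch, output)

so output = layer2(σ(layer1(x))). (This sketch assumes the row-vector convention x · W; the exact weight layout is an implementation detail of Module 03's Linear layer.)
"""
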
# %% nbgrader={"grade": false, "grade_id": "forward-pass", "solution": true}
def forward_pass_two_layer(x: Tensor, layer1: Linear, activation, layer2: Linear) -> Tensor:
    """
    Execute forward pass through a 2-layer network.

    Args:
        x: Input tensor
        layer1: First Linear layer
        activation: Activation function
        layer2: Second Linear layer

    Returns:
        Output tensor after passing through the network

    TODO: Implement forward pass: x → layer1 → activation → layer2 → output

    APPROACH:
    1. Pass input through first layer
    2. Apply activation function to result
    3. Pass activated result through second layer
    4. Return final output

    EXAMPLE:
    >>> x = Tensor([[1, 2, 3, 4]])  # (1, 4)
    >>> y = forward_pass_two_layer(x, layer1, relu, layer2)
    >>> print(y.shape)  # (1, output_size)

    HINTS:
    - Call each component in sequence: layer1(x), activation(h), layer2(h_act)
    - Each output becomes input to next component
    - Return the final result
    """
    ### BEGIN SOLUTION
    # Step 1: First layer transformation
    hidden = layer1(x)

    # Step 2: Apply activation function
    hidden_activated = activation(hidden)

    # Step 3: Second layer transformation
    output = layer2(hidden_activated)

    return output
    ### END SOLUTION

# %% [markdown]
"""
### 🧪 Unit Test: Forward Pass Through Network
Test that data flows correctly through our manual network
"""

# %%
def test_unit_forward_pass():
    """Test forward pass through manually composed networks"""
    print("🔬 Unit Test: Forward Pass Through Networks...")

    # Create test network
    layer1, relu_act, layer2 = compose_two_layer_network(5, 10, 3)

    # Test 1: Single sample
    x_single = Tensor(np.random.randn(1, 5))
    y_single = forward_pass_two_layer(x_single, layer1, relu_act, layer2)

    assert y_single.shape == (1, 3), "Single sample should produce correct output shape"
    assert hasattr(y_single, 'shape') and hasattr(y_single, 'data'), "Output should be a Tensor-like object"

    # Test 2: Batch processing
    x_batch = Tensor(np.random.randn(4, 5))
    y_batch = forward_pass_two_layer(x_batch, layer1, relu_act, layer2)

    assert y_batch.shape == (4, 3), "Batch should produce correct output shape"

    # Test 3: Different network architectures
    wide_layer1, wide_act, wide_layer2 = compose_two_layer_network(2, 50, 1)
    x_wide = Tensor(np.random.randn(3, 2))
    y_wide = forward_pass_two_layer(x_wide, wide_layer1, wide_act, wide_layer2)

    assert y_wide.shape == (3, 1), "Wide network should work correctly"

    print("✅ Forward pass through networks works correctly!")

test_unit_forward_pass()

# %% [markdown]
"""
## Part 4: Deep Network Composition

Real neural networks often have more than 2 layers. Let's build deep networks manually.
"""

# %% nbgrader={"grade": false, "grade_id": "deep-network", "solution": true}
def compose_deep_network(layer_sizes: List[int], activation=ReLU) -> List:
    """
    Create a deep network with arbitrary number of layers.

    Args:
        layer_sizes: List of layer sizes [input_size, hidden1, hidden2, ..., output_size]
        activation: Activation function class

    Returns:
        List of network components [layer1, activation1, layer2, activation2, ..., final_layer]

    TODO: Create alternating Linear layers and activations for each pair of sizes

    APPROACH:
    1. Iterate through pairs of consecutive sizes in layer_sizes
    2. For each pair, create Linear(size_i, size_i+1) and activation()
    3. Don't add activation after the final layer (the output layer typically has no activation)
    4. Return list of all components in order

    EXAMPLE:
    >>> components = compose_deep_network([4, 8, 6, 3])
    >>> # Creates: Linear(4,8), ReLU(), Linear(8,6), ReLU(), Linear(6,3)
    >>> len(components)  # 5 components

    HINTS:
    - Use zip(layer_sizes[:-1], layer_sizes[1:]) to get consecutive pairs
    - Add Linear layer, then activation for each pair (except last layer)
    - Last layer: only add Linear, no activation
    - Return list of all components
    """
    ### BEGIN SOLUTION
    components = []

    # Process all but the last layer (add Linear + Activation)
    for i in range(len(layer_sizes) - 2):
        input_size = layer_sizes[i]
        output_size = layer_sizes[i + 1]

        # Add Linear layer
        components.append(Linear(input_size, output_size))
        # Add activation
        components.append(activation())

    # Add final layer (Linear only, no activation)
    if len(layer_sizes) >= 2:
        final_input = layer_sizes[-2]
        final_output = layer_sizes[-1]
        components.append(Linear(final_input, final_output))

    return components
    ### END SOLUTION

# %% [markdown]
"""
### 🧪 Unit Test: Deep Network Composition
Test that we can build networks with arbitrary depth
"""

# %%
def test_unit_deep_network():
    """Test deep network composition with various architectures"""
    print("🔬 Unit Test: Deep Network Composition...")

    # Test 1: 3-layer network
    components_3layer = compose_deep_network([4, 8, 6, 3])
    expected_components = 5  # Linear, ReLU, Linear, ReLU, Linear

    assert len(components_3layer) == expected_components, f"3-layer network should have {expected_components} components"

    # Verify component types and order
    assert isinstance(components_3layer[0], Linear), "First component should be Linear"
    assert isinstance(components_3layer[1], ReLU), "Second component should be ReLU"
    assert isinstance(components_3layer[2], Linear), "Third component should be Linear"
    assert isinstance(components_3layer[3], ReLU), "Fourth component should be ReLU"
    assert isinstance(components_3layer[4], Linear), "Fifth component should be Linear (final)"

    # Test 2: Verify layer sizes
    assert components_3layer[0].input_size == 4, "First layer should have correct input size"
    assert components_3layer[0].output_size == 8, "First layer should have correct output size"
    assert components_3layer[2].input_size == 8, "Second layer should have correct input size"
    assert components_3layer[2].output_size == 6, "Second layer should have correct output size"
    assert components_3layer[4].input_size == 6, "Final layer should have correct input size"
    assert components_3layer[4].output_size == 3, "Final layer should have correct output size"

    # Test 3: Different activation function
    components_sigmoid = compose_deep_network([2, 4, 1], Sigmoid)
    assert isinstance(components_sigmoid[1], Sigmoid), "Should use specified activation function"

    # Test 4: Single layer (edge case)
    components_single = compose_deep_network([5, 2])
    assert len(components_single) == 1, "Single layer should have 1 component"
    assert isinstance(components_single[0], Linear), "Single component should be Linear layer"

    print("✅ Deep network composition works correctly!")

test_unit_deep_network()

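# %% [markdown]
"""
### Aside: The zip-Based Construction

The HINTS above suggest `zip(layer_sizes[:-1], layer_sizes[1:])`. Here is a sketch of that equivalent construction, shown for comparison with the index-based solution. (`build_deep_network_zip` is an illustrative name, not part of the module's API.)
"""

# %%
def build_deep_network_zip(layer_sizes: List[int], activation=ReLU) -> List:
    """Equivalent to compose_deep_network, iterating over consecutive size pairs."""
    components = []
    num_layers = len(layer_sizes) - 1
    for i, (in_size, out_size) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        components.append(Linear(in_size, out_size))
        if i < num_layers - 1:          # no activation after the final layer
            components.append(activation())
    return components

# Sanity check: same structure as compose_deep_network([4, 8, 6, 3])
zip_components = build_deep_network_zip([4, 8, 6, 3])
print([type(c).__name__ for c in zip_components])  # ['Linear', 'ReLU', 'Linear', 'ReLU', 'Linear']
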
# %% [markdown]
"""
## Part 5: Forward Pass Through Deep Networks

Now implement forward pass logic for networks of arbitrary depth.
"""

# %% nbgrader={"grade": false, "grade_id": "deep-forward", "solution": true}
def forward_pass_deep(x: Tensor, components: List) -> Tensor:
    """
    Execute forward pass through a deep network with arbitrary components.

    Args:
        x: Input tensor
        components: List of network components (layers and activations)

    Returns:
        Output tensor after passing through all components

    TODO: Apply each component in sequence to transform the input

    APPROACH:
    1. Start with input tensor
    2. Apply each component in order: x = component(x)
    3. Each component's output becomes next component's input
    4. Return final result

    EXAMPLE:
    >>> components = [Linear(4,8), ReLU(), Linear(8,3)]
    >>> x = Tensor([[1, 2, 3, 4]])
    >>> y = forward_pass_deep(x, components)
    >>> print(y.shape)  # (1, 3)

    HINTS:
    - Use a for loop: for component in components:
    - Apply each component: x = component(x)
    - Return the final transformed x
    """
    ### BEGIN SOLUTION
    # Apply each component in sequence
    current_tensor = x
    for component in components:
        current_tensor = component(current_tensor)

    return current_tensor
    ### END SOLUTION

# %% [markdown]
"""
### 🧪 Unit Test: Deep Forward Pass
Test forward pass through networks of varying depth
"""

# %%
def test_unit_deep_forward():
    """Test forward pass through deep networks"""
    print("🔬 Unit Test: Deep Forward Pass...")

    # Test 1: 3-layer network
    components = compose_deep_network([5, 10, 8, 3])
    x = Tensor(np.random.randn(2, 5))
    y = forward_pass_deep(x, components)

    assert y.shape == (2, 3), "Deep network should produce correct output shape"
    assert hasattr(y, 'shape') and hasattr(y, 'data'), "Output should be a Tensor-like object"

    # Test 2: Very deep network
    deep_components = compose_deep_network([4, 16, 12, 8, 6, 2])
    x_deep = Tensor(np.random.randn(1, 4))
    y_deep = forward_pass_deep(x_deep, deep_components)

    assert y_deep.shape == (1, 2), "Very deep network should work correctly"

    # Test 3: Wide network
    wide_components = compose_deep_network([3, 100, 1])
    x_wide = Tensor(np.random.randn(5, 3))
    y_wide = forward_pass_deep(x_wide, wide_components)

    assert y_wide.shape == (5, 1), "Wide network should work correctly"

    # Test 4: Single layer
    single_components = compose_deep_network([6, 4])
    x_single = Tensor(np.random.randn(1, 6))
    y_single = forward_pass_deep(x_single, single_components)

    assert y_single.shape == (1, 4), "Single layer should work correctly"

    print("✅ Deep forward pass works correctly!")

test_unit_deep_forward()

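# %% [markdown]
"""
### Aside: Tracing Shapes for Debugging

One advantage of manual composition is debugging visibility (see Question 3 later in this module). Here is a sketch of a tracing variant of `forward_pass_deep` that prints every intermediate shape. (`trace_forward_pass` is an illustrative helper, not part of the module's API.)
"""

# %%
def trace_forward_pass(x: Tensor, components: List) -> Tensor:
    """Like forward_pass_deep, but prints the shape after every component."""
    print(f"Input: {x.shape}")
    for i, component in enumerate(components):
        x = component(x)
        print(f"  [{i}] {type(component).__name__:8s} → {x.shape}")
    return x

# Watch a (1, 5) input flow through a [5, 10, 8, 3] network
_ = trace_forward_pass(Tensor(np.random.randn(1, 5)), compose_deep_network([5, 10, 8, 3]))
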
# %% [markdown]
"""
## Part 6: Parameter Counting and Analysis

Understanding how many learnable parameters a network contains is crucial for managing memory and reasoning about computational cost.
"""

# %% nbgrader={"grade": false, "grade_id": "parameter-counting", "solution": true}
def count_network_parameters(components: List) -> Tuple[int, dict]:
    """
    Count total parameters in a network and provide detailed breakdown.

    Args:
        components: List of network components

    Returns:
        Tuple of (total_parameters, parameter_breakdown)

    TODO: Count parameters in each Linear layer and provide breakdown

    APPROACH:
    1. Initialize total counter and breakdown dictionary
    2. Iterate through components looking for Linear layers
    3. For each Linear layer: count weights (input_size × output_size) + biases (output_size)
    4. Store breakdown by layer and return total + breakdown

    EXAMPLE:
    >>> components = [Linear(4,8), ReLU(), Linear(8,3)]
    >>> total, breakdown = count_network_parameters(components)
    >>> print(total)  # (4*8 + 8) + (8*3 + 3) = 32 + 8 + 24 + 3 = 67

    HINTS:
    - Only Linear layers have parameters (activations have none)
    - For Linear layer: parameters = input_size * output_size + output_size
    - Use isinstance(component, Linear) to identify Linear layers
    - Track breakdown with layer names/indices
    """
    ### BEGIN SOLUTION
    total_params = 0
    breakdown = {}

    layer_count = 0
    for component in components:
        if isinstance(component, Linear):
            layer_count += 1

            # Count weights and biases
            weights = component.input_size * component.output_size
            biases = component.output_size
            layer_params = weights + biases

            # Add to total
            total_params += layer_params

            # Add to breakdown
            breakdown[f"Linear_Layer_{layer_count}"] = {
                "weights": weights,
                "biases": biases,
                "total": layer_params,
                "shape": f"({component.input_size}, {component.output_size})"
            }

    return total_params, breakdown
    ### END SOLUTION

# %% [markdown]
"""
### 🧪 Unit Test: Parameter Counting
Test that we correctly count parameters across network architectures
"""

# %%
def test_unit_parameter_counting():
    """Test parameter counting across different network architectures"""
    print("🔬 Unit Test: Parameter Counting...")

    # Test 1: Simple 2-layer network
    components = compose_deep_network([4, 8, 3])
    total, breakdown = count_network_parameters(components)

    # Expected: (4*8 + 8) + (8*3 + 3) = 40 + 27 = 67
    expected_total = (4*8 + 8) + (8*3 + 3)
    assert total == expected_total, f"Expected {expected_total} parameters, got {total}"

    # Verify breakdown structure
    assert "Linear_Layer_1" in breakdown, "Should have first layer in breakdown"
    assert "Linear_Layer_2" in breakdown, "Should have second layer in breakdown"
    assert breakdown["Linear_Layer_1"]["weights"] == 32, "First layer should have 32 weights"
    assert breakdown["Linear_Layer_1"]["biases"] == 8, "First layer should have 8 biases"

    # Test 2: Single layer
    single_components = compose_deep_network([10, 5])
    single_total, single_breakdown = count_network_parameters(single_components)

    expected_single = 10*5 + 5  # 55
    assert single_total == expected_single, f"Single layer should have {expected_single} parameters"

    # Test 3: Deep network
    deep_components = compose_deep_network([3, 6, 4, 2])
    deep_total, deep_breakdown = count_network_parameters(deep_components)

    # Expected: (3*6+6) + (6*4+4) + (4*2+2) = 24 + 28 + 10 = 62
    expected_deep = (3*6 + 6) + (6*4 + 4) + (4*2 + 2)
    assert deep_total == expected_deep, f"Deep network should have {expected_deep} parameters"
    assert len(deep_breakdown) == 3, "Deep network should have 3 Linear layers in breakdown"

    # Test 4: Network with activations (shouldn't count activation parameters)
    mixed_components = [Linear(5, 10), ReLU(), Linear(10, 2), Sigmoid()]
    mixed_total, mixed_breakdown = count_network_parameters(mixed_components)

    expected_mixed = (5*10 + 10) + (10*2 + 2)  # 60 + 22 = 82
    assert mixed_total == expected_mixed, "Should only count Linear layer parameters"
    assert len(mixed_breakdown) == 2, "Should only include Linear layers in breakdown"

    print("✅ Parameter counting works correctly!")

test_unit_parameter_counting()

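# %% [markdown]
"""
### Worked Example: An MNIST-Sized Network

As a concrete check (this architecture comes up again in Question 1 below), counting parameters for a [784, 128, 64, 10] network by hand:

    Layer 1: 784 × 128 + 128 = 100,480
    Layer 2: 128 ×  64 +  64 =   8,256
    Layer 3:  64 ×  10 +  10 =     650
    Total:                      109,386  (≈ 109K, ~437 KB as float32)
"""

# %%
# Verify the hand calculation with our own functions
mnist_sized = compose_deep_network([784, 128, 64, 10])
mnist_total, mnist_breakdown = count_network_parameters(mnist_sized)
print(f"Total parameters: {mnist_total:,}")                      # 109,386
print(f"Parameter memory: {mnist_total * 4 / 1000:.0f} KB (float32)")
for name, info in mnist_breakdown.items():
    print(f"  {name}: {info['total']:,} params {info['shape']}")
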
# %% [markdown]
"""
## Part 7: Network Architecture Patterns

Let's implement common network architecture patterns used in practice.
"""

# %% nbgrader={"grade": false, "grade_id": "network-patterns", "solution": true}
def create_classifier_network(input_size: int, num_classes: int, hidden_sizes: List[int] = None) -> List:
    """
    Create a classification network with sigmoid output activation.

    Args:
        input_size: Number of input features
        num_classes: Number of output classes
        hidden_sizes: List of hidden layer sizes (optional)

    Returns:
        List of network components with Sigmoid output for classification

    TODO: Create network ending with Sigmoid activation for classification

    APPROACH:
    1. Use provided hidden_sizes, or fall back to a reasonable default if None
    2. Create base network structure: input → hidden layers → output
    3. Add Sigmoid activation at the end for classification probabilities
    4. Return complete component list

    EXAMPLE:
    >>> components = create_classifier_network(784, 10, [128, 64])
    >>> # Creates: Linear(784,128), ReLU(), Linear(128,64), ReLU(), Linear(64,10), Sigmoid()

    HINTS:
    - If hidden_sizes is None, use a reasonable default like [input_size // 2]
    - Build layer_sizes list: [input_size] + hidden_sizes + [num_classes]
    - Use compose_deep_network to create base network
    - Add Sigmoid() activation at the end for classification
    """
    ### BEGIN SOLUTION
    # Handle default hidden sizes
    if hidden_sizes is None:
        hidden_sizes = [max(input_size // 2, num_classes * 2)]

    # Build complete layer sizes
    layer_sizes = [input_size] + hidden_sizes + [num_classes]

    # Create base network
    components = compose_deep_network(layer_sizes)

    # Add Sigmoid activation for classification
    components.append(Sigmoid())

    return components
    ### END SOLUTION

# %% nbgrader={"grade": false, "grade_id": "regression-network", "solution": true}
def create_regression_network(input_size: int, output_size: int = 1, hidden_sizes: List[int] = None) -> List:
    """
    Create a regression network with no output activation.

    Args:
        input_size: Number of input features
        output_size: Number of output values (default: 1)
        hidden_sizes: List of hidden layer sizes (optional)

    Returns:
        List of network components with no output activation for regression

    TODO: Create network with no output activation for regression

    APPROACH:
    1. Use provided hidden_sizes or create reasonable default
    2. Build layer_sizes list and create network
    3. Do NOT add output activation (regression predicts raw values)
    4. Return component list

    EXAMPLE:
    >>> components = create_regression_network(4, 1, [8, 4])
    >>> # Creates: Linear(4,8), ReLU(), Linear(8,4), ReLU(), Linear(4,1)
    >>> # No output activation for regression

    HINTS:
    - Default hidden_sizes could be [input_size, input_size // 2]
    - Use compose_deep_network directly (it doesn't add output activation)
    - Don't add any activation after the final layer
    """
    ### BEGIN SOLUTION
    # Handle default hidden sizes
    if hidden_sizes is None:
        hidden_sizes = [input_size, max(input_size // 2, output_size * 2)]

    # Build complete layer sizes
    layer_sizes = [input_size] + hidden_sizes + [output_size]

    # Create network (compose_deep_network doesn't add output activation)
    components = compose_deep_network(layer_sizes)

    return components
    ### END SOLUTION

# %% [markdown]
"""
### 🧪 Unit Test: Network Architecture Patterns
Test specialized network architectures for different tasks
"""

# %%
def test_unit_network_patterns():
    """Test different network architecture patterns"""
    print("🔬 Unit Test: Network Architecture Patterns...")

    # Test 1: Classification network
    classifier = create_classifier_network(784, 10, [128, 64])

    # Should end with Sigmoid for classification
    assert isinstance(classifier[-1], Sigmoid), "Classifier should end with Sigmoid"

    # Test forward pass
    x_class = Tensor(np.random.randn(1, 784))
    y_class = forward_pass_deep(x_class, classifier)

    assert y_class.shape == (1, 10), "Classifier should output correct number of classes"
    # Note: We can't easily test that the output lies in [0, 1] without a more sophisticated Sigmoid implementation

    # Test 2: Regression network
    regressor = create_regression_network(4, 1, [8, 4])

    # Should NOT end with activation
    assert not isinstance(regressor[-1], (Sigmoid, ReLU, Tanh)), "Regressor should not end with activation"
    assert isinstance(regressor[-1], Linear), "Regressor should end with Linear layer"

    # Test forward pass
    x_reg = Tensor(np.random.randn(3, 4))
    y_reg = forward_pass_deep(x_reg, regressor)

    assert y_reg.shape == (3, 1), "Regressor should output correct shape"

    # Test 3: Multi-output regression
    multi_regressor = create_regression_network(6, 3, [10, 5])
    x_multi = Tensor(np.random.randn(2, 6))
    y_multi = forward_pass_deep(x_multi, multi_regressor)

    assert y_multi.shape == (2, 3), "Multi-output regressor should work"

    # Test 4: Default hidden sizes
    default_classifier = create_classifier_network(20, 5)  # No hidden_sizes specified
    x_default = Tensor(np.random.randn(1, 20))
    y_default = forward_pass_deep(x_default, default_classifier)

    assert y_default.shape == (1, 5), "Default classifier should work"

    print("✅ Network architecture patterns work correctly!")

test_unit_network_patterns()

# %%
def test_module():
    """Run all module tests to verify complete implementation"""
    print("🧪 Running all Network module tests...")

    test_unit_two_layer_composition()
    test_unit_forward_pass()
    test_unit_deep_network()
    test_unit_deep_forward()
    test_unit_parameter_counting()
    test_unit_network_patterns()

    print("✅ All Network module tests passed! Manual network composition complete.")

# %% [markdown]
"""
## 🔍 Systems Analysis

Now that your network implementations are complete and tested, let's analyze their systems behavior:

### Performance and Memory Characteristics

Understanding how networks scale with size and depth is crucial for building real ML systems.
"""

# %%
def measure_network_scaling():
    """
    📊 SYSTEMS MEASUREMENT: Network Scaling Analysis

    Measure how network complexity affects performance and memory usage.
    """
    print("📊 NETWORK SCALING MEASUREMENT")
    print("Testing how network depth and width affect computational complexity...")

    import time

    # Test different network architectures
    architectures = [
        ("Narrow-Deep", [10, 8, 6, 4, 2]),
        ("Wide-Shallow", [10, 50, 2]),
        ("Balanced", [10, 20, 10, 2]),
        ("Very Deep", [10, 8, 6, 5, 4, 3, 2])
    ]

    batch_size = 100
    num_trials = 10

    for name, layer_sizes in architectures:
        print(f"\n🔧 Testing {name} architecture: {layer_sizes}")

        # Create network
        components = compose_deep_network(layer_sizes)
        total_params, breakdown = count_network_parameters(components)

        # Measure forward pass time
        x = Tensor(np.random.randn(batch_size, layer_sizes[0]))

        times = []
        for _ in range(num_trials):
            start = time.perf_counter()
            y = forward_pass_deep(x, components)
            elapsed = time.perf_counter() - start
            times.append(elapsed)

        avg_time = np.mean(times) * 1000  # Convert to milliseconds

        print(f"  Parameters: {total_params:,}")
        print(f"  Layers: {len([c for c in components if isinstance(c, Linear)])}")
        print(f"  Forward pass: {avg_time:.2f}ms (batch={batch_size})")
        print(f"  Time per sample: {avg_time/batch_size:.3f}ms")

        # Memory analysis
        total_weights = sum(layer.weights.data.size for layer in components if isinstance(layer, Linear))
        total_biases = sum(layer.bias.data.size for layer in components if isinstance(layer, Linear))
        memory_mb = (total_weights + total_biases) * 4 / 1024 / 1024  # float32 = 4 bytes

        print(f"  Memory usage: {memory_mb:.2f} MB")

    print(f"\n💡 SCALING INSIGHTS:")
    print(f"  • Depth vs Width: More layers = more sequential computation")
    print(f"  • Parameter count dominates memory usage")
    print(f"  • Batch processing amortizes per-sample overhead")
    print(f"  • Network architecture significantly impacts performance")

# Run the measurement
measure_network_scaling()

# %%
def measure_parameter_scaling():
    """
    💾 SYSTEMS MEASUREMENT: Parameter Memory Analysis

    Understand how parameter count scales with network size.
    """
    print("💾 PARAMETER MEMORY MEASUREMENT")
    print("Analyzing parameter scaling patterns...")

    # Test parameter scaling with width
    print("\n📏 Width Scaling (2-layer networks):")
    widths = [10, 50, 100, 200, 500]

    for width in widths:
        components = compose_deep_network([10, width, 5])
        total_params, _ = count_network_parameters(components)
        memory_mb = total_params * 4 / 1024 / 1024

        print(f"  Width {width:3d}: {total_params:,} params, {memory_mb:.2f} MB")

    # Test parameter scaling with depth
    print("\n📏 Depth Scaling (constant width=20):")
    depths = [2, 4, 6, 8, 10]

    for depth in depths:
        layer_sizes = [20] * (depth + 1)  # depth+1 layer sizes for depth layers
        layer_sizes[-1] = 5  # Output size
        components = compose_deep_network(layer_sizes)
        total_params, _ = count_network_parameters(components)
        memory_mb = total_params * 4 / 1024 / 1024

        print(f"  Depth {depth:2d}: {total_params:,} params, {memory_mb:.2f} MB")

    print(f"\n💡 PARAMETER INSIGHTS:")
    print(f"  • Width scaling: O(W²) when two consecutive width-W layers connect; linear here, since the input and output sizes stay fixed")
    print(f"  • Depth scaling: Linear growth O(D) for constant width")
    print(f"  • First and last layers often dominate parameter count")
    print(f"  • Memory grows linearly with parameter count")

# Run the measurement
measure_parameter_scaling()

# %%
def measure_batch_processing():
    """
    📦 SYSTEMS MEASUREMENT: Batch Processing Efficiency

    Analyze how batch size affects computational efficiency.
    """
    print("📦 BATCH PROCESSING MEASUREMENT")
    print("Testing computational efficiency across batch sizes...")

    import time

    # Create test network
    components = compose_deep_network([100, 50, 25, 10])

    batch_sizes = [1, 10, 50, 100, 500, 1000]
    num_trials = 5

    print("\nBatch Size | Total Time | Time/Sample | Throughput")
    print("-" * 55)

    for batch_size in batch_sizes:
        x = Tensor(np.random.randn(batch_size, 100))

        times = []
        for _ in range(num_trials):
            start = time.perf_counter()
            y = forward_pass_deep(x, components)
            elapsed = time.perf_counter() - start
            times.append(elapsed)

        avg_time = np.mean(times) * 1000  # milliseconds
        time_per_sample = avg_time / batch_size
        throughput = 1000 / time_per_sample  # samples per second

        print(f"{batch_size:9d} | {avg_time:9.2f}ms | {time_per_sample:10.3f}ms | {throughput:8.0f} samples/s")

    print(f"\n💡 BATCH PROCESSING INSIGHTS:")
    print(f"  • Larger batches amortize per-batch overhead")
    print(f"  • Time per sample decreases with batch size")
    print(f"  • Throughput increases significantly with batching")
    print(f"  • Memory usage scales linearly with batch size")

# Run the measurement
measure_batch_processing()

# %% [markdown]
"""
## 🤔 ML Systems Thinking: Interactive Questions

Now that you've implemented manual network composition, let's connect this to broader ML systems principles:
"""

# %% [markdown]
"""
### Question 1: Memory and Performance Analysis

In your `count_network_parameters()` function, you discovered that a 3-layer network with sizes [784, 128, 64, 10] has about 109,000 parameters.

When you tested this network with different batch sizes, you saw that processing time per sample decreased with larger batches. Analyze the memory and computational trade-offs:

**Your Implementation Analysis:**
- How does the parameter memory (109K parameters × 4 bytes = ~436KB) compare to activation memory for different batch sizes?
- Why does your `forward_pass_deep()` function become more efficient per sample with larger batches?
- At what batch size would activation memory exceed parameter memory for this network? (A rough estimation sketch follows this question.)

**Systems Engineering Question:**
If you needed to deploy this network on a device with only 1MB of available memory, what modifications to your network composition functions would you implement to stay within memory constraints while maintaining reasonable accuracy?

Think about: Parameter sharing strategies, layer width reduction, depth vs width trade-offs
"""

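# %% [markdown]
"""
A rough estimation sketch for the crossover bullet above. Assumptions: float32 (4 bytes) everywhere, and "activation memory" counted as the input plus every intermediate output kept alive for one batch — a simplification of what a real framework retains:
"""

# %%
# Parameter memory vs activation memory for the [784, 128, 64, 10] network
param_bytes = 109_386 * 4                  # ≈ 437 KB, from the worked example above
floats_per_sample = 784 + 128 + 64 + 10    # input + each layer's output
activation_bytes_per_sample = floats_per_sample * 4

crossover = param_bytes / activation_bytes_per_sample
print(f"Activation memory per sample: {activation_bytes_per_sample} bytes")
print(f"Crossover batch size: ~{crossover:.0f} samples")  # beyond this, activations dominate
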
# %% [markdown]
"""
### Question 2: Architecture Scaling Analysis

Your `compose_deep_network()` function can create networks of arbitrary depth and width. You measured that very deep networks (10+ layers) have linear parameter growth but may suffer from other issues.

**Implementation Scaling Analysis:**
- In your deep network experiments, which architecture pattern (narrow-deep vs wide-shallow) was more computationally efficient?
- How would you modify your `forward_pass_deep()` function to handle networks with 100+ layers efficiently?
- What bottlenecks would emerge in your current manual composition approach for very large networks?

**Production Engineering Question:**
Design a modification to your current network composition system that could handle production-scale networks (1000+ layers, millions of parameters) while maintaining the educational clarity of manual composition.

Think about: Memory checkpointing, activation recomputation, gradient accumulation patterns
"""

# %% [markdown]
"""
### Question 3: Integration and Modularity Analysis

Your manual network composition approach gives you complete control over layer ordering and activation placement. However, you've seen that composing networks manually becomes complex for large architectures.

**Integration Analysis:**
- How would you extend your current `create_classifier_network()` and `create_regression_network()` functions to support more complex architectures like residual connections? (See the sketch after this question.)
- What interface changes to your component system would be needed to handle branching network topologies?
- How does manual composition compare to automated composition in terms of debugging and understanding?

**Systems Architecture Question:**
Design a hybrid approach that maintains the educational benefits of your manual composition while providing the convenience of automated network building for complex architectures. What abstractions would you introduce?

Think about: Component interfaces, graph representations, debugging visibility
"""

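# %% [markdown]
"""
One possible starting point for the residual-connection bullet above — a minimal sketch, assuming a `Tensor` can be rebuilt from the sum of two `.data` arrays (our Tensors expose `.data`, but tensor addition itself is beyond this module, and `residual_block_forward` is an illustrative name, not part of the module's API):
"""

# %%
def residual_block_forward(x: Tensor, layer: Linear, activation) -> Tensor:
    """Sketch of y = x + σ(layer(x)): the input skips around the transformation.

    Requires the layer's input and output sizes to match so the shapes line up.
    """
    transformed = activation(layer(x))
    return Tensor(x.data + transformed.data)  # skip connection via the raw arrays

# A 16 → 16 residual step preserves shape, so such blocks could stack arbitrarily deep
x_res = Tensor(np.random.randn(2, 16))
y_res = residual_block_forward(x_res, Linear(16, 16), ReLU())
print(f"Residual block: {x_res.shape} → {y_res.shape}")  # (2, 16) → (2, 16)
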
# %% [markdown]
"""
## 🎯 MODULE SUMMARY: Networks - Manual Composition Mastery

Congratulations! You've successfully implemented manual network composition that forms the foundation of all neural network architectures:

### What You've Accomplished
✅ **Manual Network Composition**: Built 150+ lines of network architecture code with step-by-step layer composition
✅ **Forward Pass Logic**: Implemented data flow through networks of arbitrary depth and complexity
✅ **Parameter Analysis**: Created comprehensive parameter counting and memory analysis systems
✅ **Architecture Patterns**: Built specialized networks for classification, regression, and custom tasks
✅ **Systems Understanding**: Analyzed scaling behavior, memory usage, and computational complexity

### Key Learning Outcomes
- **Network Architecture**: Understanding how layers compose into intelligent systems through manual control
- **Data Flow Principles**: Mastery of tensor shape transformations through network layers
- **Parameter Management**: Deep insight into memory requirements and computational complexity
- **Performance Characteristics**: Knowledge of how network depth and width affect efficiency

### Mathematical Foundations Mastered
- **Composition Functions**: f(g(h(x))) = network(x) through sequential application
- **Parameter Scaling**: O(input_size × output_size) per layer, O(depth) for network
- **Memory Complexity**: Linear scaling with parameters plus O(batch_size × max_layer_width) for activations

### Professional Skills Developed
- **Manual Architecture Design**: Building networks layer-by-layer with complete understanding
- **Systems Analysis**: Measuring and optimizing network performance characteristics
- **Memory Engineering**: Understanding parameter vs activation memory trade-offs
- **Performance Optimization**: Batch processing and computational efficiency analysis

### Ready for Advanced Applications
Your manual network composition now enables:
- **Custom Architectures**: Build any network topology with complete understanding
- **Performance Analysis**: Measure and optimize network computational characteristics
- **Memory Management**: Predict and control network memory requirements
- **Educational Foundation**: Deep understanding before automated composition tools

### Connection to Real ML Systems
Your implementation mirrors production patterns:
- **PyTorch**: Your manual composition matches nn.Sequential() internal behavior
- **TensorFlow**: Similar to tf.keras.Sequential() layer-by-layer construction
- **Industry Standard**: Manual composition used for custom architectures and research

### Next Steps
1. **Export your module**: `tito module complete 04_networks`
2. **Validate integration**: `tito test --module networks`
3. **Explore automated composition**: Your foundation enables understanding Sequential in Module 05
4. **Ready for Module 05**: Linear Networks with automated composition tools

**🚀 Achievement Unlocked**: Your manual network composition mastery provides the deep understanding needed for building automated ML frameworks. You've learned to think like a neural network architect!
"""

# %%
if __name__ == "__main__":
    # Run all tests to validate complete implementation
    test_module()

    # Display completion message
    print("\n" + "="*60)
    print("🎯 MODULE 04 (NETWORKS) COMPLETE!")
    print("📈 Progress: Manual Network Composition ✓")
    print("🔥 Next up: Module 05 - Automated Linear Networks!")
    print("💪 You're building real ML architecture understanding!")
    print("="*60)