TinyTorch/modules/04_networks_backup/networks_dev.py
Vijay Janapa Reddi e8e6657b51 Fix module issues and create minimal MNIST training examples
- Fixed module 03_layers Tensor/Parameter comparison issues
- Fixed module 05_autograd psutil dependency (made optional)
- Removed duplicate 04_networks module
- Created losses.py with MSELoss and CrossEntropyLoss
- Created minimal MNIST training examples
- All 20 modules now pass individual tests

Note: Gradient flow still needs work for full training capability
2025-09-29 10:20:33 -04:00

1050 lines
39 KiB
Python
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.17.1
# kernelspec:
# display_name: Python 3 (ipykernel)
# language: python
# name: python3
# ---
# %% [markdown]
"""
# Networks - Building Intelligence Through Layer Composition
Welcome to Networks! You'll learn how to combine individual layers into complete neural networks that can solve complex problems.
## 🔗 Building on Previous Learning
**What You Built Before**:
- Module 01 (Tensor): Multi-dimensional data structures for inputs and outputs
- Module 02 (Activations): Nonlinear functions that create intelligence
- Module 03 (Layers): Linear layers that transform data with learnable parameters
**What's Working**: You can transform data with individual layers and activations!
**The Gap**: Individual layers solve simple problems - real intelligence emerges when layers compose into networks.
**This Module's Solution**: Learn to manually compose layers into multi-layer networks with different architectures.
**Connection Map**:
```
   Layers    → Manual Composition → Complete Networks
(transforms)     (architecture)      (intelligence)
```
## Learning Objectives
1. **Manual Network Architecture**: Build networks by composing layers step-by-step
2. **Parameter Management**: Count and track parameters across multiple layers
3. **Forward Pass Logic**: Understand data flow through network architectures
4. **Network Architectures**: Create different network shapes (wide, deep, custom)
5. **Systems Understanding**: Analyze memory usage and computational complexity
## Build → Test → Use
1. **Build**: Manual network composition functions and parameter counting
2. **Test**: Validate networks with different architectures and input sizes
3. **Use**: Apply networks to solve problems requiring multiple transformations
"""
# %%
# Essential imports for network composition
import numpy as np
import sys
import os
from typing import List, Tuple, Union, Optional
# Import building blocks from previous modules - ONLY use concepts we've learned
try:
    from tinytorch.core.tensor import Tensor
    from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax
    from tinytorch.core.layers import Linear, Module
except ImportError:
    # Development fallback
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_layers'))
    from tensor_dev import Tensor
    from activations_dev import ReLU, Sigmoid, Tanh, Softmax
    from layers_dev import Linear, Module
# %% [markdown]
"""
## Part 1: Understanding Network Architecture
### What Makes a Neural Network?
A neural network is simply **multiple layers composed together** where each layer's output becomes the next layer's input.
```
Input → Layer1 → Activation → Layer2 → Activation → Output
 (4)     (8)        (8)        (3)        (3)        (3)
```
**Key Insights**:
- **Composition**: Networks = layers + activations in sequence
- **Data Flow**: Output shape of layer N must match input shape of layer N+1
- **Intelligence**: Nonlinearity from activations enables complex pattern learning
- **Architecture**: Layer sizes and arrangements determine network capability
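
The data-flow rule above can be checked before any data is run. The following is a minimal sketch in plain Python, assuming each layer is described by a hypothetical `(input_size, output_size)` pair rather than a real `Linear` object; `check_shape_chain` is an illustrative name, not part of TinyTorch.

```python
from typing import List, Tuple


def check_shape_chain(layer_shapes: List[Tuple[int, int]]) -> bool:
    # Hypothetical helper: True when every layer's output size equals
    # the next layer's input size.
    for (_, out_size), (next_in, _) in zip(layer_shapes[:-1], layer_shapes[1:]):
        if out_size != next_in:
            return False
    return True


# The 4 -> 8 -> 3 network from the diagram. Activations preserve shape,
# so only the Linear layers need to be listed.
print(check_shape_chain([(4, 8), (8, 3)]))  # True
print(check_shape_chain([(4, 8), (6, 3)]))  # False, because 8 != 6
```

A check like this fails early with a clear answer, instead of surfacing later as a matrix-multiplication shape error.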
"""
# %% [markdown]
"""
## Part 2: Manual Network Composition
Let's start by learning to compose networks manually before automation.
"""
# %% nbgrader={"grade": false, "grade_id": "network-composition", "solution": true}
def compose_two_layer_network(input_size: int, hidden_size: int, output_size: int,
                              activation=ReLU) -> Tuple[Linear, object, Linear]:
    """
    Create a 2-layer network manually: Input → Linear → Activation → Linear → Output

    Args:
        input_size: Number of input features
        hidden_size: Number of hidden layer neurons
        output_size: Number of output features
        activation: Activation function class (default: ReLU)

    Returns:
        Tuple of (layer1, activation_instance, layer2)

    TODO: Create two Linear layers and one activation function

    APPROACH:
    1. Create first Linear layer: input_size → hidden_size
    2. Create activation function instance
    3. Create second Linear layer: hidden_size → output_size
    4. Return all three components as tuple

    EXAMPLE:
    >>> layer1, act, layer2 = compose_two_layer_network(4, 8, 3)
    >>> x = Tensor([[1, 2, 3, 4]])
    >>> h = layer1(x)        # (1, 4) → (1, 8)
    >>> h_act = act(h)       # (1, 8) → (1, 8)
    >>> y = layer2(h_act)    # (1, 8) → (1, 3)
    >>> print(y.shape)       # (1, 3)

    HINTS:
    - Use Linear(input_size, hidden_size) for first layer
    - Create activation instance with activation()
    - Use Linear(hidden_size, output_size) for second layer
    - Return as (layer1, activation_instance, layer2)
    """
    ### BEGIN SOLUTION
    # Create first layer: input → hidden
    layer1 = Linear(input_size, hidden_size)
    # Create activation function instance
    activation_instance = activation()
    # Create second layer: hidden → output
    layer2 = Linear(hidden_size, output_size)
    return layer1, activation_instance, layer2
    ### END SOLUTION
# %% [markdown]
"""
### 🧪 Unit Test: Two-Layer Network Composition
Test that we can manually compose a simple 2-layer network
"""
# %%
def test_unit_two_layer_composition():
    """Test two-layer network composition with different configurations"""
    print("🔬 Unit Test: Two-Layer Network Composition...")

    # Test 1: Basic composition
    layer1, activation, layer2 = compose_two_layer_network(4, 8, 3)
    assert isinstance(layer1, Linear), "First component should be Linear layer"
    assert isinstance(activation, ReLU), "Second component should be activation function"
    assert isinstance(layer2, Linear), "Third component should be Linear layer"
    assert layer1.input_size == 4, "First layer should have correct input size"
    assert layer1.output_size == 8, "First layer should have correct output size"
    assert layer2.input_size == 8, "Second layer should have correct input size"
    assert layer2.output_size == 3, "Second layer should have correct output size"

    # Test 2: Forward pass compatibility
    x = Tensor(np.random.randn(2, 4))
    h = layer1(x)
    h_activated = activation(h)
    y = layer2(h_activated)
    assert h.shape == (2, 8), "Hidden layer output should have correct shape"
    assert h_activated.shape == (2, 8), "Activated hidden should preserve shape"
    assert y.shape == (2, 3), "Final output should have correct shape"

    # Test 3: Different activation functions
    layer1_sig, sig_act, layer2_sig = compose_two_layer_network(3, 5, 2, Sigmoid)
    assert isinstance(sig_act, Sigmoid), "Should create Sigmoid activation when specified"

    print("✅ Two-layer network composition works correctly!")


test_unit_two_layer_composition()
# %% [markdown]
"""
## Part 3: Forward Pass Through Networks
Now let's implement the logic for running data through composed networks.
"""
# %% nbgrader={"grade": false, "grade_id": "forward-pass", "solution": true}
def forward_pass_two_layer(x: Tensor, layer1: Linear, activation, layer2: Linear) -> Tensor:
    """
    Execute forward pass through a 2-layer network.

    Args:
        x: Input tensor
        layer1: First Linear layer
        activation: Activation function instance
        layer2: Second Linear layer

    Returns:
        Output tensor after passing through the network

    TODO: Implement forward pass: x → layer1 → activation → layer2 → output

    APPROACH:
    1. Pass input through first layer
    2. Apply activation function to result
    3. Pass activated result through second layer
    4. Return final output

    EXAMPLE:
    >>> layer1, relu, layer2 = compose_two_layer_network(4, 8, 3)
    >>> x = Tensor([[1, 2, 3, 4]])  # (1, 4)
    >>> y = forward_pass_two_layer(x, layer1, relu, layer2)
    >>> print(y.shape)  # (1, 3)

    HINTS:
    - Call each component in sequence: layer1(x), activation(h), layer2(h_act)
    - Each output becomes input to next component
    - Return the final result
    """
    ### BEGIN SOLUTION
    # Step 1: First layer transformation
    hidden = layer1(x)
    # Step 2: Apply activation function
    hidden_activated = activation(hidden)
    # Step 3: Second layer transformation
    output = layer2(hidden_activated)
    return output
    ### END SOLUTION
# %% [markdown]
"""
### 🧪 Unit Test: Forward Pass Through Network
Test that data flows correctly through our manual network
"""
# %%
def test_unit_forward_pass():
    """Test forward pass through manually composed networks"""
    print("🔬 Unit Test: Forward Pass Through Networks...")

    # Create test network
    layer1, relu_act, layer2 = compose_two_layer_network(5, 10, 3)

    # Test 1: Single sample
    x_single = Tensor(np.random.randn(1, 5))
    y_single = forward_pass_two_layer(x_single, layer1, relu_act, layer2)
    assert y_single.shape == (1, 3), "Single sample should produce correct output shape"
    assert hasattr(y_single, 'shape') and hasattr(y_single, 'data'), "Output should be a Tensor-like object"

    # Test 2: Batch processing
    x_batch = Tensor(np.random.randn(4, 5))
    y_batch = forward_pass_two_layer(x_batch, layer1, relu_act, layer2)
    assert y_batch.shape == (4, 3), "Batch should produce correct output shape"

    # Test 3: Different network architectures
    wide_layer1, wide_act, wide_layer2 = compose_two_layer_network(2, 50, 1)
    x_wide = Tensor(np.random.randn(3, 2))
    y_wide = forward_pass_two_layer(x_wide, wide_layer1, wide_act, wide_layer2)
    assert y_wide.shape == (3, 1), "Wide network should work correctly"

    print("✅ Forward pass through networks works correctly!")


test_unit_forward_pass()
# %% [markdown]
"""
## Part 4: Deep Network Composition
Real neural networks often have more than 2 layers. Let's build deep networks manually.
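
A deep network is usually described by a list of sizes, where each Linear layer connects one size to the next. The sketch below shows how such a list turns into per-layer dimensions. It uses only plain Python, and `layer_pairs` is a hypothetical helper name introduced for illustration.

```python
from typing import List, Tuple


def layer_pairs(layer_sizes: List[int]) -> List[Tuple[int, int]]:
    # Hypothetical helper: pair each size with its successor, giving the
    # (input_size, output_size) of every Linear layer in order.
    return list(zip(layer_sizes[:-1], layer_sizes[1:]))


pairs = layer_pairs([4, 8, 6, 3])
print(pairs)  # [(4, 8), (8, 6), (6, 3)]

# n sizes give n - 1 Linear layers. With one activation between consecutive
# layers and none after the last, that is 2 * (n - 1) - 1 components.
print(2 * len(pairs) - 1)  # 5
```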
"""
# %% nbgrader={"grade": false, "grade_id": "deep-network", "solution": true}
def compose_deep_network(layer_sizes: List[int], activation=ReLU) -> List:
    """
    Create a deep network with arbitrary number of layers.

    Args:
        layer_sizes: List of layer sizes [input_size, hidden1, hidden2, ..., output_size]
        activation: Activation function class

    Returns:
        List of network components [layer1, activation1, layer2, activation2, ..., final_layer]

    TODO: Create alternating Linear layers and activations for each pair of sizes

    APPROACH:
    1. Iterate through pairs of consecutive sizes in layer_sizes
    2. For each pair, create Linear(size_i, size_i+1) and activation()
    3. Don't add activation after the final layer (output layer typically no activation)
    4. Return list of all components in order

    EXAMPLE:
    >>> components = compose_deep_network([4, 8, 6, 3])
    >>> # Creates: Linear(4,8), ReLU(), Linear(8,6), ReLU(), Linear(6,3)
    >>> len(components)  # 5 components

    HINTS:
    - Use zip(layer_sizes[:-1], layer_sizes[1:]) to get consecutive pairs
    - Add Linear layer, then activation for each pair (except last layer)
    - Last layer: only add Linear, no activation
    - Return list of all components
    """
    ### BEGIN SOLUTION
    components = []
    # Process all but the last layer (add Linear + Activation)
    for i in range(len(layer_sizes) - 2):
        input_size = layer_sizes[i]
        output_size = layer_sizes[i + 1]
        # Add Linear layer
        components.append(Linear(input_size, output_size))
        # Add activation
        components.append(activation())
    # Add final layer (Linear only, no activation)
    if len(layer_sizes) >= 2:
        final_input = layer_sizes[-2]
        final_output = layer_sizes[-1]
        components.append(Linear(final_input, final_output))
    return components
    ### END SOLUTION
# %% [markdown]
"""
### 🧪 Unit Test: Deep Network Composition
Test that we can build networks with arbitrary depth
"""
# %%
def test_unit_deep_network():
    """Test deep network composition with various architectures"""
    print("🔬 Unit Test: Deep Network Composition...")

    # Test 1: 3-layer network
    components_3layer = compose_deep_network([4, 8, 6, 3])
    expected_components = 5  # Linear, ReLU, Linear, ReLU, Linear
    assert len(components_3layer) == expected_components, f"3-layer network should have {expected_components} components"
    # Verify component types and order
    assert isinstance(components_3layer[0], Linear), "First component should be Linear"
    assert isinstance(components_3layer[1], ReLU), "Second component should be ReLU"
    assert isinstance(components_3layer[2], Linear), "Third component should be Linear"
    assert isinstance(components_3layer[3], ReLU), "Fourth component should be ReLU"
    assert isinstance(components_3layer[4], Linear), "Fifth component should be Linear (final)"

    # Test 2: Verify layer sizes
    assert components_3layer[0].input_size == 4, "First layer should have correct input size"
    assert components_3layer[0].output_size == 8, "First layer should have correct output size"
    assert components_3layer[2].input_size == 8, "Second layer should have correct input size"
    assert components_3layer[2].output_size == 6, "Second layer should have correct output size"
    assert components_3layer[4].input_size == 6, "Final layer should have correct input size"
    assert components_3layer[4].output_size == 3, "Final layer should have correct output size"

    # Test 3: Different activation function
    components_sigmoid = compose_deep_network([2, 4, 1], Sigmoid)
    assert isinstance(components_sigmoid[1], Sigmoid), "Should use specified activation function"

    # Test 4: Single layer (edge case)
    components_single = compose_deep_network([5, 2])
    assert len(components_single) == 1, "Single layer should have 1 component"
    assert isinstance(components_single[0], Linear), "Single component should be Linear layer"

    print("✅ Deep network composition works correctly!")


test_unit_deep_network()
# %% [markdown]
"""
## Part 5: Forward Pass Through Deep Networks
Now implement forward pass logic for networks of arbitrary depth.
"""
# %% nbgrader={"grade": false, "grade_id": "deep-forward", "solution": true}
def forward_pass_deep(x: Tensor, components: List) -> Tensor:
    """
    Execute forward pass through a deep network with arbitrary components.

    Args:
        x: Input tensor
        components: List of network components (layers and activations)

    Returns:
        Output tensor after passing through all components

    TODO: Apply each component in sequence to transform the input

    APPROACH:
    1. Start with input tensor
    2. Apply each component in order: x = component(x)
    3. Each component's output becomes next component's input
    4. Return final result

    EXAMPLE:
    >>> components = [Linear(4,8), ReLU(), Linear(8,3)]
    >>> x = Tensor([[1, 2, 3, 4]])
    >>> y = forward_pass_deep(x, components)
    >>> print(y.shape)  # (1, 3)

    HINTS:
    - Use a for loop: for component in components:
    - Apply each component: x = component(x)
    - Return the final transformed x
    """
    ### BEGIN SOLUTION
    # Apply each component in sequence
    current_tensor = x
    for component in components:
        current_tensor = component(current_tensor)
    return current_tensor
    ### END SOLUTION
# %% [markdown]
"""
### 🧪 Unit Test: Deep Forward Pass
Test forward pass through networks of varying depth
"""
# %%
def test_unit_deep_forward():
    """Test forward pass through deep networks"""
    print("🔬 Unit Test: Deep Forward Pass...")

    # Test 1: 3-layer network
    components = compose_deep_network([5, 10, 8, 3])
    x = Tensor(np.random.randn(2, 5))
    y = forward_pass_deep(x, components)
    assert y.shape == (2, 3), "Deep network should produce correct output shape"
    assert hasattr(y, 'shape') and hasattr(y, 'data'), "Output should be a Tensor-like object"

    # Test 2: Very deep network
    deep_components = compose_deep_network([4, 16, 12, 8, 6, 2])
    x_deep = Tensor(np.random.randn(1, 4))
    y_deep = forward_pass_deep(x_deep, deep_components)
    assert y_deep.shape == (1, 2), "Very deep network should work correctly"

    # Test 3: Wide network
    wide_components = compose_deep_network([3, 100, 1])
    x_wide = Tensor(np.random.randn(5, 3))
    y_wide = forward_pass_deep(x_wide, wide_components)
    assert y_wide.shape == (5, 1), "Wide network should work correctly"

    # Test 4: Single layer
    single_components = compose_deep_network([6, 4])
    x_single = Tensor(np.random.randn(1, 6))
    y_single = forward_pass_deep(x_single, single_components)
    assert y_single.shape == (1, 4), "Single layer should work correctly"

    print("✅ Deep forward pass works correctly!")


test_unit_deep_forward()
# %% [markdown]
"""
## Part 6: Parameter Counting and Analysis
Understanding how many learnable parameters are in a network is crucial for memory management and computational complexity.
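
The counting rule can be worked through with plain arithmetic before touching any layer objects. This is a rough sketch that assumes fully connected layers with one bias per output; `layer_param_breakdown` is a hypothetical helper, and the 4-byte figure assumes values are stored as float32.

```python
from typing import List, Tuple


def layer_param_breakdown(layer_sizes: List[int]) -> List[Tuple[int, int]]:
    # Hypothetical helper: (weights, biases) for each fully connected layer,
    # where weights = in_size * out_size and biases = out_size.
    return [(i * o, o) for i, o in zip(layer_sizes[:-1], layer_sizes[1:])]


breakdown = layer_param_breakdown([4, 8, 3])
print(breakdown)  # [(32, 8), (24, 3)]

total = sum(w + b for w, b in breakdown)
print(total)      # 67
print(total * 4)  # 268 bytes, assuming float32 storage
```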
"""
# %% nbgrader={"grade": false, "grade_id": "parameter-counting", "solution": true}
def count_network_parameters(components: List) -> Tuple[int, dict]:
    """
    Count total parameters in a network and provide detailed breakdown.

    Args:
        components: List of network components

    Returns:
        Tuple of (total_parameters, parameter_breakdown)

    TODO: Count parameters in each Linear layer and provide breakdown

    APPROACH:
    1. Initialize total counter and breakdown dictionary
    2. Iterate through components looking for Linear layers
    3. For each Linear layer: count weights (input_size × output_size) + biases (output_size)
    4. Store breakdown by layer and return total + breakdown

    EXAMPLE:
    >>> components = [Linear(4,8), ReLU(), Linear(8,3)]
    >>> total, breakdown = count_network_parameters(components)
    >>> print(total)  # (4*8 + 8) + (8*3 + 3) = 32 + 8 + 24 + 3 = 67

    HINTS:
    - Only Linear layers have parameters (activations have none)
    - For Linear layer: parameters = input_size * output_size + output_size
    - Use isinstance(component, Linear) to identify Linear layers
    - Track breakdown with layer names/indices
    """
    ### BEGIN SOLUTION
    total_params = 0
    breakdown = {}
    layer_count = 0
    for component in components:
        if isinstance(component, Linear):
            layer_count += 1
            # Count weights and biases
            weights = component.input_size * component.output_size
            biases = component.output_size
            layer_params = weights + biases
            # Add to total
            total_params += layer_params
            # Add to breakdown
            breakdown[f"Linear_Layer_{layer_count}"] = {
                "weights": weights,
                "biases": biases,
                "total": layer_params,
                "shape": f"({component.input_size}, {component.output_size})"
            }
    return total_params, breakdown
    ### END SOLUTION
# %% [markdown]
"""
### 🧪 Unit Test: Parameter Counting
Test that we correctly count parameters across network architectures
"""
# %%
def test_unit_parameter_counting():
    """Test parameter counting across different network architectures"""
    print("🔬 Unit Test: Parameter Counting...")

    # Test 1: Simple 2-layer network
    components = compose_deep_network([4, 8, 3])
    total, breakdown = count_network_parameters(components)
    # Expected: (4*8 + 8) + (8*3 + 3) = 40 + 27 = 67
    expected_total = (4*8 + 8) + (8*3 + 3)
    assert total == expected_total, f"Expected {expected_total} parameters, got {total}"
    # Verify breakdown structure
    assert "Linear_Layer_1" in breakdown, "Should have first layer in breakdown"
    assert "Linear_Layer_2" in breakdown, "Should have second layer in breakdown"
    assert breakdown["Linear_Layer_1"]["weights"] == 32, "First layer should have 32 weights"
    assert breakdown["Linear_Layer_1"]["biases"] == 8, "First layer should have 8 biases"

    # Test 2: Single layer
    single_components = compose_deep_network([10, 5])
    single_total, single_breakdown = count_network_parameters(single_components)
    expected_single = 10*5 + 5  # 55
    assert single_total == expected_single, f"Single layer should have {expected_single} parameters"

    # Test 3: Deep network
    deep_components = compose_deep_network([3, 6, 4, 2])
    deep_total, deep_breakdown = count_network_parameters(deep_components)
    # Expected: (3*6+6) + (6*4+4) + (4*2+2) = 24 + 28 + 10 = 62
    expected_deep = (3*6 + 6) + (6*4 + 4) + (4*2 + 2)
    assert deep_total == expected_deep, f"Deep network should have {expected_deep} parameters"
    assert len(deep_breakdown) == 3, "Deep network should have 3 Linear layers in breakdown"

    # Test 4: Network with activations (shouldn't count activation parameters)
    mixed_components = [Linear(5, 10), ReLU(), Linear(10, 2), Sigmoid()]
    mixed_total, mixed_breakdown = count_network_parameters(mixed_components)
    expected_mixed = (5*10 + 10) + (10*2 + 2)  # 60 + 22 = 82
    assert mixed_total == expected_mixed, "Should only count Linear layer parameters"
    assert len(mixed_breakdown) == 2, "Should only include Linear layers in breakdown"

    print("✅ Parameter counting works correctly!")


test_unit_parameter_counting()
# %% [markdown]
"""
## Part 7: Network Architecture Patterns
Let's implement common network architecture patterns used in practice.
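
The classifier pattern in this part ends with a Sigmoid, so it helps to see what that output looks like. The sketch below uses plain-Python stand-ins, not the TinyTorch `Sigmoid` or `Softmax` classes, and the logits are made-up values. It shows that sigmoid scores each class independently, while softmax normalizes the scores into a distribution.

```python
import math
from typing import List


def sigmoid(values: List[float]) -> List[float]:
    # Element-wise logistic function: each output lies in (0, 1).
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, -1.0]  # illustrative values only
sig = sigmoid(logits)
soft = softmax(logits)

print(all(0.0 < p < 1.0 for p in sig))  # True
print(sum(sig) > 1.0)                   # True: sigmoid scores need not sum to 1
print(abs(sum(soft) - 1.0) < 1e-9)      # True: softmax outputs sum to 1
```

Sigmoid outputs are a natural fit when classes are not mutually exclusive. For single-label problems, a softmax output is the more common choice.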
"""
# %% nbgrader={"grade": false, "grade_id": "network-patterns", "solution": true}
def create_classifier_network(input_size: int, num_classes: int,
                              hidden_sizes: Optional[List[int]] = None) -> List:
    """
    Create a classification network with sigmoid output activation.

    Args:
        input_size: Number of input features
        num_classes: Number of output classes
        hidden_sizes: List of hidden layer sizes (optional)

    Returns:
        List of network components with Sigmoid output for classification

    TODO: Create network ending with Sigmoid activation for classification

    APPROACH:
    1. Use provided hidden_sizes, or fall back to a single default hidden layer if None
    2. Create base network structure: input → hidden layers → output
    3. Add Sigmoid activation at the end for classification probabilities
    4. Return complete component list

    EXAMPLE:
    >>> components = create_classifier_network(784, 10, [128, 64])
    >>> # Creates: Linear(784,128), ReLU(), Linear(128,64), ReLU(), Linear(64,10), Sigmoid()

    HINTS:
    - If hidden_sizes is None, use a reasonable default like [input_size // 2]
    - Build layer_sizes list: [input_size] + hidden_sizes + [num_classes]
    - Use compose_deep_network to create base network
    - Add Sigmoid() activation at the end for classification
    """
    ### BEGIN SOLUTION
    # Handle default hidden sizes
    if hidden_sizes is None:
        hidden_sizes = [max(input_size // 2, num_classes * 2)]
    # Build complete layer sizes
    layer_sizes = [input_size] + hidden_sizes + [num_classes]
    # Create base network
    components = compose_deep_network(layer_sizes)
    # Add Sigmoid activation for classification
    components.append(Sigmoid())
    return components
    ### END SOLUTION
# %% nbgrader={"grade": false, "grade_id": "regression-network", "solution": true}
def create_regression_network(input_size: int, output_size: int = 1,
                              hidden_sizes: Optional[List[int]] = None) -> List:
    """
    Create a regression network with no output activation.

    Args:
        input_size: Number of input features
        output_size: Number of output values (default: 1)
        hidden_sizes: List of hidden layer sizes (optional)

    Returns:
        List of network components with no output activation for regression

    TODO: Create network with no output activation for regression

    APPROACH:
    1. Use provided hidden_sizes or create reasonable default
    2. Build layer_sizes list and create network
    3. Do NOT add output activation (regression predicts raw values)
    4. Return component list

    EXAMPLE:
    >>> components = create_regression_network(4, 1, [8, 4])
    >>> # Creates: Linear(4,8), ReLU(), Linear(8,4), ReLU(), Linear(4,1)
    >>> # No output activation for regression

    HINTS:
    - Default hidden_sizes could be [input_size, input_size // 2]
    - Use compose_deep_network directly (it doesn't add output activation)
    - Don't add any activation after the final layer
    """
    ### BEGIN SOLUTION
    # Handle default hidden sizes
    if hidden_sizes is None:
        hidden_sizes = [input_size, max(input_size // 2, output_size * 2)]
    # Build complete layer sizes
    layer_sizes = [input_size] + hidden_sizes + [output_size]
    # Create network (compose_deep_network doesn't add output activation)
    components = compose_deep_network(layer_sizes)
    return components
    ### END SOLUTION
# %% [markdown]
"""
### 🧪 Unit Test: Network Architecture Patterns
Test specialized network architectures for different tasks
"""
# %%
def test_unit_network_patterns():
    """Test different network architecture patterns"""
    print("🔬 Unit Test: Network Architecture Patterns...")

    # Test 1: Classification network
    classifier = create_classifier_network(784, 10, [128, 64])
    # Should end with Sigmoid for classification
    assert isinstance(classifier[-1], Sigmoid), "Classifier should end with Sigmoid"
    # Test forward pass
    x_class = Tensor(np.random.randn(1, 784))
    y_class = forward_pass_deep(x_class, classifier)
    assert y_class.shape == (1, 10), "Classifier should output correct number of classes"
    # Note: We can't easily test that the output is in [0, 1] without a more sophisticated sigmoid implementation

    # Test 2: Regression network
    regressor = create_regression_network(4, 1, [8, 4])
    # Should NOT end with activation
    assert not isinstance(regressor[-1], (Sigmoid, ReLU, Tanh)), "Regressor should not end with activation"
    assert isinstance(regressor[-1], Linear), "Regressor should end with Linear layer"
    # Test forward pass
    x_reg = Tensor(np.random.randn(3, 4))
    y_reg = forward_pass_deep(x_reg, regressor)
    assert y_reg.shape == (3, 1), "Regressor should output correct shape"

    # Test 3: Multi-output regression
    multi_regressor = create_regression_network(6, 3, [10, 5])
    x_multi = Tensor(np.random.randn(2, 6))
    y_multi = forward_pass_deep(x_multi, multi_regressor)
    assert y_multi.shape == (2, 3), "Multi-output regressor should work"

    # Test 4: Default hidden sizes
    default_classifier = create_classifier_network(20, 5)  # No hidden_sizes specified
    x_default = Tensor(np.random.randn(1, 20))
    y_default = forward_pass_deep(x_default, default_classifier)
    assert y_default.shape == (1, 5), "Default classifier should work"

    print("✅ Network architecture patterns work correctly!")


test_unit_network_patterns()
# %%
def test_module():
    """Run all module tests to verify complete implementation"""
    print("🧪 Running all Network module tests...")
    test_unit_two_layer_composition()
    test_unit_forward_pass()
    test_unit_deep_network()
    test_unit_deep_forward()
    test_unit_parameter_counting()
    test_unit_network_patterns()
    print("✅ All Network module tests passed! Manual network composition complete.")
# %% [markdown]
"""
## 🔍 Systems Analysis
Now that your network implementations are complete and tested, let's analyze their systems behavior:
### Performance and Memory Characteristics
Understanding how networks scale with size and depth is crucial for building real ML systems.
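
Parameter count can be predicted from the layer sizes alone, which makes it easy to see how width and depth scale before timing anything. The following is a rough sketch that assumes fully connected layers with biases; `params_for` is a hypothetical helper name.

```python
from typing import List


def params_for(layer_sizes: List[int]) -> int:
    # Weights plus biases for every consecutive pair of sizes.
    return sum(i * o + o for i, o in zip(layer_sizes[:-1], layer_sizes[1:]))


# Width scaling with fixed input (10) and output (5):
# 10*W + W + 5*W + 5 = 16*W + 5, which is linear in W.
for width in (10, 100, 1000):
    print(width, params_for([10, width, 5]))  # 10 165, then 100 1605, then 1000 16005

# When both sides of a layer are W wide, the W x W weight block grows quadratically.
print(params_for([100, 100]), params_for([200, 200]))  # 10100 40200

# Depth scaling at constant width 20: each extra layer adds 20*20 + 20 = 420.
print(params_for([20] * 4) - params_for([20] * 3))  # 420
```

Width therefore only scales quadratically when adjacent layers are widened together; widening one hidden layer between fixed input and output sizes scales linearly.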
"""
# %%
def measure_network_scaling():
    """
    📊 SYSTEMS MEASUREMENT: Network Scaling Analysis

    Measure how network complexity affects performance and memory usage.
    """
    print("📊 NETWORK SCALING MEASUREMENT")
    print("Testing how network depth and width affect computational complexity...")

    import time

    # Test different network architectures
    architectures = [
        ("Narrow-Deep", [10, 8, 6, 4, 2]),
        ("Wide-Shallow", [10, 50, 2]),
        ("Balanced", [10, 20, 10, 2]),
        ("Very Deep", [10, 8, 6, 5, 4, 3, 2])
    ]
    batch_size = 100
    num_trials = 10

    for name, layer_sizes in architectures:
        print(f"\n🔧 Testing {name} architecture: {layer_sizes}")
        # Create network
        components = compose_deep_network(layer_sizes)
        total_params, breakdown = count_network_parameters(components)

        # Measure forward pass time
        x = Tensor(np.random.randn(batch_size, layer_sizes[0]))
        times = []
        for _ in range(num_trials):
            start = time.perf_counter()
            y = forward_pass_deep(x, components)
            elapsed = time.perf_counter() - start
            times.append(elapsed)
        avg_time = np.mean(times) * 1000  # Convert to milliseconds

        print(f" Parameters: {total_params:,}")
        print(f" Layers: {len([c for c in components if isinstance(c, Linear)])}")
        print(f" Forward pass: {avg_time:.2f}ms (batch={batch_size})")
        print(f" Time per sample: {avg_time/batch_size:.3f}ms")

        # Memory analysis
        total_weights = sum(layer.weights.data.size for layer in components if isinstance(layer, Linear))
        total_biases = sum(layer.bias.data.size for layer in components if isinstance(layer, Linear))
        memory_mb = (total_weights + total_biases) * 4 / 1024 / 1024  # float32 = 4 bytes
        print(f" Memory usage: {memory_mb:.2f} MB")

    print("\n💡 SCALING INSIGHTS:")
    print(" • Depth vs Width: More layers = more sequential computation")
    print(" • Parameter count dominates memory usage")
    print(" • Batch processing amortizes per-sample overhead")
    print(" • Network architecture significantly impacts performance")


# Run the measurement
measure_network_scaling()
# %%
def measure_parameter_scaling():
    """
    💾 SYSTEMS MEASUREMENT: Parameter Memory Analysis

    Understand how parameter count scales with network size.
    """
    print("💾 PARAMETER MEMORY MEASUREMENT")
    print("Analyzing parameter scaling patterns...")

    # Test parameter scaling with width
    print("\n📏 Width Scaling (2-layer networks):")
    widths = [10, 50, 100, 200, 500]
    for width in widths:
        components = compose_deep_network([10, width, 5])
        total_params, _ = count_network_parameters(components)
        memory_mb = total_params * 4 / 1024 / 1024
        print(f" Width {width:3d}: {total_params:,} params, {memory_mb:.2f} MB")

    # Test parameter scaling with depth
    print("\n📏 Depth Scaling (constant width=20):")
    depths = [2, 4, 6, 8, 10]
    for depth in depths:
        layer_sizes = [20] * (depth + 1)  # depth+1 layer sizes for depth layers
        layer_sizes[-1] = 5  # Output size
        components = compose_deep_network(layer_sizes)
        total_params, _ = count_network_parameters(components)
        memory_mb = total_params * 4 / 1024 / 1024
        print(f" Depth {depth:2d}: {total_params:,} params, {memory_mb:.2f} MB")

    print("\n💡 PARAMETER INSIGHTS:")
    print(" • Width scaling: Linear growth O(W) here, because input and output sizes stay fixed")
    print(" • Layers that are W wide on both sides grow quadratically, O(W²) weights each")
    print(" • Depth scaling: Linear growth O(D) for constant width")
    print(" • Layers with the largest input × output product dominate parameter count")
    print(" • Memory grows linearly with parameter count")


# Run the measurement
measure_parameter_scaling()
# %%
def measure_batch_processing():
    """
    📦 SYSTEMS MEASUREMENT: Batch Processing Efficiency

    Analyze how batch size affects computational efficiency.
    """
    print("📦 BATCH PROCESSING MEASUREMENT")
    print("Testing computational efficiency across batch sizes...")

    import time

    # Create test network
    components = compose_deep_network([100, 50, 25, 10])
    batch_sizes = [1, 10, 50, 100, 500, 1000]
    num_trials = 5

    print("\nBatch Size | Total Time | Time/Sample | Throughput")
    print("-" * 55)
    for batch_size in batch_sizes:
        x = Tensor(np.random.randn(batch_size, 100))
        times = []
        for _ in range(num_trials):
            start = time.perf_counter()
            y = forward_pass_deep(x, components)
            elapsed = time.perf_counter() - start
            times.append(elapsed)
        avg_time = np.mean(times) * 1000  # milliseconds
        time_per_sample = avg_time / batch_size
        throughput = 1000 / time_per_sample  # samples per second
        print(f"{batch_size:9d} | {avg_time:9.2f}ms | {time_per_sample:10.3f}ms | {throughput:8.0f} samples/s")

    print("\n💡 BATCH PROCESSING INSIGHTS:")
    print(" • Larger batches amortize per-batch overhead")
    print(" • Time per sample typically decreases with batch size")
    print(" • Throughput typically increases with batching")
    print(" • Activation memory scales linearly with batch size")


# Run the measurement
measure_batch_processing()
# %% [markdown]
"""
## 🤔 ML Systems Thinking: Interactive Questions
Now that you've implemented manual network composition, let's connect this to broader ML systems principles:
"""
# %% [markdown]
"""
### Question 1: Memory and Performance Analysis
Using your `count_network_parameters()` function, you can verify that a 3-layer network with sizes [784, 128, 64, 10] has 109,386 parameters (about 109K).
When you tested this network with different batch sizes, you saw that processing time per sample decreased with larger batches. Analyze the memory and computational trade-offs:
**Your Implementation Analysis:**
- How does the parameter memory (109K parameters × 4 bytes = ~436KB) compare to activation memory for different batch sizes?
- Why does your `forward_pass_deep()` function become more efficient per sample with larger batches?
- At what batch size would activation memory exceed parameter memory for this network?
**Systems Engineering Question:**
If you needed to deploy this network on a device with only 1MB of available memory, what modifications to your network composition functions would you implement to stay within memory constraints while maintaining reasonable accuracy?
Think about: Parameter sharing strategies, layer width reduction, depth vs width trade-offs
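
A back-of-the-envelope calculation is one way to start on the analysis questions above. The sketch below assumes float32 storage (4 bytes per value), one bias per output unit, and that the input and every layer output are held in memory at the same time. `param_count` and `activation_values_per_sample` are hypothetical helpers written for this illustration, not functions from this module.

```python
BYTES_PER_VALUE = 4  # assumption: float32 storage

def param_count(layer_sizes):
    # Weights plus biases for a fully connected network
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

def activation_values_per_sample(layer_sizes):
    # Values held per sample if the input and every layer output are kept
    return sum(layer_sizes)

sizes = [784, 128, 64, 10]
param_bytes = param_count(sizes) * BYTES_PER_VALUE
per_sample_bytes = activation_values_per_sample(sizes) * BYTES_PER_VALUE

for batch_size in (1, 32, 128, 1024):
    act_bytes = batch_size * per_sample_bytes
    print(f"batch={batch_size:5d}  params={param_bytes / 1e3:7.1f} KB  "
          f"activations={act_bytes / 1e3:8.1f} KB")

# Smallest batch size at which activation memory exceeds parameter memory
crossover = param_bytes // per_sample_bytes + 1
print("crossover batch size:", crossover)
```

If your implementation keeps only the current layer's output instead of all of them, the per-sample activation count drops and the crossover moves to a larger batch size.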
"""
# %% [markdown]
"""
### Question 2: Architecture Scaling Analysis
Your `compose_deep_network()` function can create networks of arbitrary depth and width. You measured that in very deep networks (10+ layers) the parameter count grows linearly with depth at a fixed layer width, but such networks may suffer from other issues.

**Implementation Scaling Analysis:**
- In your deep network experiments, which architecture pattern (narrow-deep vs. wide-shallow) was more computationally efficient?
- How would you modify your `forward_pass_deep()` function to handle networks with 100+ layers efficiently?
- What bottlenecks would emerge in your current manual composition approach for very large networks?

**Production Engineering Question:**
Design a modification to your current network composition system that could handle production-scale networks (1000+ layers, millions of parameters) while maintaining the educational clarity of manual composition.

Think about: Memory checkpointing, activation recomputation, gradient accumulation patterns
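
To make the narrow-deep versus wide-shallow comparison concrete, the sketch below counts parameters and multiply-accumulate operations (MACs) per sample for two fully connected stacks. Both architectures are illustrative choices, not the ones from your experiments, and `dense_cost` is a hypothetical helper that assumes dense layers with biases.

```python
def dense_cost(layer_sizes):
    # Returns (parameters, multiply-accumulates per sample) for a stack of
    # fully connected layers with biases.
    pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    params = sum(n_in * n_out + n_out for n_in, n_out in pairs)
    macs = sum(n_in * n_out for n_in, n_out in pairs)
    return params, macs

narrow_deep = [100] + [64] * 10 + [10]    # ten hidden layers of width 64
wide_shallow = [100] + [512] * 2 + [10]   # two hidden layers of width 512

for name, sizes in [("narrow-deep", narrow_deep), ("wide-shallow", wide_shallow)]:
    params, macs = dense_cost(sizes)
    print(f"{name:12s}  layers={len(sizes) - 1:2d}  "
          f"params={params:7d}  MACs/sample={macs:7d}")
```

MAC counts capture arithmetic only. The deep stack still runs its eleven layers one after another, so measured wall-clock time also depends on per-layer overhead.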
"""
# %% [markdown]
"""
### Question 3: Integration and Modularity Analysis
Your manual network composition approach gives you complete control over layer ordering and activation placement. However, you've seen that composing networks manually becomes complex for large architectures.

**Integration Analysis:**
- How would you extend your current `create_classifier_network()` and `create_regression_network()` functions to support more complex architectures like residual connections?
- What interface changes to your component system would be needed to handle branching network topologies?
- How does manual composition compare to automated composition in terms of debugging and understanding?

**Systems Architecture Question:**
Design a hybrid approach that maintains the educational benefits of your manual composition while providing the convenience of automated network building for complex architectures. What abstractions would you introduce?

Think about: Component interfaces, graph representations, debugging visibility
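
One possible direction for the residual-connection question is to treat a residual block as just another callable component, so the sequential forward loop does not need to change. The sketch below is an assumption-laden illustration: plain NumPy arrays and lambdas stand in for this module's `Tensor` and layer objects, and `Residual` and `forward_pass` are hypothetical names.

```python
import numpy as np

class Residual:
    # Hypothetical wrapper: output = x + branch(x). The branch must preserve
    # the input shape so that the addition is valid.
    def __init__(self, components):
        self.components = components

    def __call__(self, x):
        out = x
        for component in self.components:
            out = component(out)
        return x + out

def forward_pass(x, components):
    # Same sequential loop as manual composition; a Residual block is
    # simply one more callable in the list.
    for component in components:
        x = component(x)
    return x

rng = np.random.default_rng(0)
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
relu = lambda a: np.maximum(a, 0.0)

network = [Residual([lambda a: a @ w1, relu, lambda a: a @ w2]), relu]
x = rng.standard_normal((4, 8))
y = forward_pass(x, network)
print(y.shape)  # (4, 8)
```

Nesting keeps the top-level list linear, which preserves step-by-step debugging. Topologies with several merging branches would likely need an explicit graph representation instead.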
"""
# %% [markdown]
"""
## 🎯 MODULE SUMMARY: Networks - Manual Composition Mastery
Congratulations! You've successfully implemented manual network composition that forms the foundation of all neural network architectures:
### What You've Accomplished
✅ **Manual Network Composition**: Built 150+ lines of network architecture code with step-by-step layer composition
✅ **Forward Pass Logic**: Implemented data flow through networks of arbitrary depth and complexity
✅ **Parameter Analysis**: Created comprehensive parameter counting and memory analysis systems
✅ **Architecture Patterns**: Built specialized networks for classification, regression, and custom tasks
✅ **Systems Understanding**: Analyzed scaling behavior, memory usage, and computational complexity
### Key Learning Outcomes
- **Network Architecture**: Understanding how layers compose into intelligent systems through manual control
- **Data Flow Principles**: Mastery of tensor shape transformations through network layers
- **Parameter Management**: Deep insight into memory requirements and computational complexity
- **Performance Characteristics**: Knowledge of how network depth and width affect efficiency
### Mathematical Foundations Mastered
- **Composition Functions**: f(g(h(x))) = network(x) through sequential application
- **Parameter Scaling**: O(input_size × output_size) per layer; O(depth) for the whole network at a fixed layer width
- **Memory Complexity**: Linear in the parameter count, plus O(batch_size × max_layer_width) for activations when only the current layer's output is kept (forward pass only)
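
The composition identity above can be checked numerically. This is a minimal sketch under stated assumptions: NumPy arrays stand in for `Tensor`, biases are omitted for brevity, and `h`, `g`, `f`, and `network` are illustrative names, not functions from this module.

```python
from functools import reduce
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
weights = [rng.standard_normal((n_in, n_out)) * 0.01
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# h, g, f are the three layers, applied in that order: network(x) = f(g(h(x)))
h = lambda x: np.maximum(x @ weights[0], 0.0)  # linear + ReLU
g = lambda x: np.maximum(x @ weights[1], 0.0)  # linear + ReLU
f = lambda x: x @ weights[2]                   # linear output layer

def network(x, layers=(h, g, f)):
    # Sequential application: feed each layer's output into the next
    return reduce(lambda out, layer: layer(out), layers, x)

x = rng.standard_normal((32, 784))
assert np.allclose(network(x), f(g(h(x))))
print(network(x).shape)  # (32, 10)
```
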
### Professional Skills Developed
- **Manual Architecture Design**: Building networks layer-by-layer with complete understanding
- **Systems Analysis**: Measuring and optimizing network performance characteristics
- **Memory Engineering**: Understanding parameter vs activation memory trade-offs
- **Performance Optimization**: Batch processing and computational efficiency analysis
### Ready for Advanced Applications
Your manual network composition now enables:
- **Custom Architectures**: Build any network topology with complete understanding
- **Performance Analysis**: Measure and optimize network computational characteristics
- **Memory Management**: Predict and control network memory requirements
- **Educational Foundation**: Deep understanding before automated composition tools
### Connection to Real ML Systems
Your implementation mirrors production patterns:
- **PyTorch**: Your manual composition matches what `nn.Sequential` does in its forward pass: it applies each module in order
- **TensorFlow**: Similar to `tf.keras.Sequential` layer-by-layer construction
- **Industry Standard**: Manual composition is used for custom architectures and research
### Next Steps
1. **Export your module**: `tito module complete 04_networks`
2. **Validate integration**: `tito test --module networks`
3. **Explore automated composition**: Your foundation enables understanding Sequential in Module 05
4. **Ready for Module 05**: Linear Networks with automated composition tools

**🚀 Achievement Unlocked**: Your manual network composition mastery provides the deep understanding needed for building automated ML frameworks. You've learned to think like a neural network architect!
"""
# %%
if __name__ == "__main__":
    # Run all tests to validate complete implementation
    test_module()
    # Display completion message
    print("\n" + "="*60)
    print("🎯 MODULE 04 (NETWORKS) COMPLETE!")
    print("📈 Progress: Manual Network Composition ✓")
    print("🔥 Next up: Module 05 - Automated Linear Networks!")
    print("💪 You're building real ML architecture understanding!")
    print("="*60)