# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#     jupytext_version: 1.17.1
# ---

# %% [markdown]
"""
# Module 2: Layers - Neural Network Building Blocks

Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.

## Learning Goals
- Understand layers as functions that transform tensors: `y = f(x)`
- Implement Dense layers with linear transformations: `y = Wx + b`
- Use activation functions from the activations module for nonlinearity
- See how neural networks are just function composition
- Build intuition before diving into training

## Build → Use → Understand
1. **Build**: Dense layers using activation functions as building blocks
2. **Use**: Transform tensors and see immediate results
3. **Understand**: How neural networks transform information

## Module Dependencies
This module builds on the **activations** module:
- **activations** → **layers** → **networks**
- Clean separation of concerns: math functions → layer building blocks → full networks
"""

# %% [markdown]
"""
## 📦 Where This Code Lives in the Final Package

**Learning Side:** You work in `modules/layers/layers_dev.py`
**Building Side:** Code exports to `tinytorch.core.layers`

```python
# Final package structure:
from tinytorch.core.layers import Dense, Conv2D  # All layers together!
from tinytorch.core.activations import ReLU, Sigmoid, Tanh
from tinytorch.core.tensor import Tensor
```

**Why this matters:**
- **Learning:** Focused modules for deep understanding
- **Production:** Proper organization like PyTorch's `torch.nn`
- **Consistency:** All layers (Dense, Conv2D) live together in `core.layers`
"""

# %%
#| default_exp core.layers

# Setup and imports
import numpy as np
import sys
from typing import Union, Optional, Callable
import math

# %%
#| export
import numpy as np
import math
import sys
from typing import Union, Optional, Callable

# Import from the main package (rock solid foundation)
from tinytorch.core.tensor import Tensor
from tinytorch.core.activations import ReLU, Sigmoid, Tanh

# print("🔥 TinyTorch Layers Module")
# print(f"NumPy version: {np.__version__}")
# print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
# print("Ready to build neural network layers!")

# %% [markdown]
"""
## Step 1: What is a Layer?

### Definition
A **layer** is a function that transforms tensors. Think of it as a mathematical operation that takes input data and produces output data:

```
Input Tensor → Layer → Output Tensor
```

### Why Layers Matter in Neural Networks
Layers are the fundamental building blocks of all neural networks because:
- **Modularity**: Each layer has a specific job (linear transformation, nonlinearity, etc.)
- **Composability**: Layers can be combined to create complex functions
- **Learnability**: Many layers have parameters that can be learned from data
- **Interpretability**: Different layers learn different features

### The Fundamental Insight
**Neural networks are just function composition!**
```
x → Layer1 → Layer2 → Layer3 → y
```

Each layer transforms the data, and the final output is the composition of all these transformations.

### Real-World Examples
- **Dense Layer**: Learns linear relationships between features
- **Convolutional Layer**: Learns spatial patterns in images
- **Recurrent Layer**: Learns temporal patterns in sequences
- **Activation Layer**: Adds nonlinearity to make networks powerful

### Visual Intuition
```
Input: [1, 2, 3]  (3 features)
Dense Layer: y = Wx + b
Weights W: [[0.1, 0.2, 0.3],
            [0.4, 0.5, 0.6]]  (2×3 matrix)
Bias b: [0.1, 0.2]  (2 values)
Output: [0.1*1 + 0.2*2 + 0.3*3 + 0.1,
         0.4*1 + 0.5*2 + 0.6*3 + 0.2] = [1.5, 3.4]
```

Let's start with the most important layer: **Dense** (also called Linear or Fully Connected).
"""

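# %% [markdown]
"""
Before implementing anything, here is the Visual Intuition example reproduced
with plain NumPy — a throwaway sanity check (the names `W`, `b`, and `x` below
are local to this cell, not part of the package).
"""

# %%
# Reproduce the Visual Intuition example: y = Wx + b
W = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])   # (2, 3) weight matrix
b = np.array([0.1, 0.2])          # (2,) bias
x = np.array([1.0, 2.0, 3.0])     # (3,) input

print(W @ x + b)  # [1.5 3.4]
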
# %% [markdown]
"""
## Step 2: Understanding Matrix Multiplication

Before we build layers, let's understand the core operation: **matrix multiplication**. This is what powers all neural network computations.

### Why Matrix Multiplication Matters
- **Efficiency**: Process multiple inputs at once
- **Parallelization**: GPU acceleration works great with matrix operations
- **Batch processing**: Handle multiple samples simultaneously
- **Mathematical foundation**: Linear algebra is the language of neural networks

### The Math Behind It
For matrices A (m×n) and B (n×p), the result C (m×p) is:
```
C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))
```

### Visual Example
```
A = [[1, 2],    B = [[5, 6],
     [3, 4]]         [7, 8]]

C = A @ B = [[1*5 + 2*7, 1*6 + 2*8],
             [3*5 + 4*7, 3*6 + 4*8]]
          = [[19, 22],
             [43, 50]]
```

Let's implement this step by step!
"""

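# %% [markdown]
"""
To make the formula concrete, here is `C[0,1]` from the example computed
directly from the definition — an informal check you can delete afterwards.
"""

# %%
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# C[0,1] = sum over k of A[0,k] * B[k,1]
c01 = sum(A[0, k] * B[k, 1] for k in range(2))
print(c01)            # 22
print((A @ B)[0, 1])  # 22 — NumPy agrees
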
# %%
#| export
def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """
    Naive matrix multiplication using explicit for-loops.

    This helps you understand what matrix multiplication really does!

    Args:
        A: Matrix of shape (m, n)
        B: Matrix of shape (n, p)

    Returns:
        Matrix of shape (m, p) where C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))

    TODO: Implement matrix multiplication using three nested for-loops.

    APPROACH:
    1. Get the dimensions: m, n from A and n2, p from B
    2. Check that n == n2 (matrices must be compatible)
    3. Create output matrix C of shape (m, p) filled with zeros
    4. Use three nested loops:
       - i loop: rows of A (0 to m-1)
       - j loop: columns of B (0 to p-1)
       - k loop: shared dimension (0 to n-1)
    5. For each (i,j), compute: C[i,j] += A[i,k] * B[k,j]

    EXAMPLE:
        A = [[1, 2],    B = [[5, 6],
             [3, 4]]         [7, 8]]

        C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19
        C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22
        C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43
        C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50

    HINTS:
    - Start with C = np.zeros((m, p))
    - Use three nested for loops: for i in range(m): for j in range(p): for k in range(n):
    - Accumulate the sum: C[i,j] += A[i,k] * B[k,j]
    """
    raise NotImplementedError("Student implementation required")

# %%
#| hide
#| export
def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """
    Naive matrix multiplication using explicit for-loops.

    This helps you understand what matrix multiplication really does!
    """
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, f"Matrix shapes don't match: A({m},{n}) @ B({n2},{p})"

    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

# %% [markdown]
"""
### 🧪 Test Your Matrix Multiplication
"""

# %%
# Test matrix multiplication
print("Testing matrix multiplication...")

try:
    # Test case 1: Simple 2x2 matrices
    A = np.array([[1, 2], [3, 4]], dtype=np.float32)
    B = np.array([[5, 6], [7, 8]], dtype=np.float32)

    result = matmul_naive(A, B)
    expected = np.array([[19, 22], [43, 50]], dtype=np.float32)

    print(f"✅ Matrix A:\n{A}")
    print(f"✅ Matrix B:\n{B}")
    print(f"✅ Your result:\n{result}")
    print(f"✅ Expected:\n{expected}")

    assert np.allclose(result, expected), "❌ Result doesn't match expected!"
    print("🎉 Matrix multiplication works!")

    # Test case 2: Compare with NumPy
    numpy_result = A @ B
    assert np.allclose(result, numpy_result), "❌ Doesn't match NumPy result!"
    print("✅ Matches NumPy implementation!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure to implement matmul_naive above!")

# %% [markdown]
"""
## Step 3: Building the Dense Layer

Now let's build the **Dense layer**, the most fundamental building block of neural networks. A Dense layer performs a linear transformation: `y = Wx + b`

### What is a Dense Layer?
- **Linear transformation**: `y = Wx + b`
- **W**: Weight matrix (learnable parameters)
- **x**: Input tensor
- **b**: Bias vector (learnable parameters)
- **y**: Output tensor

### Why Dense Layers Matter
- **Universal approximation**: Can approximate any continuous function with enough neurons (and a nonlinear activation)
- **Feature learning**: Each neuron learns a different feature
- **Nonlinearity**: When combined with activation functions, becomes very powerful
- **Foundation**: All other layers build on this concept

### The Math
For input x of shape (batch_size, input_size):
- **W**: Weight matrix of shape (input_size, output_size)
- **b**: Bias vector of shape (output_size,)
- **y**: Output of shape (batch_size, output_size)

Because inputs are stored as rows, the layer actually computes `y = x @ W + b`; this is the same linear map as `y = Wx + b`, just written for row vectors.

### Visual Example
```
Input:   x = [1, 2, 3]  (3 features)
Weights: W = [[0.1, 0.2],    Bias: b = [0.1, 0.2]
              [0.3, 0.4],
              [0.5, 0.6]]

Step 1: xW = [0.1*1 + 0.3*2 + 0.5*3, 0.2*1 + 0.4*2 + 0.6*3]
           = [2.2, 2.8]

Step 2: y = xW + b = [2.2 + 0.1, 2.8 + 0.2] = [2.3, 3.0]
```

Let's implement this!
"""

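# %% [markdown]
"""
A quick NumPy check of the Visual Example above (throwaway code; the batched
input `x` is a (1, 3) row, so the layer computes `x @ W + b`).
"""

# %%
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])        # (3, 2) weight matrix
b = np.array([0.1, 0.2])          # (2,) bias
x = np.array([[1.0, 2.0, 3.0]])   # (1, 3): one sample as a row

print(x @ W)      # [[2.2 2.8]]
print(x @ W + b)  # [[2.3 3. ]]
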
# %%
#| export
class Dense:
    """
    Dense (Linear) Layer: y = Wx + b

    The fundamental building block of neural networks.
    Performs linear transformation: matrix multiplication + bias addition.

    Args:
        input_size: Number of input features
        output_size: Number of output features
        use_bias: Whether to include bias term (default: True)
        use_naive_matmul: Whether to use naive matrix multiplication (for learning)

    TODO: Implement the Dense layer with weight initialization and forward pass.

    APPROACH:
    1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)
    2. Initialize weights with small random values (Xavier/Glorot initialization)
    3. Initialize bias to zeros (if use_bias=True)
    4. Implement forward pass using matrix multiplication and bias addition

    EXAMPLE:
        layer = Dense(input_size=3, output_size=2)
        x = Tensor([[1, 2, 3]])  # batch_size=1, input_size=3
        y = layer(x)             # shape: (1, 2)

    HINTS:
    - Use np.random.randn() for random initialization
    - Scale weights by sqrt(2/(input_size + output_size)) for Xavier init
    - Store weights and bias as numpy arrays
    - Use matmul_naive or @ operator based on use_naive_matmul flag
    """

    def __init__(self, input_size: int, output_size: int, use_bias: bool = True,
                 use_naive_matmul: bool = False):
        """
        Initialize Dense layer with random weights.

        Args:
            input_size: Number of input features
            output_size: Number of output features
            use_bias: Whether to include bias term
            use_naive_matmul: Use naive matrix multiplication (for learning)

        TODO:
        1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)
        2. Initialize weights with small random values
        3. Initialize bias to zeros (if use_bias=True)

        STEP-BY-STEP:
        1. Store the parameters as instance variables
        2. Calculate scale factor for Xavier initialization: sqrt(2/(input_size + output_size))
        3. Initialize weights: np.random.randn(input_size, output_size) * scale
        4. If use_bias=True, initialize bias: np.zeros(output_size)
        5. If use_bias=False, set bias to None

        EXAMPLE:
            Dense(3, 2) creates:
            - weights: shape (3, 2) with small random values
            - bias: shape (2,) with zeros
        """
        raise NotImplementedError("Student implementation required")

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass: y = Wx + b

        Args:
            x: Input tensor of shape (batch_size, input_size)

        Returns:
            Output tensor of shape (batch_size, output_size)

        TODO: Implement matrix multiplication and bias addition
        - Use self.use_naive_matmul to choose between NumPy and naive implementation
        - If use_naive_matmul=True, use matmul_naive(x.data, self.weights)
        - If use_naive_matmul=False, use x.data @ self.weights
        - Add bias if self.use_bias=True

        STEP-BY-STEP:
        1. Perform matrix multiplication: Wx
           - If use_naive_matmul: result = matmul_naive(x.data, self.weights)
           - Else: result = x.data @ self.weights
        2. Add bias if use_bias: result += self.bias
        3. Return Tensor(result)

        EXAMPLE:
            Input x:  Tensor([[1, 2, 3]])     # shape (1, 3)
            Weights:  shape (3, 2)
            Output:   Tensor([[val1, val2]])  # shape (1, 2)

        HINTS:
        - x.data gives you the numpy array
        - self.weights is your weight matrix
        - Use broadcasting for bias addition: result + self.bias
        - Return Tensor(result) to wrap the result
        """
        raise NotImplementedError("Student implementation required")

    def __call__(self, x: Tensor) -> Tensor:
        """Make layer callable: layer(x) same as layer.forward(x)"""
        return self.forward(x)

# %%
#| hide
#| export
class Dense:
    """
    Dense (Linear) Layer: y = Wx + b

    The fundamental building block of neural networks.
    Performs linear transformation: matrix multiplication + bias addition.
    """

    def __init__(self, input_size: int, output_size: int, use_bias: bool = True,
                 use_naive_matmul: bool = False):
        """
        Initialize Dense layer with random weights.

        Args:
            input_size: Number of input features
            output_size: Number of output features
            use_bias: Whether to include bias term
            use_naive_matmul: Use naive matrix multiplication (for learning)
        """
        # Store parameters
        self.input_size = input_size
        self.output_size = output_size
        self.use_bias = use_bias
        self.use_naive_matmul = use_naive_matmul

        # Xavier/Glorot initialization
        scale = np.sqrt(2.0 / (input_size + output_size))
        self.weights = np.random.randn(input_size, output_size).astype(np.float32) * scale

        # Initialize bias
        if use_bias:
            self.bias = np.zeros(output_size, dtype=np.float32)
        else:
            self.bias = None

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass: y = Wx + b

        Args:
            x: Input tensor of shape (batch_size, input_size)

        Returns:
            Output tensor of shape (batch_size, output_size)
        """
        # Matrix multiplication
        if self.use_naive_matmul:
            result = matmul_naive(x.data, self.weights)
        else:
            result = x.data @ self.weights

        # Add bias
        if self.use_bias:
            result += self.bias

        return Tensor(result)

    def __call__(self, x: Tensor) -> Tensor:
        """Make layer callable: layer(x) same as layer.forward(x)"""
        return self.forward(x)

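# %% [markdown]
"""
A quick, informal look at the Xavier/Glorot scale used in `__init__`: for a
large layer, the weight standard deviation should land near
sqrt(2 / (input_size + output_size)). (Illustration only — `big` is a
throwaway variable, not exported.)
"""

# %%
big = Dense(input_size=300, output_size=200)
print(big.weights.std())           # ≈ 0.063 (sample estimate)
print(np.sqrt(2.0 / (300 + 200)))  # 0.0632... (target scale)
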
# %% [markdown]
"""
### 🧪 Test Your Dense Layer
"""

# %%
# Test Dense layer
print("Testing Dense layer...")

try:
    # Test basic Dense layer
    layer = Dense(input_size=3, output_size=2, use_bias=True)
    x = Tensor([[1, 2, 3]])  # batch_size=1, input_size=3

    print(f"✅ Input shape: {x.shape}")
    print(f"✅ Layer weights shape: {layer.weights.shape}")
    print(f"✅ Layer bias shape: {layer.bias.shape}")

    y = layer(x)
    print(f"✅ Output shape: {y.shape}")
    print(f"✅ Output: {y}")

    # Test without bias
    layer_no_bias = Dense(input_size=2, output_size=1, use_bias=False)
    x2 = Tensor([[1, 2]])
    y2 = layer_no_bias(x2)
    print(f"✅ No bias output: {y2}")

    # Test naive matrix multiplication
    layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True)
    x3 = Tensor([[1, 2]])
    y3 = layer_naive(x3)
    print(f"✅ Naive matmul output: {y3}")

    print("\n🎉 All Dense layer tests passed!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure to implement the Dense layer above!")

# %% [markdown]
"""
## Step 4: Composing Layers with Activations

Now let's see how layers work together! A neural network is just layers composed with activation functions.

### Why Layer Composition Matters
- **Nonlinearity**: Activation functions make networks powerful
- **Feature learning**: Each layer learns different levels of features
- **Universal approximation**: Can approximate any continuous function
- **Modularity**: Easy to experiment with different architectures

### The Pattern
```
Input → Dense → Activation → Dense → Activation → Output
```

### Real-World Example
```
Input:      [1, 2, 3]    (3 features)
Dense(3→2): [1.4, -0.5]  (linear transformation; values depend on the weights)
ReLU:       [1.4, 0.0]   (nonlinearity: negatives clipped to zero)
Dense(2→1): [3.2]        (final prediction)
```

Let's build a simple network!
"""

# %%
# Test layer composition
print("Testing layer composition...")

try:
    # Create a simple network: Dense → ReLU → Dense
    dense1 = Dense(input_size=3, output_size=2)
    relu = ReLU()
    dense2 = Dense(input_size=2, output_size=1)

    # Test input
    x = Tensor([[1, 2, 3]])
    print(f"✅ Input: {x}")

    # Forward pass through the network
    h1 = dense1(x)
    print(f"✅ After Dense1: {h1}")

    h2 = relu(h1)
    print(f"✅ After ReLU: {h2}")

    y = dense2(h2)
    print(f"✅ Final output: {y}")

    print("\n🎉 Layer composition works!")
    print("This is how neural networks work: layers + activations!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure all your layers and activations are working!")

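# %% [markdown]
"""
The forward pass above is literally nested function calls, so you can package
it as one. A minimal sketch, assuming the cell above ran so `dense1`, `relu`,
and `dense2` exist (`tiny_network` is a made-up name, not exported):
"""

# %%
def tiny_network(x: Tensor) -> Tensor:
    """Compose Dense → ReLU → Dense into a single callable."""
    return dense2(relu(dense1(x)))

print(tiny_network(Tensor([[1, 2, 3]])))  # same result as the manual pass above
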
# %% [markdown]
"""
## Step 5: Performance Comparison

Let's compare our naive matrix multiplication with NumPy's optimized version to understand why optimization matters in ML.

### Why Performance Matters
- **Training time**: Neural networks train for hours/days
- **Inference speed**: Real-time applications need fast predictions
- **GPU utilization**: Optimized operations use hardware efficiently
- **Scalability**: Large models need efficient implementations
"""

# %%
# Performance comparison
print("Comparing naive vs NumPy matrix multiplication...")

try:
    import time

    # Create test matrices
    A = np.random.randn(100, 100).astype(np.float32)
    B = np.random.randn(100, 100).astype(np.float32)

    # Time naive implementation (perf_counter is monotonic and high-resolution)
    start_time = time.perf_counter()
    result_naive = matmul_naive(A, B)
    naive_time = time.perf_counter() - start_time

    # Time NumPy implementation
    start_time = time.perf_counter()
    result_numpy = A @ B
    numpy_time = time.perf_counter() - start_time

    print(f"✅ Naive time: {naive_time:.4f} seconds")
    print(f"✅ NumPy time: {numpy_time:.4f} seconds")
    print(f"✅ Speedup: {naive_time / max(numpy_time, 1e-9):.1f}x faster")

    # Verify correctness
    assert np.allclose(result_naive, result_numpy), "Results don't match!"
    print("✅ Results match!")

    print("\n💡 This is why we use optimized libraries in production!")

except Exception as e:
    print(f"❌ Error: {e}")

# %% [markdown]
"""
## 🎯 Module Summary

Congratulations! You've built the foundation of neural network layers:

### What You've Accomplished
✅ **Matrix Multiplication**: Understanding the core operation
✅ **Dense Layer**: Linear transformation with weights and bias
✅ **Layer Composition**: Combining layers with activations
✅ **Performance Awareness**: Understanding optimization importance
✅ **Testing**: Immediate feedback on your implementations

### Key Concepts You've Learned
- **Layers** are functions that transform tensors
- **Matrix multiplication** powers all neural network computations
- **Dense layers** perform linear transformations: `y = Wx + b`
- **Layer composition** creates complex functions from simple building blocks
- **Performance** matters for real-world ML applications

### What's Next
In the next modules, you'll build on this foundation:
- **Networks**: Compose layers into complete models
- **Training**: Learn parameters with gradients and optimization
- **Convolutional layers**: Process spatial data like images
- **Recurrent layers**: Process sequential data like text

### Real-World Connection
Your Dense layer is now ready to:
- Learn patterns in data through weight updates
- Transform features for classification and regression
- Serve as building blocks for complex architectures
- Integrate with the rest of the TinyTorch ecosystem

**Ready for the next challenge?** Let's move on to building complete neural networks!
"""

# %%
# Final verification
print("\n" + "="*50)
print("🎉 LAYERS MODULE COMPLETE!")
print("="*50)
print("✅ Matrix multiplication understanding")
print("✅ Dense layer implementation")
print("✅ Layer composition with activations")
print("✅ Performance awareness")
print("✅ Comprehensive testing")
print("\n🚀 Ready to build networks in the next module!")