Refactor Milestone 1: Clean forward pass with Rich CLI

- Reorganized milestone structure to historical progression (01-06)
- Created single forward_pass.py with student code clearly at top
- Added Rich CLI visualizations: data scatter, network diagram, decision boundary
- Show decision boundary using / or \ based on slope
- No random seed - students see variability in random weights
- Annotated all code with which modules were used (Modules 01-03)
- Added introductory panel explaining what to expect
- Updated DEFINITIVE_MODULE_PLAN.md with corrected milestone structure
Vijay Janapa Reddi
2025-09-30 12:03:19 -04:00
parent 8be87d0add
commit 52da14a8ea
16 changed files with 561 additions and 187 deletions


@@ -0,0 +1,418 @@
#!/usr/bin/env python3
"""
The Perceptron (1957) - Frank Rosenblatt [FORWARD PASS ONLY]
=============================================================
📚 HISTORICAL CONTEXT:
Frank Rosenblatt's Perceptron was the first trainable artificial neural network that
could learn from examples. It sparked the first AI boom and demonstrated that machines
could actually learn to recognize patterns, launching the neural network revolution.
🎯 MILESTONE 1: FORWARD PASS (BEFORE TRAINING)
Using YOUR TinyTorch implementations, you'll build a perceptron with RANDOM weights.
This milestone shows you WHY training is essential - the model won't work without it!
⚠️ IMPORTANT: This is NOT the trained version!
- You've completed Modules 01-04 (Tensor, Activations, Layers, Losses)
- You HAVEN'T learned training yet (Modules 05-07: Autograd, Optimizers, Training)
- This milestone demonstrates the PROBLEM that training will solve
✅ REQUIRED MODULES (Run after Module 04):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Module 01 (Tensor)      : YOUR data structure (gradients dormant for now)
Module 02 (Activations) : YOUR sigmoid activation function
Module 03 (Layers)      : YOUR Linear layer with RANDOM weights
Module 04 (Losses)      : YOUR loss functions (for measuring failure)
Data Generation         : Directly generated within this script
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🏗️ ARCHITECTURE (Original 1957 Design):
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    Input    │    │   Linear    │    │   Sigmoid   │    │   Binary    │
│  Features   │───▶│ YOUR Module │───▶│ YOUR Module │───▶│   Output    │
│  (x1, x2)   │    │     03      │    │     02      │    │  (0 or 1)   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
🔍 WHAT YOU'LL SEE - EXPECTATION vs REALITY:
   WHAT YOU MIGHT EXPECT:           WHAT YOU'LL ACTUALLY GET:
   "I built it, so it works!"       "Wait... it's just guessing!"
   4 │ • • • • •                    4 │ • ○ • ○ •
     │ • • • • •   Perfect!           │ ○ • • ○ •   ╲ Random!
   2 │ • • • • •                    2 │ • ○ • • ○ •
     │ ○ ○ ○ ○ ○                      │ ○ • ○ ○ • ○
   0 │ ○ ○ ○ ○ ○                    0 │ • ○ • ○ ○ •
     └────────────                    └────────────
       0   2   4                        0   2   4
   ✅ Accuracy: ~100%               ❌ Accuracy: ~50%
   (What you hoped for)             (What random weights give you)
WHY IS IT SO BAD?
The weights are RANDOM! Without training:
- w₁, w₂, b are random numbers from initialization
- The decision boundary is in a random position
- Predictions are essentially coin flips
Mathematical Reality:
y = sigmoid(w₁·x₁ + w₂·x₂ + b) ← These are RANDOM values!
Where YOUR modules compute:
- Linear: z = w₁·x₁ + w₂·x₂ + b (random w₁, w₂, b!)
- Sigmoid: y = 1/(1 + e⁻ᶻ) (squash to [0,1])
- Decision: class = 1 if y > 0.5 else 0 (random decision boundary!)
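Worked example (illustrative numbers, not from an actual run):
Suppose initialization happens to give w₁ = 0.30, w₂ = -0.80, b = 0.10.
For the Class-1 point (2, 2):
    z = 0.30·2 + (-0.80)·2 + 0.10 = -0.90
    y = σ(-0.90) ≈ 0.29  →  predicted class 0 (wrong!)
One unlucky draw of weights, and a clearly Class-1 point is misclassified.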
🔍 KEY INSIGHTS (This Milestone):
- ✅ Architecture works: Forward pass executes correctly
- ❌ But it's useless: Random weights = random predictions (~50% accuracy)
- 💡 The lesson: Building the model is easy; making it LEARN is the hard part
- 🎯 Motivation: You NEED training (coming in Modules 05-07!)
📊 WHAT TO EXPECT (This Milestone):
- Dataset: 10 linearly separable synthetic points (just for testing)
- No training: Just forward pass with random weights
- Expected accuracy: ~40-60% (essentially random guessing)
- Key takeaway: "My model doesn't work... yet!"
🚀 WHAT COMES NEXT (After Module 07):
- Same architecture, but WITH training
- Expected accuracy: 95%+ on same problem
- Training time: ~30 seconds
- You'll see the SAME perceptron transform from useless → intelligent
"""
import sys
import os
import numpy as np

# Add project root to path for correct tinytorch imports
# This allows the script to be run from the root of the project
sys.path.insert(0, os.getcwd())
# Import TinyTorch components YOU BUILT!
from tinytorch.core.tensor import Tensor # Module 01: YOU built this!
from tinytorch.core.layers import Linear # Module 03: YOU built this!
from tinytorch.core.activations import Sigmoid # Module 02: YOU built this!
# Import Rich for beautiful CLI output
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich import box
from rich.text import Text
console = Console()
# ============================================================================
# 🎓 STUDENT CODE: This is what YOU built with Modules 01-03!
# ============================================================================
class Perceptron:
    """
    Simple perceptron: Linear + Sigmoid

    This uses components YOU built in:
    - Module 01: Tensor (data structure)
    - Module 02: Sigmoid (activation function)
    - Module 03: Linear (layer with weights)

    The entire model is just ~10 lines of code!
    """
    def __init__(self, input_size=2, output_size=1):
        # Module 03: Linear layer (w1*x1 + w2*x2 + b)
        self.linear = Linear(input_size, output_size)
        # Module 02: Sigmoid activation (squashes to [0,1])
        self.activation = Sigmoid()

    def forward(self, x):
        # Step 1: Linear transformation (Module 03)
        x = self.linear(x)
        # Step 2: Activation function (Module 02)
        x = self.activation(x)
        return x

    def __call__(self, x):
        """PyTorch-style: model(x) calls forward(x)"""
        return self.forward(x)
# ============================================================================
# 📊 VISUALIZATION CODE: Rich CLI formatting (you can ignore this!)
# ============================================================================
def draw_network_architecture():
    """Draw the perceptron architecture using ASCII art."""
    network = """
    Input Layer         Linear Layer          Activation        Output
    ┌─────────┐       ┌──────────────┐       ┌──────────┐      ┌─────────┐
    │         │       │              │       │          │      │         │
    │   x₁    │───────┤              │       │          │      │         │
    │         │  w₁   │              │   z   │          │  y   │  class  │
    └─────────┘       │    Linear    │───────│ Sigmoid  │──────│ (0 or 1)│
                      │   (Wx + b)   │       │   σ(z)   │      │         │
    ┌─────────┐       │              │       │          │      │         │
    │         │  w₂   │              │       │          │      │         │
    │   x₂    │───────┤              │       │          │      └─────────┘
    │         │       │              │       │          │
    └─────────┘       └──────────────┘       └──────────┘
                         b (bias)

    Computation Flow:
    1. Linear:   z = w₁·x₁ + w₂·x₂ + b
    2. Sigmoid:  y = 1 / (1 + e⁻ᶻ)
    3. Decision: class = 1 if y > 0.5 else 0
    """
    return network.strip()
def visualize_data_points(X, y, predictions=None, weights=None):
    """Create ASCII visualization of data points with decision boundary."""
    # Create a simple scatter plot
    grid_size = 20
    grid = [[' ' for _ in range(grid_size)] for _ in range(grid_size)]

    # Find bounds
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5

    # Draw decision boundary if weights provided
    # Decision boundary: w1*x1 + w2*x2 + b = 0 → x2 = -(w1*x1 + b)/w2
    if weights is not None:
        w1, w2, b = weights
        if abs(w2) > 0.001:  # Avoid division by zero
            # Determine slope for choosing line character
            slope = -w1 / w2
            line_char = '/' if slope > 0 else '\\'
            for gx in range(grid_size):
                # Map grid x to real x
                px = x_min + (gx / (grid_size - 1)) * (x_max - x_min)
                # Calculate decision boundary y
                py = -(w1 * px + b) / w2
                # Map to grid y
                gy = int((py - y_min) / (y_max - y_min) * (grid_size - 1))
                gy = grid_size - 1 - gy  # Flip y-axis
                if 0 <= gy < grid_size and grid[gy][gx] == ' ':
                    grid[gy][gx] = line_char  # Decision boundary line

    # Plot points (after boundary so they overlap)
    for i, (px, py) in enumerate(X):
        # Map to grid
        gx = int((px - x_min) / (x_max - x_min) * (grid_size - 1))
        gy = int((py - y_min) / (y_max - y_min) * (grid_size - 1))
        gy = grid_size - 1 - gy  # Flip y-axis
        if 0 <= gx < grid_size and 0 <= gy < grid_size:
            true_label = int(y[i])
            if predictions is not None:
                pred_label = int(predictions[i])
                # Show correct vs incorrect predictions
                if true_label == pred_label:
                    grid[gy][gx] = '●' if true_label == 1 else '○'
                else:
                    grid[gy][gx] = '✗'  # Wrong prediction
            else:
                grid[gy][gx] = '●' if true_label == 1 else '○'

    # Build the plot
    lines = []
    lines.append("  ┌" + "─" * grid_size + "┐")
    for row in grid:
        lines.append("  │" + "".join(row) + "│")
    lines.append("  └" + "─" * grid_size + "┘")
    lines.append("  ● = Class 1 (should cluster top-right)")
    lines.append("  ○ = Class 0 (should cluster bottom-left)")
    if weights is not None:
        lines.append("  / or \\ = Decision boundary (where z = 0)")
    if predictions is not None:
        lines.append("  ✗ = Incorrect prediction")
    return "\n".join(lines)
def main():
    """Demonstrate Rosenblatt's Perceptron using YOUR TinyTorch system!"""
    # Header
    console.print()
    console.print(Panel.fit(
        "[bold cyan]🎯 MILESTONE 1: The Perceptron (1957)[/bold cyan]\n"
        "[yellow]⚠️ FORWARD PASS ONLY - Random Weights[/yellow]\n\n"
        "[dim]Components: YOUR Tensor + YOUR Linear + YOUR Sigmoid[/dim]",
        border_style="cyan"
    ))
    console.print()

    # Introduction - What to expect
    intro = (
        "[bold]What You're Demonstrating:[/bold]\n\n"
        "You've completed Modules 01-04; this demo uses three of them:\n"
        "  • [cyan]Module 01:[/cyan] Tensor (data structure)\n"
        "  • [cyan]Module 02:[/cyan] Sigmoid (activation function)\n"
        "  • [cyan]Module 03:[/cyan] Linear (layer with weights)\n\n"
        "[bold yellow]What to Expect:[/bold yellow]\n"
        "  • The architecture [green]WORKS[/green] - forward pass succeeds ✓\n"
        "  • Accuracy is [red]POOR[/red] - random weights = random predictions ✗\n"
        "  • Decision boundary (/ or \\) is in a [yellow]RANDOM[/yellow] position\n"
        "  • Each run gives [yellow]DIFFERENT[/yellow] results (no seed!)\n\n"
        "[bold cyan]The Key Lesson:[/bold cyan]\n"
        "  Building the model is easy. Making it [bold]LEARN[/bold] is hard.\n"
        "  That's why you need Modules 05-07: Autograd, Optimizers, Training!"
    )
    console.print(Panel(intro, title="[bold cyan]📖 Introduction[/bold cyan]", border_style="cyan"))
    console.print()

    # Step 1: Prepare synthetic data
    console.print("[bold]📊 Step 1: Preparing Data[/bold]")
    console.print("  Creating linearly separable clusters...")
    console.print("  [dim]This is a SIMPLE problem - a trained model achieves 95%+ easily[/dim]")
    console.print("  [yellow]⚠️ No random seed - each run will be different![/yellow]")
    cluster1 = np.random.normal([2, 2], 0.5, (5, 2))    # Class 1: top-right
    cluster2 = np.random.normal([-2, -2], 0.5, (5, 2))  # Class 0: bottom-left
    X = np.vstack([cluster1, cluster2]).astype(np.float32)
    y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=np.float32)  # True labels

    # Show data visualization
    console.print()
    data_viz = visualize_data_points(X, y)
    console.print(Panel(data_viz, title="[cyan]Training Data[/cyan]", border_style="cyan"))
    console.print(f"  [green]✓[/green] Created {X.shape[0]} points in 2 clearly separated clusters\n")

    # Step 2: Create the Perceptron model with YOUR components
    console.print("[bold]🧠 Step 2: Building Model[/bold]")
    console.print("  [yellow]⚠️ No training yet - you haven't learned Modules 05-07![/yellow]")
    console.print("  🧠 Assembling perceptron with YOUR TinyTorch modules...")
    model = Perceptron(input_size=2, output_size=1)
    console.print("  [green]✓[/green] Linear layer: 2 → 1 [dim](YOUR Module 03!)[/dim]")
    console.print("  [green]✓[/green] Activation: Sigmoid [dim](YOUR Module 02!)[/dim]")
    console.print("  [yellow]⚠️ Model assembled - but weights are RANDOM![/yellow]\n")

    # Show network architecture
    network_diagram = draw_network_architecture()
    console.print(Panel(network_diagram, title="[cyan]🏗️ Network Architecture (1957 Design)[/cyan]", border_style="cyan"))
    console.print()

    # Step 3: Test with random weights
    console.print("[bold]🔬 Step 3: Testing with Random Weights[/bold]")
    console.print("  Running forward pass...\n")
    input_tensor = Tensor(X)
    predictions = model(input_tensor)

    # Convert to binary predictions
    pred_classes = (predictions.data > 0.5).astype(int).flatten()
    accuracy = (pred_classes == y).mean()

    # Format arrays nicely for display
    true_str = ' '.join([f"{int(val)}" for val in y])
    pred_str = ' '.join([f"{val}" for val in pred_classes])
    match_str = ' '.join(['[green]✓[/green]' if m else '[red]✗[/red]' for m in (pred_classes == y)])

    # Create results table
    results_table = Table(title="📊 Prediction Results", box=box.ROUNDED, border_style="cyan")
    results_table.add_column("Metric", style="cyan", no_wrap=True)
    results_table.add_column("Value", style="white")
    results_table.add_row("True Labels", f"[{true_str}]")
    results_table.add_row("Predictions", f"[{pred_str}]")
    results_table.add_row("Matches", match_str)

    # Determine status
    if accuracy < 0.6:
        accuracy_display = f"[red]{accuracy:.1%} ❌ Random Guessing![/red]"
        status = "FAILED"
        status_color = "red"
    else:
        accuracy_display = f"[yellow]{accuracy:.1%} 🎲 Got Lucky![/yellow]"
        status = "LUCKY"
        status_color = "yellow"
    results_table.add_row("Accuracy", accuracy_display)
    console.print(results_table)
    console.print()

    # Extract weights for visualization and display
    w1 = model.linear.weight.data[0, 0]
    w2 = model.linear.weight.data[1, 0]
    b = model.linear.bias.data[0]

    # Show visualization with predictions AND decision boundary
    pred_viz = visualize_data_points(X, y, pred_classes, weights=(w1, w2, b))
    console.print(Panel(pred_viz, title="[cyan]Predictions with Decision Boundary[/cyan]", border_style=status_color))
    console.print()

    # Show weights AND equation
    decision_eq = f"z = {w1:.4f}·x₁ + {w2:.4f}·x₂ + {b:.4f}"
    boundary_eq = (f"Decision boundary (z=0): x₂ = {-w1/w2:.4f}·x₁ + {-b/w2:.4f}"
                   if abs(w2) > 0.001 else "Decision boundary: vertical line")
    weights_content = (
        f"[bold]Random Weights:[/bold]\n"
        f"  w₁ = [yellow]{w1:7.4f}[/yellow]\n"
        f"  w₂ = [yellow]{w2:7.4f}[/yellow]\n"
        f"  b  = [yellow]{b:7.4f}[/yellow]\n\n"
        f"[bold]Linear Function:[/bold]\n"
        f"  {decision_eq}\n\n"
        f"[bold]Decision Line:[/bold]\n"
        f"  {boundary_eq}\n"
        f"  [dim](Everything above line → Class 1, below → Class 0)[/dim]"
    )
    console.print(Panel(weights_content, title="[yellow]🔧 Model Parameters[/yellow]", border_style="yellow"))
    console.print()

    # Diagnosis
    if status == "FAILED":
        diagnosis = (
            "[red]❌ The model is essentially guessing randomly[/red]\n"
            "[red]❌ Random initialization = random decision boundary[/red]\n\n"
            "[bold cyan]💡 KEY INSIGHT:[/bold cyan] Building the architecture is easy.\n"
            "   Making it [bold]LEARN[/bold] is the hard part!"
        )
    else:
        diagnosis = (
            "[yellow]🎲 You got lucky with this random initialization![/yellow]\n"
            "[yellow]🎲 But this is NOT learning - just chance[/yellow]\n\n"
            "[bold cyan]💡 KEY INSIGHT:[/bold cyan] Even when it works, random weights\n"
            "   won't generalize. You need [bold]TRAINING[/bold]!"
        )
    console.print(Panel(diagnosis, title=f"[{status_color}]🔍 Diagnosis: {status}[/{status_color}]", border_style=status_color))

    # Tip for multiple runs
    tip = (
        "💡 [bold yellow]Run this script multiple times![/bold yellow]\n\n"
        "Each run uses different random weights and data.\n"
        "You'll see varying results:\n"
        "  • Sometimes: High accuracy (got lucky!) 🎲\n"
        "  • Usually: Low accuracy (random guessing) ❌\n\n"
        "[dim]This demonstrates why training is essential - it must work EVERY time![/dim]"
    )
    console.print(Panel(tip, title="[bold yellow]💡 Experiment[/bold yellow]", border_style="yellow"))
    console.print()

    # Next steps
    next_steps = (
        "[bold]Complete Modules 05-07 to unlock TRAINING:[/bold]\n\n"
        "  [cyan]•[/cyan] Module 05 (Autograd): Calculate gradients automatically\n"
        "  [cyan]•[/cyan] Module 06 (Optimizers): Update weights intelligently\n"
        "  [cyan]•[/cyan] Module 07 (Training): Put it all together\n\n"
        "[dim]Then return to this SAME perceptron and watch it achieve 95%+!\n"
        "You'll see random → intelligent through the power of learning![/dim]"
    )
    console.print(Panel(next_steps, title="[bold green]🚀 Next Steps[/bold green]", border_style="green"))
    console.print()


if __name__ == "__main__":
    main()


@@ -1,156 +0,0 @@
#!/usr/bin/env python3
"""
The Perceptron (1957) - Frank Rosenblatt
=======================================
📚 HISTORICAL CONTEXT:
Frank Rosenblatt's Perceptron was the first trainable artificial neural network that
could learn from examples. It sparked the first AI boom and demonstrated that machines
could actually learn to recognize patterns, launching the neural network revolution.
🎯 WHAT YOU'RE BUILDING:
Using YOUR TinyTorch implementations, you'll recreate the exact same perceptron that
started it all - proving that YOU can build the foundation of modern AI from scratch.
✅ REQUIRED MODULES (Run after Module 4):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Module 01 (Tensor) : YOUR data structure with gradient tracking
Module 02 (Activations) : YOUR sigmoid activation for smooth gradients
Module 03 (Layers) : YOUR Linear layer for weight transformations
Data Generation : Directly generated within this script
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🏗️ ARCHITECTURE (Original 1957 Design):
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    Input    │    │   Linear    │    │   Sigmoid   │    │   Binary    │
│  Features   │───▶│ YOUR Module │───▶│ YOUR Module │───▶│   Output    │
│  (x1, x2)   │    │     03      │    │     02      │    │  (0 or 1)   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
🔍 HOW THE PERCEPTRON LEARNS - A LINEAR DECISION BOUNDARY:
   INITIAL (Random Weights):   TRAINING (Gradient Descent):   CONVERGED (Learned):
   4 │ • • • • •               4 │ • • • • •                  4 │ • • • • •
     │ • • • • •  Class 1        │ • • • • •                    │ • • • • •
   2 │ - - - - - ← Wrong!      2 │ • • • • •  ← Adjusting     2 │ • • • • •  ← Perfect!
     │ ○ ○ ○ ○ ○                 │ ○ ○ ○ ○ ○                    │ ○ ○ ○ ○ ○
   0 │ ○ ○ ○ ○ ○  Class 0      0 │ ○ ○ ○ ○ ○                  0 │ ○ ○ ○ ○ ○
     └────────────               └────────────                  └────────────
       0   2   4                   0   2   4                      0   2   4

   Mathematical Operation:              Weight Updates:
   y = sigmoid(w₁·x₁ + w₂·x₂ + b)       w = w - η·∇L   (η = learning rate)
Where YOUR modules compute:
- Linear: z = w₁·x₁ + w₂·x₂ + b (weighted sum)
- Sigmoid: y = 1/(1 + e⁻ᶻ) (squash to [0,1])
- Decision: class = 1 if y > 0.5 else 0
🔍 KEY INSIGHTS:
- Single-layer architecture: Just linear transformation + activation
- Linearly separable only: Can't solve XOR problem (that comes later!)
- Foundation for everything: Modern networks are just deeper perceptrons
📊 EXPECTED PERFORMANCE:
- Dataset: 1,000 linearly separable synthetic points
- Training time: 30 seconds
- Expected accuracy: 95%+ (problem is linearly separable)
"""
import sys
import os
import numpy as np
import argparse
# Add project root to path for correct tinytorch imports
# This allows the script to be run from the root of the project
sys.path.insert(0, os.getcwd())
# Import TinyTorch components YOU BUILT!
from tinytorch.core.tensor import Tensor # Module 01: YOU built this!
from tinytorch.core.layers import Linear # Module 03: YOU built this!
from tinytorch.core.activations import Sigmoid # Module 02: YOU built this!
class RosenblattPerceptron:
    """
    Rosenblatt's original Perceptron using YOUR TinyTorch implementations!

    Historical note: The original used a step function, but we use sigmoid
    for smooth gradients (an innovation that came slightly later).
    """
    def __init__(self, input_size=2, output_size=1):
        print("🧠 Assembling Rosenblatt's Perceptron with YOUR TinyTorch modules...")
        # Single layer - just like the original 1957 design!
        self.linear = Linear(input_size, output_size)  # Module 03: YOUR Linear layer!
        self.activation = Sigmoid()                    # Module 02: YOUR Sigmoid function!
        print(f"   ✅ Linear layer: {input_size} → {output_size} (YOUR Module 03 implementation!)")
        print(f"   ✅ Activation: Sigmoid (YOUR Module 02 implementation!)")

    def forward(self, x):
        """Forward pass through YOUR perceptron implementation."""
        # Step 1: Linear transformation using YOUR weights
        x = self.linear(x)      # Module 03: YOUR Linear.forward() method!
        # Step 2: Activation using YOUR sigmoid
        x = self.activation(x)  # Module 02: YOUR Sigmoid.forward() method!
        return x
def main():
    """Demonstrate Rosenblatt's Perceptron using YOUR TinyTorch system!"""
    print("🎯 MILESTONE: The Perceptron (1957)")
    print("   Historical significance: The first trainable neural network.")
    print("   YOUR achievement: Assembling it from YOUR own modules.")
    print("   Components used: YOUR Tensor + YOUR Linear + YOUR Sigmoid.")
    print("-" * 60)

    # Step 1: Prepare synthetic data
    print("\n📊 Step 1: Preparing linearly separable data...")
    np.random.seed(42)
    cluster1 = np.random.normal([2, 2], 0.5, (5, 2))  # Just a few samples are needed
    cluster2 = np.random.normal([-2, -2], 0.5, (5, 2))
    X = np.vstack([cluster1, cluster2]).astype(np.float32)
    print(f"   ✅ Data created successfully with shape: {X.shape}")

    # Step 2: Create the Perceptron model with YOUR components
    print("\n🧠 Step 2: Instantiating the Perceptron model...")
    model = RosenblattPerceptron(input_size=2, output_size=1)
    print("   ✅ Model assembled successfully!")

    # Step 3: Perform a forward pass
    print("\n🔬 Step 3: Running a forward pass to test integration...")
    # Convert data to YOUR Tensor format
    input_tensor = Tensor(X)  # Module 01: YOUR Tensor class!
    print(f"   - Input tensor created with shape: {input_tensor.shape}")
    # Run the forward pass through YOUR implementations
    output_tensor = model.forward(input_tensor)
    print(f"   - Output tensor received with shape: {output_tensor.shape}")

    # --- Verification ---
    print("\n" + "=" * 60)
    print("✅ SUCCESS! Your components integrated perfectly.")
    print("   You have successfully assembled the architecture of the first")
    print("   trainable neural network using the modules YOU built.")
    print("=" * 60)
    print("\n🎓 What YOU Accomplished:")
    print("   • YOU assembled a neural network from scratch.")
    print("   • YOUR Tensor class handled the data flow.")
    print("   • YOUR Linear layer performed the mathematical transformation.")
    print("   • YOUR Sigmoid activation processed the layer's output.")
    print("\n🚀 Next Steps:")
    print("   • In future modules, you will build the components needed to TRAIN this model:")
    print("     - Module 04 (Losses): To measure how wrong the model's predictions are.")
    print("     - Module 05 (Autograd): To calculate the gradients needed to improve.")
    print("     - Module 06 (Optimizers): To update the model's weights automatically.")
    print("\n   For now, congratulations on this major milestone!")


if __name__ == "__main__":
    main()


@@ -125,11 +125,23 @@ def log_softmax(x: Tensor, dim=-1) -> Tensor # Numerical stability
---
## 🪜 **Milestone 1: Perceptron (After Module 04)**
**Location:** `milestones/01_perceptron/`
**Deliverable:** Train Linear + Sigmoid on 2D dataset, visualize decision boundary
**Success Criteria:** 95% accuracy on linearly separable data
**Unlock:** Complete modules 01-04 + integration test
## 🪜 **Milestone 1: Perceptron 1957 (After Modules 04 & 07)**
**Location:** `milestones/01_perceptron_1957/`
**Part 1: Forward Pass (After Module 04)**
- File: `forward_pass_interactive.py`
- Build perceptron with random weights
- Interactive CLI to manually tweak weights (frustration!)
- Success: ~40-60% accuracy (essentially random)
- Lesson: "I need automatic training!"
**Part 2: Trained (After Module 07)**
- File: `perceptron_trained.py`
- Same architecture, NOW with backprop training
- Success: 95%+ accuracy on linearly separable data
- Lesson: "Training transforms random → intelligent!"
**Unlock:** Complete modules 01-04 for Part 1, modules 05-07 for Part 2
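For reference, a framework-agnostic sketch of what Part 2's training loop does (numpy only, with manual gradients; the milestone itself will use the TinyTorch Autograd/Optimizer APIs built in Modules 05-07, which are not shown here):

```python
import numpy as np

# Same toy data as the forward-pass milestone
X = np.vstack([np.random.normal([2, 2], 0.5, (5, 2)),
               np.random.normal([-2, -2], 0.5, (5, 2))]).astype(np.float32)
y = np.array([1] * 5 + [0] * 5, dtype=np.float32)

w, b, lr = np.random.randn(2), 0.0, 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # forward: linear + sigmoid
    grad_z = (p - y) / len(y)               # dL/dz for binary cross-entropy
    w -= lr * (X.T @ grad_z)                # gradient descent on weights
    b -= lr * grad_z.sum()                  # ...and on the bias

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print(f"accuracy: {((p > 0.5) == y).mean():.0%}")  # typically 100% on this data
```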
---
@@ -214,10 +226,29 @@ def clip_grad_norm(parameters, max_norm)
---
## 🪜 **Milestone 2: MLP (After Module 07)**
**Location:** `milestones/02_mlp/`
**Deliverable:** 2-layer MLP on MNIST, compare to perceptron
**Success Criteria:** >95% accuracy on MNIST
## 🪜 **Milestone 2: XOR Crisis 1969 (After Module 07)**
**Location:** `milestones/02_xor_crisis_1969/`
**File:** `perceptron_xor_fails.py`
**Deliverable:**
- Try training perceptron on XOR problem (4 points!)
- Train for 1000+ epochs... stuck at ~50%
- Visualize why: XOR is NOT linearly separable
- Show decision boundary can't separate the points
**What Students Learn:**
- Training works (we proved it in M1)
- But architecture has fundamental limitations
- Single layer = can only learn linear decision boundaries
- Historical context: Minsky and Papert's 1969 book *Perceptrons* stalled neural network research for a decade
**Success Criteria:**
- Perceptron trains but never exceeds 60% on XOR
- Visualization clearly shows the limitation
- Student understands WHY it fails (not linearly separable)
**Emotional Beat:** "Wait... training doesn't solve everything?"
**Unlock:** Complete modules 05-07 + integration test
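A tiny, self-contained check that could back the visualization (illustrative numpy sketch, not one of the milestone files): brute-force search over random linear boundaries never classifies all four XOR points correctly — 3/4 is the ceiling for any single-layer perceptron:

```python
# XOR is not linearly separable: no (w, b) gets all 4 points right.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([0, 1, 1, 0], dtype=np.float32)  # XOR labels

rng = np.random.default_rng(0)
best = 0.0
for _ in range(100_000):
    w, b = rng.normal(size=2), rng.normal()
    acc = ((X @ w + b > 0) == y).mean()  # accuracy of this linear boundary
    best = max(best, acc)
print(best)  # 0.75 — never 1.0, no matter how many boundaries you try
```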
---
@@ -276,11 +307,30 @@ class BatchNorm2d:
---
## 🪜 **Milestone 3: CNN (After Module 09)**
**Location:** `milestones/03_cnn/`
**Deliverable:** 3-layer CNN on CIFAR-10, visualize filters
**Success Criteria:** >75% accuracy on CIFAR-10
**Unlock:** Complete modules 08-09 + integration test
## 🪜 **Milestone 3: MLP Revival 1986 (After Module 07)**
**Location:** `milestones/03_mlp_revival_1986/`
**Files:** `mlp_xor.py`, `mlp_mnist.py`
**Deliverable:**
- Add ONE hidden layer to solve XOR → 100% accuracy!
- Train MLP on MNIST → 95%+ accuracy
- Compare to perceptron failure: depth changes everything
- Visualize curved decision boundary for XOR
**What Students Learn:**
- Hidden layers enable non-linear decision boundaries
- Backpropagation + depth = AI renaissance
- Same training algorithm (backprop) works for any depth
- Historical context: Rumelhart, Hinton, and Williams's 1986 backpropagation paper revived the field
**Success Criteria:**
- MLP solves XOR: 100% accuracy
- MLP on MNIST: >95% accuracy
- Student understands power of depth
**Emotional Beat:** "ONE hidden layer changes everything!"
**Unlock:** Complete modules 05-07 + integration test
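To make the "one hidden layer" point concrete, here is a hand-wired 2-2-1 network that computes XOR exactly (illustrative sketch with hard-coded weights; the milestone trains these weights with backprop instead):

```python
# A 2-2-1 MLP with hand-picked weights computes XOR exactly:
# h1 = OR(x1, x2), h2 = AND(x1, x2), output = h1 AND NOT h2.

def step(z):
    # hard threshold - a stand-in for a very steep sigmoid
    return 1.0 if z > 0 else 0.0

def mlp_xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # fires if at least one input is 1 (OR)
    h2 = step(x1 + x2 - 1.5)    # fires only if both inputs are 1 (AND)
    return step(h1 - h2 - 0.5)  # OR but not AND = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", int(mlp_xor(a, b)))  # prints 0, 1, 1, 0
```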
---
@@ -400,11 +450,30 @@ def attention_with_cache(Q, K, V, cache, layer_idx, seq_pos) -> Tensor
---
## 🪜 **Milestone 4: TinyGPT (After Module 14)**
**Location:** `milestones/04_tinygpt/`
**Deliverable:** Character-level GPT on Shakespeare, generate text
**Success Criteria:** Perplexity < 2.0, coherent generation
**Unlock:** Complete modules 10-14 + integration test
## 🪜 **Milestone 4: CNN Revolution 1998 (After Module 09)**
**Location:** `milestones/04_cnn_revolution_1998/`
**File:** `lecun_cifar10.py`
**Deliverable:**
- Build LeNet-style CNN for CIFAR-10
- Convolutional layers exploit spatial structure
- Visualize learned filters (edge detectors, etc.)
- Compare to MLP: fewer parameters, better accuracy
**What Students Learn:**
- Spatial inductive bias matters for vision
- Convolutions share weights across space
- Pooling provides translation invariance
- Historical context: LeCun's CNN revolutionized computer vision
**Success Criteria:**
- CNN on CIFAR-10: >75% accuracy
- Visualizations show meaningful filters
- Student understands spatial structure
**Emotional Beat:** "It SEES patterns in images!"
**Unlock:** Complete modules 08-09 + integration test
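A minimal sketch of the weight-sharing idea (illustrative numpy only; the milestone uses the convolution layers built in Modules 08-09):

```python
# Weight sharing in a nutshell: ONE 3x3 filter slides over the whole image.
import numpy as np

def conv2d(img, kernel):
    H, W = img.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i+k, j:j+k] * kernel).sum()  # same weights everywhere
    return out

edge = np.array([[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]], dtype=float)    # vertical edge detector
img = np.zeros((8, 8)); img[:, 4:] = 1.0      # left half dark, right half bright
print(conv2d(img, edge))                      # large magnitudes only along the edge
```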
---
@@ -519,13 +588,56 @@ def plot_pareto_frontier(results: pd.DataFrame)
---
## 🪜 **Milestone 5: Systems Capstone (After Module 19)**
**Location:** `milestones/05_systems_capstone/`
**Deliverable:** Profile and optimize CNN vs TinyGPT
- Apply quantization and pruning
- Generate comparison report
- Show accuracy vs speed trade-offs
**Success Criteria:** 2× speedup with <5% accuracy loss
## 🪜 **Milestone 5: Transformer Era 2017 (After Module 14)**
**Location:** `milestones/05_transformer_era_2017/`
**File:** `vaswani_shakespeare.py`
**Deliverable:**
- Build character-level GPT on Shakespeare corpus
- Self-attention captures long-range dependencies
- Generate coherent text samples
- Compare to RNN: attention > recurrence
**What Students Learn:**
- Attention mechanism enables parallelization
- Positional encoding for sequence order
- Autoregressive generation with KV caching
- Historical context: "Attention Is All You Need" changed NLP forever
**Success Criteria:**
- Perplexity < 2.0 on Shakespeare
- Generated text is coherent (subjective)
- Student understands attention mechanism
**Emotional Beat:** "It writes like Shakespeare!"
**Unlock:** Complete modules 10-14 + integration test
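The core computation behind this milestone, as a standalone sketch (single head, no masking, no KV cache; illustrative numpy, not the TinyTorch API):

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q Kᵀ / √d) V
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                         # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # weighted mix of values

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))  # 4 tokens, d=8 (self-attention)
out = attention(Q, K, V)
print(out.shape)  # (4, 8): every token attends to every other token
```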
---
## 🪜 **Milestone 6: Systems Age 2024 (After Module 19)**
**Location:** `milestones/06_systems_age_2024/`
**File:** `optimize_models.py`
**Deliverable:**
- Profile CNN (M4) and GPT (M5) for bottlenecks
- Apply quantization (INT8) and pruning (50% sparsity)
- Benchmark before/after optimization
- Generate performance comparison report
**What Students Learn:**
- Profiling reveals true bottlenecks
- Quantization: 4× memory reduction, minimal accuracy loss
- Pruning: Structured vs unstructured sparsity
- Modern ML is systems engineering
**Success Criteria:**
- 2× speedup with <5% accuracy loss
- Comprehensive benchmark report
- Student understands systems trade-offs
**Emotional Beat:** "I made production AI!"
**Unlock:** Complete modules 15-19 + integration test
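A minimal sketch of the INT8 idea (symmetric per-tensor quantization; illustrative, the milestone's exact scheme may differ):

```python
# Symmetric INT8 quantization: float32 -> int8 is a 4x memory reduction.
import numpy as np

w = np.random.randn(1000).astype(np.float32)    # stand-in weight tensor
scale = np.abs(w).max() / 127.0                 # map max magnitude to 127
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale            # dequantize

print(w.nbytes, "->", q.nbytes)                 # 4000 -> 1000 bytes (4x smaller)
print("max abs error:", np.abs(w - w_hat).max())  # at most scale/2, tiny vs weights
```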
---
@@ -569,11 +681,11 @@ def plot_pareto_frontier(results: pd.DataFrame)
## 🚀 Implementation Order
1. **Phase 1:** Modules 01-04 → Milestone 1 (Perceptron)
2. **Phase 2:** Modules 05-07 → Milestone 2 (MLP)
3. **Phase 3:** Modules 08-09 → Milestone 3 (CNN)
4. **Phase 4:** Modules 10-14 → Milestone 4 (TinyGPT)
5. **Phase 5:** Modules 15-19 → Milestone 5 (Systems)
1. **Phase 1:** Modules 01-04 → Milestone 1 Part 1 (Perceptron forward pass)
2. **Phase 2:** Modules 05-07 → Milestones 1 Part 2, 2, 3 (Training, Crisis, Revival)
3. **Phase 3:** Modules 08-09 → Milestone 4 (CNN)
4. **Phase 4:** Modules 10-14 → Milestone 5 (Transformers)
5. **Phase 5:** Modules 15-19 → Milestone 6 (Systems)
---