diff --git a/milestones/perceptron_1957/README.md b/milestones/01_perceptron_1957/README.md
similarity index 100%
rename from milestones/perceptron_1957/README.md
rename to milestones/01_perceptron_1957/README.md
diff --git a/milestones/01_perceptron_1957/forward_pass.py b/milestones/01_perceptron_1957/forward_pass.py
new file mode 100644
index 00000000..5c67895c
--- /dev/null
+++ b/milestones/01_perceptron_1957/forward_pass.py
@@ -0,0 +1,418 @@
+#!/usr/bin/env python3
+"""
+The Perceptron (1957) - Frank Rosenblatt [FORWARD PASS ONLY]
+=============================================================
+
+📚 HISTORICAL CONTEXT:
+Frank Rosenblatt's Perceptron was the first trainable artificial neural network that
+could learn from examples. It sparked the first AI boom and demonstrated that machines
+could actually learn to recognize patterns, launching the neural network revolution.
+
+🎯 MILESTONE 1: FORWARD PASS (BEFORE TRAINING)
+Using YOUR TinyTorch implementations, you'll build a perceptron with RANDOM weights.
+This milestone shows you WHY training is essential - the model won't work without it!
+
+⚠️ IMPORTANT: This is NOT the trained version!
+- You've completed Modules 01-04 (Tensor, Activations, Layers, Losses)
+- You HAVEN'T learned training yet (Modules 05-07: Autograd, Optimizers, Training)
+- This milestone demonstrates the PROBLEM that training will solve
+
+✅ REQUIRED MODULES (Run after Module 04):
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+  Module 01 (Tensor)        : YOUR data structure (gradients dormant for now)
+  Module 02 (Activations)   : YOUR sigmoid activation function
+  Module 03 (Layers)        : YOUR Linear layer with RANDOM weights
+  Module 04 (Losses)        : YOUR loss functions (for measuring failure)
+  Data Generation           : Directly generated within this script
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+🏗️ ARCHITECTURE (Original 1957 Design):
+    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
+    │    Input    │    │   Linear    │    │   Sigmoid   │    │   Binary    │
+    │  Features   │───▶│ YOUR Module │───▶│ YOUR Module │───▶│   Output    │
+    │  (x1, x2)   │    │     03      │    │     02      │    │  (0 or 1)   │
+    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
+
+🔍 WHAT YOU'LL SEE - EXPECTATION vs REALITY:
+
+    WHAT YOU MIGHT EXPECT:            WHAT YOU'LL ACTUALLY GET:
+    "I built it, so it works!"        "Wait... it's just guessing!"
+
+    4 │ • • • • •                     4 │ • ○ • ○ •
+      │ • • • • • ╱ Perfect!            │ ○ • • ○ • ╲ Random!
+    2 │ • • • • ╱ •                   2 │ • ○ • • ○ •
+      │ ○ ○ ○ ╱ ○ ○                     │ ○ • ○ ○ • ○
+    0 │ ○ ○ ╱ ○ ○ ○                   0 │ • ○ • ○ ○ •
+      └────────────                     └────────────
+        0   2   4                         0   2   4
+
+    ❌ Accuracy: ~50%                  ❌ Accuracy: ~50%
+    (What you hoped for)              (What random weights give you)
+
+    WHY IS IT SO BAD?
+    The weights are RANDOM! Without training:
+    - w₁, w₂, b are random numbers from initialization
+    - The decision boundary is in a random position
+    - Predictions are essentially coin flips
+
+    Mathematical Reality:
+    y = sigmoid(w₁·x₁ + w₂·x₂ + b)    ← These are RANDOM values!
+
+    Where YOUR modules compute:
+    - Linear: z = w₁·x₁ + w₂·x₂ + b    (random w₁, w₂, b!)
+    - Sigmoid: y = 1/(1 + e⁻ᶻ)    (squash to [0,1])
+    - Decision: class = 1 if y > 0.5 else 0    (random decision boundary!)
+
+🔍 KEY INSIGHTS (This Milestone):
+- ✅ Architecture works: Forward pass executes correctly
+- ❌ But it's useless: Random weights = random predictions (~50% accuracy)
+- 💡 The lesson: Building the model is easy; making it LEARN is the hard part
+- 🎯 Motivation: You NEED training (coming in Modules 05-07!)
+
+📊 WHAT TO EXPECT (This Milestone):
+- Dataset: 10 linearly separable synthetic points (just for testing)
+- No training: Just forward pass with random weights
+- Expected accuracy: ~40-60% (essentially random guessing)
+- Key takeaway: "My model doesn't work... yet!"
+
+🚀 WHAT COMES NEXT (After Module 07):
+- Same architecture, but WITH training
+- Expected accuracy: 95%+ on same problem
+- Training time: ~30 seconds
+- You'll see the SAME perceptron transform from useless → intelligent
+"""
+
+import sys
+import os
+import numpy as np
+import argparse
+
+# Add project root to path for correct tinytorch imports
+# This allows the script to be run from the root of the project
+sys.path.insert(0, os.getcwd())
+
+# Import TinyTorch components YOU BUILT!
+from tinytorch.core.tensor import Tensor  # Module 01: YOU built this!
+from tinytorch.core.layers import Linear  # Module 03: YOU built this!
+from tinytorch.core.activations import Sigmoid  # Module 02: YOU built this!
+
+# Import Rich for beautiful CLI output
+from rich.console import Console
+from rich.table import Table
+from rich.panel import Panel
+from rich import box
+from rich.text import Text
+
+console = Console()
+
+# ============================================================================
+# 🎓 STUDENT CODE: This is what YOU built with Modules 01-03!
+# ============================================================================
+
+class Perceptron:
+    """
+    Simple perceptron: Linear + Sigmoid
+
+    This uses components YOU built in:
+    - Module 01: Tensor (data structure)
+    - Module 02: Sigmoid (activation function)
+    - Module 03: Linear (layer with weights)
+
+    The entire model is just ~10 lines of code!
+    """
+
+    def __init__(self, input_size=2, output_size=1):
+        # Module 03: Linear layer (w1*x1 + w2*x2 + b)
+        self.linear = Linear(input_size, output_size)
+
+        # Module 02: Sigmoid activation (squashes to [0,1])
+        self.activation = Sigmoid()
+
+    def forward(self, x):
+        # Step 1: Linear transformation (Module 03)
+        x = self.linear(x)
+
+        # Step 2: Activation function (Module 02)
+        x = self.activation(x)
+
+        return x
+
+    def __call__(self, x):
+        """PyTorch-style: model(x) calls forward(x)"""
+        return self.forward(x)
+
+# ============================================================================
+# 📊 VISUALIZATION CODE: Rich CLI formatting (you can ignore this!)
+# ============================================================================
+
+def draw_network_architecture():
+    """Draw the perceptron architecture using ASCII art."""
+    network = """
+    Input Layer          Linear Layer            Activation         Output
+
+    ┌─────────┐        ┌──────────────┐         ┌──────────┐      ┌─────────┐
+    │         │        │              │         │          │      │         │
+    │   x₁    │────────                │         │          │      │         │
+    │         │   w₁   │              │    z    │          │  y   │  class  │
+    └─────────┘        │    Linear    │─────────│ Sigmoid  │──────│ (0 or 1)│
+                       │   (Wx + b)   │         │   σ(z)   │      │         │
+    ┌─────────┐        │              │         │          │      │         │
+    │         │   w₂   │              │         │          │      │         │
+    │   x₂    │────────                │         │          │      └─────────┘
+    │         │        │              │         │          │
+    └─────────┘        └──────────────┘         └──────────┘
+                              ↑
+                           b (bias)
+
+    Computation Flow:
+    1. Linear: z = w₁·x₁ + w₂·x₂ + b
+    2. Sigmoid: y = 1 / (1 + e⁻ᶻ)
+    3. Decision: class = 1 if y > 0.5 else 0
+    """
+    return network.strip()
+
+def visualize_data_points(X, y, predictions=None, weights=None):
+    """Create ASCII visualization of data points with decision boundary."""
+    # Create a simple scatter plot
+    grid_size = 20
+    grid = [[' ' for _ in range(grid_size)] for _ in range(grid_size)]
+
+    # Find bounds
+    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
+    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
+
+    # Draw decision boundary if weights provided
+    # Decision boundary: w1*x1 + w2*x2 + b = 0 → x2 = -(w1*x1 + b)/w2
+    if weights is not None:
+        w1, w2, b = weights
+        if abs(w2) > 0.001:  # Avoid division by zero
+            # Determine slope for choosing line character
+            slope = -w1 / w2
+            line_char = '/' if slope > 0 else '\\'
+
+            for gx in range(grid_size):
+                # Map grid x to real x
+                px = x_min + (gx / (grid_size - 1)) * (x_max - x_min)
+                # Calculate decision boundary y
+                py = -(w1 * px + b) / w2
+                # Map to grid y
+                gy = int((py - y_min) / (y_max - y_min) * (grid_size - 1))
+                gy = grid_size - 1 - gy  # Flip y-axis
+
+                if 0 <= gy < grid_size and grid[gy][gx] == ' ':
+                    grid[gy][gx] = line_char  # Decision boundary line
+
+    # Plot points (after boundary so they overlap)
+    for i, (px, py) in enumerate(X):
+        # Map to grid
+        gx = int((px - x_min) / (x_max - x_min) * (grid_size - 1))
+        gy = int((py - y_min) / (y_max - y_min) * (grid_size - 1))
+        gy = grid_size - 1 - gy  # Flip y-axis
+
+        if 0 <= gx < grid_size and 0 <= gy < grid_size:
+            true_label = int(y[i])
+            if predictions is not None:
+                pred_label = int(predictions[i])
+                # Show correct (green) vs incorrect (red) predictions
+                if true_label == pred_label:
+                    grid[gy][gx] = '●' if true_label == 1 else '○'
+                else:
+                    grid[gy][gx] = '✗'  # Wrong prediction
+            else:
+                grid[gy][gx] = '●' if true_label == 1 else '○'
+
+    # Build the plot
+    lines = []
+    lines.append(" " + "─" * grid_size)
+    for row in grid:
+        lines.append(" │" + "".join(row) + "│")
+    lines.append(" " + "─" * grid_size)
+    lines.append(" ● = Class 1 (should cluster top-right)")
+    lines.append(" ○ = Class 0 (should cluster bottom-left)")
+    if weights is not None:
+        lines.append(" / or \\ = Decision boundary (where z = 0)")
+    if predictions is not None:
+        lines.append(" ✗ = Incorrect prediction")
+
+    return "\n".join(lines)
+
+
+def main():
+    """Demonstrate Rosenblatt's Perceptron using YOUR TinyTorch system!"""
+
+    # Header
+    console.print()
+    console.print(Panel.fit(
+        "[bold cyan]🎯 MILESTONE 1: The Perceptron (1957)[/bold cyan]\n"
+        "[yellow]⚠️ FORWARD PASS ONLY - Random Weights[/yellow]\n\n"
+        "[dim]Components: YOUR Tensor + YOUR Linear + YOUR Sigmoid[/dim]",
+        border_style="cyan"
+    ))
+    console.print()
+
+    # Introduction - What to expect
+    intro = (
+        "[bold]What You're Demonstrating:[/bold]\n\n"
+        "You've completed Modules 01-03 and built these components:\n"
+        " • [cyan]Module 01:[/cyan] Tensor (data structure)\n"
+        " • [cyan]Module 02:[/cyan] Sigmoid (activation function)\n"
+        " • [cyan]Module 03:[/cyan] Linear (layer with weights)\n\n"
+        "[bold yellow]What to Expect:[/bold yellow]\n"
+        " • The architecture [green]WORKS[/green] - forward pass succeeds ✓\n"
+        " • Accuracy is [red]POOR[/red] - random weights = random predictions ✗\n"
+        " • Decision boundary (/) is in a [yellow]RANDOM[/yellow] position\n"
+        " • Each run gives [yellow]DIFFERENT[/yellow] results (no seed!)\n\n"
+        "[bold cyan]The Key Lesson:[/bold cyan]\n"
+        " Building the model is easy. Making it [bold]LEARN[/bold] is hard.\n"
+        " That's why you need Modules 04-07: Losses, Autograd, Optimizers, Training!"
+    )
+    console.print(Panel(intro, title="[bold cyan]📖 Introduction[/bold cyan]", border_style="cyan"))
+    console.print()
+
+    # Step 1: Prepare synthetic data
+    console.print("[bold]📊 Step 1: Preparing Data[/bold]")
+    console.print(" Creating linearly separable clusters...")
+    console.print(" [dim]This is a SIMPLE problem - a trained model achieves 95%+ easily[/dim]")
+    console.print(" [yellow]⚠️ No random seed - each run will be different![/yellow]")
+
+    cluster1 = np.random.normal([2, 2], 0.5, (5, 2))  # Class 1: top-right
+    cluster2 = np.random.normal([-2, -2], 0.5, (5, 2))  # Class 0: bottom-left
+    X = np.vstack([cluster1, cluster2]).astype(np.float32)
+    y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=np.float32)  # True labels
+
+    # Show data visualization
+    console.print()
+    data_viz = visualize_data_points(X, y)
+    console.print(Panel(data_viz, title="[cyan]Training Data[/cyan]", border_style="cyan"))
+    console.print(f" [green]✓[/green] Created {X.shape[0]} points in 2 clearly separated clusters\n")
+
+    # Step 2: Create the Perceptron model with YOUR components
+    console.print("[bold]🧠 Step 2: Building Model[/bold]")
+    console.print(" [yellow]⚠️ No training yet - you haven't learned Modules 05-07![/yellow]")
+    console.print(" 🧠 Assembling perceptron with YOUR TinyTorch modules...")
+
+    model = Perceptron(input_size=2, output_size=1)
+
+    console.print(f" [green]✓[/green] Linear layer: 2 → 1 [dim](YOUR Module 03!)[/dim]")
+    console.print(f" [green]✓[/green] Activation: Sigmoid [dim](YOUR Module 02!)[/dim]")
+    console.print(" [yellow]⚠️ Model assembled - but weights are RANDOM![/yellow]\n")
+
+    # Show network architecture
+    network_diagram = draw_network_architecture()
+    console.print(Panel(network_diagram, title="[cyan]🏗️ Network Architecture (1957 Design)[/cyan]", border_style="cyan"))
+    console.print()
+
+    # Step 3: Test with random weights
+    console.print("[bold]🔬 Step 3: Testing with Random Weights[/bold]")
+    console.print(" Running forward pass...\n")
+
+    input_tensor = Tensor(X)
+    predictions = model(input_tensor)
+
+    # Convert to binary predictions
+    pred_classes = (predictions.data > 0.5).astype(int).flatten()
+    accuracy = (pred_classes == y).mean()
+
+    # Format arrays nicely for display
+    true_str = ' '.join([f"{int(val)}" for val in y])
+    pred_str = ' '.join([f"{val}" for val in pred_classes])
+    match_str = ' '.join(['[green]✓[/green]' if m else '[red]✗[/red]' for m in (pred_classes == y)])
+
+    # Create results table
+    results_table = Table(title="📊 Prediction Results", box=box.ROUNDED, border_style="cyan")
+    results_table.add_column("Metric", style="cyan", no_wrap=True)
+    results_table.add_column("Value", style="white")
+
+    results_table.add_row("True Labels", f"[{true_str}]")
+    results_table.add_row("Predictions", f"[{pred_str}]")
+    results_table.add_row("Matches", match_str)
+
+    # Determine status
+    if accuracy < 0.6:
+        accuracy_display = f"[red]{accuracy:.1%} ❌ Random Guessing![/red]"
+        status = "FAILED"
+        status_color = "red"
+    else:
+        accuracy_display = f"[yellow]{accuracy:.1%} 🎲 Got Lucky![/yellow]"
+        status = "LUCKY"
+        status_color = "yellow"
+
+    results_table.add_row("Accuracy", accuracy_display)
+    console.print(results_table)
+    console.print()
+
+    # Extract weights for visualization and display
+    w1 = model.linear.weight.data[0,0]
+    w2 = model.linear.weight.data[1,0]
+    b = model.linear.bias.data[0]
+
+    # Calculate z values (linear output before sigmoid)
+    z_values = X @ np.array([[w1], [w2]]) + b
+
+    # Show visualization with predictions AND decision boundary
+    pred_viz = visualize_data_points(X, y, pred_classes, weights=(w1, w2, b))
+    console.print(Panel(pred_viz, title="[cyan]Predictions with Decision Boundary[/cyan]", border_style=status_color))
+    console.print()
+
+    # Show weights AND equation
+    decision_eq = f"z = {w1:.4f}·x₁ + {w2:.4f}·x₂ + {b:.4f}"
+    boundary_eq = f"Decision boundary (z=0): x₂ = {-w1/w2:.4f}·x₁ + {-b/w2:.4f}" if abs(w2) > 0.001 else "Decision boundary: vertical line"
+
+    weights_content = (
+        f"[bold]Random Weights:[/bold]\n"
+        f" w₁ = [yellow]{w1:7.4f}[/yellow]\n"
+        f" w₂ = [yellow]{w2:7.4f}[/yellow]\n"
+        f" b = [yellow]{b:7.4f}[/yellow]\n\n"
+        f"[bold]Linear Function:[/bold]\n"
+        f" {decision_eq}\n\n"
+        f"[bold]Decision Line:[/bold]\n"
+        f" {boundary_eq}\n"
+        f" [dim](Everything above line → Class 1, below → Class 0)[/dim]"
+    )
+    console.print(Panel(weights_content, title="[yellow]🔧 Model Parameters[/yellow]", border_style="yellow"))
+    console.print()
+
+    # Diagnosis
+    if status == "FAILED":
+        diagnosis = (
+            "[red]❌ The model is essentially guessing randomly[/red]\n"
+            "[red]❌ Random initialization = random decision boundary[/red]\n\n"
+            "[bold cyan]💡 KEY INSIGHT:[/bold cyan] Building the architecture is easy.\n"
+            " Making it [bold]LEARN[/bold] is the hard part!"
+        )
+    else:
+        diagnosis = (
+            "[yellow]🎲 You got lucky with this random initialization![/yellow]\n"
+            "[yellow]🎲 But this is NOT learning - just chance[/yellow]\n\n"
+            "[bold cyan]💡 KEY INSIGHT:[/bold cyan] Even when it works, random weights\n"
+            " won't generalize. You need [bold]TRAINING[/bold]!"
+        )
+
+    console.print(Panel(diagnosis, title=f"[{status_color}]🔍 Diagnosis: {status}[/{status_color}]", border_style=status_color))
+
+    # Tip for multiple runs
+    tip = (
+        "💡 [bold yellow]Run this script multiple times![/bold yellow]\n\n"
+        "Each run uses different random weights and data.\n"
+        "You'll see varying results:\n"
+        " • Sometimes: High accuracy (got lucky!) 🎲\n"
+        " • Usually: Low accuracy (random guessing) ❌\n\n"
+        "[dim]This demonstrates why training is essential - it must work EVERY time![/dim]"
+    )
+    console.print(Panel(tip, title="[bold yellow]💡 Experiment[/bold yellow]", border_style="yellow"))
+    console.print()
+
+    # Next steps
+    next_steps = (
+        "[bold]Complete Modules 05-07 to unlock TRAINING:[/bold]\n\n"
+        " [cyan]•[/cyan] Module 05 (Autograd): Calculate gradients automatically\n"
+        " [cyan]•[/cyan] Module 06 (Optimizers): Update weights intelligently\n"
+        " [cyan]•[/cyan] Module 07 (Training): Put it all together\n\n"
+        "[dim]Then return to this SAME perceptron and watch it achieve 95%+!\n"
+        "You'll see random → intelligent through the power of learning![/dim]"
+    )
+    console.print(Panel(next_steps, title="[bold green]🚀 Next Steps[/bold green]", border_style="green"))
+    console.print()
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/milestones/01_perceptron_1957/perceptron_trained.py b/milestones/01_perceptron_1957/perceptron_trained.py
new file mode 100644
index 00000000..e69de29b
diff --git a/milestones/xor_1969/README.md b/milestones/02_xor_crisis_1969/README.md
similarity index 100%
rename from milestones/xor_1969/README.md
rename to milestones/02_xor_crisis_1969/README.md
diff --git a/milestones/xor_1969/minsky_xor_problem.py b/milestones/02_xor_crisis_1969/perceptron_xor_fails.py
similarity index 100%
rename from milestones/xor_1969/minsky_xor_problem.py
rename to milestones/02_xor_crisis_1969/perceptron_xor_fails.py
diff --git a/milestones/mnist_mlp_1986/README.md b/milestones/03_mlp_revival_1986/README.md
similarity index 100%
rename from milestones/mnist_mlp_1986/README.md
rename to milestones/03_mlp_revival_1986/README.md
diff --git a/milestones/mnist_mlp_1986/UPDATE_SUMMARY.md b/milestones/03_mlp_revival_1986/UPDATE_SUMMARY.md
similarity index 100%
rename from milestones/mnist_mlp_1986/UPDATE_SUMMARY.md
rename to milestones/03_mlp_revival_1986/UPDATE_SUMMARY.md
diff --git a/milestones/mnist_mlp_1986/train_mlp.py b/milestones/03_mlp_revival_1986/mlp_mnist.py
similarity index 100%
rename from milestones/mnist_mlp_1986/train_mlp.py
rename to milestones/03_mlp_revival_1986/mlp_mnist.py
diff --git a/milestones/03_mlp_revival_1986/mlp_xor.py b/milestones/03_mlp_revival_1986/mlp_xor.py
new file mode 100644
index 00000000..e69de29b
diff --git a/milestones/cifar_cnn_modern/README.md b/milestones/04_cnn_revolution_1998/README.md
similarity index 100%
rename from milestones/cifar_cnn_modern/README.md
rename to milestones/04_cnn_revolution_1998/README.md
diff --git a/milestones/cifar_cnn_modern/train_cnn.py b/milestones/04_cnn_revolution_1998/lecun_cifar10.py
similarity index 100%
rename from milestones/cifar_cnn_modern/train_cnn.py
rename to milestones/04_cnn_revolution_1998/lecun_cifar10.py
diff --git a/milestones/gpt_2018/README.md b/milestones/05_transformer_era_2017/README.md
similarity index 100%
rename from milestones/gpt_2018/README.md
rename to milestones/05_transformer_era_2017/README.md
diff --git a/milestones/gpt_2018/train_gpt.py b/milestones/05_transformer_era_2017/vaswani_shakespeare.py
similarity index 100%
rename from milestones/gpt_2018/train_gpt.py
rename to milestones/05_transformer_era_2017/vaswani_shakespeare.py
diff --git a/milestones/06_systems_age_2024/optimize_models.py b/milestones/06_systems_age_2024/optimize_models.py
new file mode 100644
index 00000000..e69de29b
diff --git a/milestones/perceptron_1957/rosenblatt_perceptron.py b/milestones/perceptron_1957/rosenblatt_perceptron.py
deleted file mode 100644
index acfa711c..00000000
--- a/milestones/perceptron_1957/rosenblatt_perceptron.py
+++ /dev/null
@@ -1,156 +0,0 @@
-#!/usr/bin/env python3
-"""
-The Perceptron (1957) - Frank Rosenblatt
-=======================================
-
-📚 HISTORICAL CONTEXT:
-Frank Rosenblatt's Perceptron was the first trainable artificial neural network that
-could learn from examples. It sparked the first AI boom and demonstrated that machines
-could actually learn to recognize patterns, launching the neural network revolution.
-
-🎯 WHAT YOU'RE BUILDING:
-Using YOUR TinyTorch implementations, you'll recreate the exact same perceptron that
-started it all - proving that YOU can build the foundation of modern AI from scratch.
-
-✅ REQUIRED MODULES (Run after Module 4):
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-  Module 01 (Tensor)        : YOUR data structure with gradient tracking
-  Module 02 (Activations)   : YOUR sigmoid activation for smooth gradients
-  Module 03 (Layers)        : YOUR Linear layer for weight transformations
-  Data Generation           : Directly generated within this script
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-
-🏗️ ARCHITECTURE (Original 1957 Design):
-    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
-    │    Input    │    │   Linear    │    │   Sigmoid   │    │   Binary    │
-    │  Features   │───▶│ YOUR Module │───▶│ YOUR Module │───▶│   Output    │
-    │  (x1, x2)   │    │     03      │    │     02      │    │  (0 or 1)   │
-    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
-
-🔍 HOW THE PERCEPTRON LEARNS - A LINEAR DECISION BOUNDARY:
-
-    INITIAL (Random Weights):     TRAINING (Gradient Descent):  CONVERGED (Learned):
-
-    4 │ • • • • •                 4 │ • • • • •                 4 │ • • • • •
-      │ • • • • •  Class 1          │ • • • • • ╱                 │ • • • • • ╱
-    2 │ - - - - - ← Wrong!        2 │ • • • • ╱ • ← Adjusting   2 │ • • • • ╱ • ← Perfect!
-      │ ○ ○ ○ ○ ○                   │ ○ ○ ○ ╱ ○ ○                 │ ○ ○ ○ ╱ ○ ○
-    0 │ ○ ○ ○ ○ ○  Class 0        0 │ ○ ○ ╱ ○ ○ ○               0 │ ○ ○ ╱ ○ ○ ○
-      └────────────                 └────────────                 └────────────
-        0   2   4                     0   2   4                     0   2   4
-
-    Mathematical Operation:               Weight Updates:
-    y = sigmoid(w₁·x₁ + w₂·x₂ + b)        w = w - η·∇L (η = learning rate)
-
-    Where YOUR modules compute:
-    - Linear: z = w₁·x₁ + w₂·x₂ + b    (weighted sum)
-    - Sigmoid: y = 1/(1 + e⁻ᶻ)    (squash to [0,1])
-    - Decision: class = 1 if y > 0.5 else 0
-
-🔍 KEY INSIGHTS:
-- Single-layer architecture: Just linear transformation + activation
-- Linearly separable only: Can't solve XOR problem (that comes later!)
-- Foundation for everything: Modern networks are just deeper perceptrons
-
-📊 EXPECTED PERFORMANCE:
-- Dataset: 1,000 linearly separable synthetic points
-- Training time: 30 seconds
-- Expected accuracy: 95%+ (problem is linearly separable)
-"""
-
-import sys
-import os
-import numpy as np
-import argparse
-
-# Add project root to path for correct tinytorch imports
-# This allows the script to be run from the root of the project
-sys.path.insert(0, os.getcwd())
-
-# Import TinyTorch components YOU BUILT!
-from tinytorch.core.tensor import Tensor  # Module 01: YOU built this!
-from tinytorch.core.layers import Linear  # Module 03: YOU built this!
-from tinytorch.core.activations import Sigmoid  # Module 02: YOU built this!
-
-class RosenblattPerceptron:
-    """
-    Rosenblatt's original Perceptron using YOUR TinyTorch implementations!
-
-    Historical note: The original used a step function, but we use sigmoid
-    for smooth gradients (an innovation that came slightly later).
-    """
-
-    def __init__(self, input_size=2, output_size=1):
-        print("🧠 Assembling Rosenblatt's Perceptron with YOUR TinyTorch modules...")
-
-        # Single layer - just like the original 1957 design!
-        self.linear = Linear(input_size, output_size)  # Module 03: YOUR Linear layer!
-        self.activation = Sigmoid()  # Module 02: YOUR Sigmoid function!
-
-        print(f" ✅ Linear layer: {input_size} → {output_size} (YOUR Module 03 implementation!)")
-        print(f" ✅ Activation: Sigmoid (YOUR Module 02 implementation!)")
-
-    def forward(self, x):
-        """Forward pass through YOUR perceptron implementation."""
-        # Step 1: Linear transformation using YOUR weights
-        x = self.linear(x)  # Module 03: YOUR Linear.forward() method!
-
-        # Step 2: Activation using YOUR sigmoid
-        x = self.activation(x)  # Module 02: YOUR Sigmoid.forward() method!
-
-        return x
-
-def main():
-    """Demonstrate Rosenblatt's Perceptron using YOUR TinyTorch system!"""
-
-    print("🎯 MILESTONE: The Perceptron (1957)")
-    print(" Historical significance: The first trainable neural network.")
-    print(" YOUR achievement: Assembling it from YOUR own modules.")
-    print(" Components used: YOUR Tensor + YOUR Linear + YOUR Sigmoid.")
-    print("-" * 60)
-
-    # Step 1: Prepare synthetic data
-    print("\n📊 Step 1: Preparing linearly separable data...")
-    np.random.seed(42)
-    cluster1 = np.random.normal([2, 2], 0.5, (5, 2))  # Just a few samples are needed
-    cluster2 = np.random.normal([-2, -2], 0.5, (5, 2))
-    X = np.vstack([cluster1, cluster2]).astype(np.float32)
-    print(f" ✅ Data created successfully with shape: {X.shape}")
-
-    # Step 2: Create the Perceptron model with YOUR components
-    print("\n🧠 Step 2: Instantiating the Perceptron model...")
-    model = RosenblattPerceptron(input_size=2, output_size=1)
-    print(" ✅ Model assembled successfully!")
-
-    # Step 3: Perform a forward pass
-    print("\n🔬 Step 3: Running a forward pass to test integration...")
-    # Convert data to YOUR Tensor format
-    input_tensor = Tensor(X)  # Module 01: YOUR Tensor class!
-    print(f" - Input tensor created with shape: {input_tensor.shape}")
-
-    # Run the forward pass through YOUR implementations
-    output_tensor = model.forward(input_tensor)
-    print(f" - Output tensor received with shape: {output_tensor.shape}")
-
-    # --- Verification ---
-    print("\n" + "="*60)
-    print("✅ SUCCESS! Your components integrated perfectly.")
-    print(" You have successfully assembled the architecture of the first")
-    print(" trainable neural network using the modules YOU built.")
-    print("="*60)
-
-    print("\n🎓 What YOU Accomplished:")
-    print(" • YOU assembled a neural network from scratch.")
-    print(" • YOUR Tensor class handled the data flow.")
-    print(" • YOUR Linear layer performed the mathematical transformation.")
-    print(" • YOUR Sigmoid activation processed the layer's output.")
-
-    print("\n🚀 Next Steps:")
-    print(" • In future modules, you will build the components needed to TRAIN this model:")
-    print(" - Module 04 (Losses): To measure how wrong the model's predictions are.")
-    print(" - Module 05 (Autograd): To calculate the gradients needed to improve.")
-    print(" - Module 06 (Optimizers): To update the model's weights automatically.")
-    print("\n For now, congratulations on this major milestone!")
-
-if __name__ == "__main__":
-    main()
\ No newline at end of file
diff --git a/modules/source/DEFINITIVE_MODULE_PLAN.md b/modules/source/DEFINITIVE_MODULE_PLAN.md
index a8c8c9e7..755f1813 100644
--- a/modules/source/DEFINITIVE_MODULE_PLAN.md
+++ b/modules/source/DEFINITIVE_MODULE_PLAN.md
@@ -125,11 +125,23 @@ def log_softmax(x: Tensor, dim=-1) -> Tensor # Numerical stability
 
 ---
 
-## 🪜 **Milestone 1: Perceptron (After Module 04)**
-**Location:** `milestones/01_perceptron/`
-**Deliverable:** Train Linear + Sigmoid on 2D dataset, visualize decision boundary
-**Success Criteria:** 95% accuracy on linearly separable data
-**Unlock:** Complete modules 01-04 + integration test
+## 🪜 **Milestone 1: Perceptron 1957 (After Modules 04 & 07)**
+**Location:** `milestones/01_perceptron_1957/`
+
+**Part 1: Forward Pass (After Module 04)**
+- File: `forward_pass_interactive.py`
+- Build perceptron with random weights
+- Interactive CLI to manually tweak weights (frustration!)
+- Success: ~40-60% accuracy (essentially random)
+- Lesson: "I need automatic training!"
+
+**Part 2: Trained (After Module 07)**
+- File: `perceptron_trained.py`
+- Same architecture, NOW with backprop training
+- Success: 95%+ accuracy on linearly separable data
+- Lesson: "Training transforms random → intelligent!"
+
+**Unlock:** Complete modules 01-04 for Part 1, modules 05-07 for Part 2
 
 ---
 
@@ -214,10 +226,29 @@ def clip_grad_norm(parameters, max_norm)
 
 ---
 
-## 🪜 **Milestone 2: MLP (After Module 07)**
-**Location:** `milestones/02_mlp/`
-**Deliverable:** 2-layer MLP on MNIST, compare to perceptron
-**Success Criteria:** >95% accuracy on MNIST
+## 🪜 **Milestone 2: XOR Crisis 1969 (After Module 07)**
+**Location:** `milestones/02_xor_crisis_1969/`
+**File:** `perceptron_xor_fails.py`
+
+**Deliverable:**
+- Try training perceptron on XOR problem (4 points!)
+- Train for 1000+ epochs... stuck at ~50%
+- Visualize why: XOR is NOT linearly separable
+- Show decision boundary can't separate the points
+
+**What Students Learn:**
+- Training works (we proved it in M1)
+- But architecture has fundamental limitations
+- Single layer = can only learn linear decision boundaries
+- Historical context: Minsky's 1969 proof killed AI research for a decade
+
+**Success Criteria:**
+- Perceptron trains but never exceeds 60% on XOR
+- Visualization clearly shows the limitation
+- Student understands WHY it fails (not linearly separable)
+
+**Emotional Beat:** "Wait... training doesn't solve everything?"
+
 **Unlock:** Complete modules 05-07 + integration test
 
 ---
@@ -276,11 +307,30 @@ class BatchNorm2d:
 
 ---
 
-## 🪜 **Milestone 3: CNN (After Module 09)**
-**Location:** `milestones/03_cnn/`
-**Deliverable:** 3-layer CNN on CIFAR-10, visualize filters
-**Success Criteria:** >75% accuracy on CIFAR-10
-**Unlock:** Complete modules 08-09 + integration test
+## 🪜 **Milestone 3: MLP Revival 1986 (After Module 07)**
+**Location:** `milestones/03_mlp_revival_1986/`
+**Files:** `mlp_xor.py`, `mlp_mnist.py`
+
+**Deliverable:**
+- Add ONE hidden layer to solve XOR → 100% accuracy!
+- Train MLP on MNIST → 95%+ accuracy
+- Compare to perceptron failure: depth changes everything
+- Visualize curved decision boundary for XOR
+
+**What Students Learn:**
+- Hidden layers enable non-linear decision boundaries
+- Backpropagation + depth = AI renaissance
+- Same training algorithm (backprop) works for any depth
+- Historical context: Rumelhart's 1986 paper revived the field
+
+**Success Criteria:**
+- MLP solves XOR: 100% accuracy
+- MLP on MNIST: >95% accuracy
+- Student understands power of depth
+
+**Emotional Beat:** "ONE hidden layer changes everything!"
+
+**Unlock:** Complete modules 05-07 + integration test
 
 ---
 
@@ -400,11 +450,30 @@ def attention_with_cache(Q, K, V, cache, layer_idx, seq_pos) -> Tensor
 
 ---
 
-## 🪜 **Milestone 4: TinyGPT (After Module 14)**
-**Location:** `milestones/04_tinygpt/`
-**Deliverable:** Character-level GPT on Shakespeare, generate text
-**Success Criteria:** Perplexity < 2.0, coherent generation
-**Unlock:** Complete modules 10-14 + integration test
+## 🪜 **Milestone 4: CNN Revolution 1998 (After Module 09)**
+**Location:** `milestones/04_cnn_revolution_1998/`
+**File:** `lecun_cifar10.py`
+
+**Deliverable:**
+- Build LeNet-style CNN for CIFAR-10
+- Convolutional layers exploit spatial structure
+- Visualize learned filters (edge detectors, etc.)
+- Compare to MLP: fewer parameters, better accuracy
+
+**What Students Learn:**
+- Spatial inductive bias matters for vision
+- Convolutions share weights across space
+- Pooling provides translation invariance
+- Historical context: LeCun's CNN revolutionized computer vision
+
+**Success Criteria:**
+- CNN on CIFAR-10: >75% accuracy
+- Visualizations show meaningful filters
+- Student understands spatial structure
+
+**Emotional Beat:** "It SEES patterns in images!"
+
+**Unlock:** Complete modules 08-09 + integration test
 
 ---
 
@@ -519,13 +588,56 @@ def plot_pareto_frontier(results: pd.DataFrame)
 
 ---
 
-## 🪜 **Milestone 5: Systems Capstone (After Module 19)**
-**Location:** `milestones/05_systems_capstone/`
-**Deliverable:** Profile and optimize CNN vs TinyGPT
-- Apply quantization and pruning
-- Generate comparison report
-- Show accuracy vs speed trade-offs
-**Success Criteria:** 2× speedup with <5% accuracy loss
+## 🪜 **Milestone 5: Transformer Era 2017 (After Module 14)**
+**Location:** `milestones/05_transformer_era_2017/`
+**File:** `vaswani_shakespeare.py`
+
+**Deliverable:**
+- Build character-level GPT on Shakespeare corpus
+- Self-attention captures long-range dependencies
+- Generate coherent text samples
+- Compare to RNN: attention > recurrence
+
+**What Students Learn:**
+- Attention mechanism enables parallelization
+- Positional encoding for sequence order
+- Autoregressive generation with KV caching
+- Historical context: "Attention is all you need" changed NLP forever
+
+**Success Criteria:**
+- Perplexity < 2.0 on Shakespeare
+- Generated text is coherent (subjective)
+- Student understands attention mechanism
+
+**Emotional Beat:** "It writes like Shakespeare!"
+
+**Unlock:** Complete modules 10-14 + integration test
+
+---
+
+## 🪜 **Milestone 6: Systems Age 2024 (After Module 19)**
+**Location:** `milestones/06_systems_age_2024/`
+**File:** `optimize_models.py`
+
+**Deliverable:**
+- Profile CNN (M4) and GPT (M5) for bottlenecks
+- Apply quantization (INT8) and pruning (50% sparsity)
+- Benchmark before/after optimization
+- Generate performance comparison report
+
+**What Students Learn:**
+- Profiling reveals true bottlenecks
+- Quantization: 4× memory reduction, minimal accuracy loss
+- Pruning: Structured vs unstructured sparsity
+- Modern ML is systems engineering
+
+**Success Criteria:**
+- 2× speedup with <5% accuracy loss
+- Comprehensive benchmark report
+- Student understands systems trade-offs
+
+**Emotional Beat:** "I made production AI!"
+
 **Unlock:** Complete modules 15-19 + integration test
 
 ---
@@ -569,11 +681,11 @@ def plot_pareto_frontier(results: pd.DataFrame)
 
 ## 🚀 Implementation Order
 
-1. **Phase 1:** Modules 01-04 → Milestone 1 (Perceptron)
-2. **Phase 2:** Modules 05-07 → Milestone 2 (MLP)
-3. **Phase 3:** Modules 08-09 → Milestone 3 (CNN)
-4. **Phase 4:** Modules 10-14 → Milestone 4 (TinyGPT)
-5. **Phase 5:** Modules 15-19 → Milestone 5 (Systems)
+1. **Phase 1:** Modules 01-04 → Milestone 1 Part 1 (Perceptron forward pass)
+2. **Phase 2:** Modules 05-07 → Milestones 1 Part 2, 2, 3 (Training, Crisis, Revival)
+3. **Phase 3:** Modules 08-09 → Milestone 4 (CNN)
+4. **Phase 4:** Modules 10-14 → Milestone 5 (Transformers)
+5. **Phase 5:** Modules 15-19 → Milestone 6 (Systems)
 
 ---