Clean up milestone 02 to match milestone 01 structure

Milestone 02 Structure (matches milestone 01):
- README.md: Comprehensive guide with historical context
- xor_crisis.py: Part 1 - demonstrates single-layer failure (executable)
- xor_solved.py: Part 2 - demonstrates multi-layer success (executable)

Cleanup:
- Removed old perceptron_xor_fails.py
- Moved test files to tests/integration/
  - test_xor_simple.py
  - test_xor_thorough.py
  - test_xor_original_1986.py (verifies 2-2-1 architecture works!)
- Updated README with clear instructions
- Made scripts executable

Milestone 02 now has the same polish and structure as milestone 01:
- Clear file naming (crisis vs solved)
- Beautiful rich output
- Historical context
- Pedagogically structured
This commit is contained in:
Vijay Janapa Reddi
2025-09-30 14:14:37 -04:00
parent d231a91afc
commit 64416b14d2
7 changed files with 218 additions and 486 deletions


@@ -1,84 +1,145 @@
# ⊕ XOR Problem (1969) - Minsky & Papert
## What This Demonstrates
The "impossible" problem that killed neural networks for a decade! Shows why hidden layers are essential for non-linear problems.
## Historical Significance
In 1969, Marvin Minsky and Seymour Papert published "Perceptrons," mathematically proving that single-layer perceptrons **cannot** solve the XOR problem. This revelation killed neural network research funding for over a decade - the infamous "AI Winter."
In 1986, Rumelhart, Hinton, and Williams published the backpropagation algorithm for multi-layer networks, and XOR became trivial. This milestone recreates both the crisis and the solution using YOUR TinyTorch!
## Prerequisites
Complete these TinyTorch modules first:

**For Part 1 (xor_crisis.py):**
- Module 01 (Tensor)
- Module 02 (Activations)
- Module 03 (Layers)
- Module 04 (Losses)
- Module 05 (Autograd)
- Module 06 (Optimizers)

**For Part 2 (xor_solved.py):**
- All of the above ✓
## Quick Start
### Part 1: The Crisis (1969)
Watch a single-layer perceptron **fail** to learn XOR:
```bash
python milestones/02_xor_crisis_1969/xor_crisis.py
```
**Expected:** ~50% accuracy (random guessing) - proves Minsky was right!
### Part 2: The Solution (1986)
Watch a multi-layer network **solve** the "impossible" problem:
```bash
python milestones/02_xor_crisis_1969/xor_solved.py
```
**Expected:** 75%+ accuracy (problem solved!) - proves hidden layers work!
## The XOR Problem
### What is XOR?
XOR (Exclusive OR) outputs 1 when inputs **differ**, 0 when they're the **same**:
```
┌────┬────┬─────┐
│ x₁ │ x₂ │ XOR │
├────┼────┼─────┤
│ 0 │ 0 │ 0 │ ← same
│ 0 │ 1 │ 1 │ ← different
│ 1 │ 0 │ 1 │ ← different
│ 1 │ 1 │ 0 │ ← same
└────┴────┴─────┘
```
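In code, XOR of two bits is simply "the inputs differ" (a quick illustrative one-liner, not part of the milestone scripts):

```python
# XOR outputs 1 exactly when the two inputs differ
def xor(x1, x2):
    return int(x1 != x2)

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # → [0, 1, 1, 0]
```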
### Why It's Impossible for Single Layers
The problem is **non-linearly separable** - no single straight line can separate the points:
```
Visual Representation:
1 │ ○ (0,1)     ● (1,1)      Try drawing a line:
  │   [1]         [0]        ANY line fails!
0 │ ● (0,0)     ○ (1,0)
  │   [0]         [1]
  └─────────────────
    0           1
```
This fundamental limitation ended the first era of neural networks.
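The impossibility claim can be sanity-checked numerically: a brute-force sweep over a coarse grid of weights and biases for a single threshold unit `w1*x1 + w2*x2 + b` never reproduces the XOR outputs. This is only a sketch under the stated grid assumption (Minsky & Papert's proof covers all real weights):

```python
import itertools

# Single threshold unit: output = 1 if w1*x1 + w2*x2 + b > 0
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]

# Coarse grid: weights/biases from -2.0 to 2.0 in steps of 0.25
vals = [v / 4 for v in range(-8, 9)]
found = any(
    [int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in inputs] == xor_targets
    for w1, w2, b in itertools.product(vals, repeat=3)
)
print("Any single-layer solution on this grid?", found)  # → False
```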
## The Solution
Hidden layers create a **new feature space** where XOR becomes linearly separable!
### Original 1986 Architecture
```
Input (2) → Hidden (2) + Sigmoid → Output (1) + Sigmoid
Total: Only 9 parameters!
```
The 2 hidden units learn:
- `h₁ ≈ x₁ AND NOT x₂`
- `h₂ ≈ x₂ AND NOT x₁`
- `output ≈ h₁ OR h₂` = XOR
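This construction can be checked directly with hand-set weights for step-activation units (illustrative values, not the weights a trained network would actually learn):

```python
import numpy as np

def step(z):
    # Hard threshold activation, as in the classic perceptron
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
W_hidden = np.array([[1, -1],    # column 0: h1 ≈ x1 AND NOT x2
                     [-1, 1]])   # column 1: h2 ≈ x2 AND NOT x1
b_hidden = np.array([-0.5, -0.5])
h = step(X @ W_hidden + b_hidden)
out = step(h @ np.array([1, 1]) - 0.5)  # output ≈ h1 OR h2
print(out.tolist())  # → [0, 1, 1, 0]
```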
### Our Implementation
```
Input (2) → Hidden (4-8) + ReLU → Output (1) + Sigmoid
Modern activation, slightly larger for robustness
```
## Expected Results
### Part 1: The Crisis
- **Accuracy:** ~50% (random guessing)
- **Loss:** Stuck around 0.69 (not decreasing)
- **Weights:** Don't converge to meaningful values
- **Conclusion:** Single-layer perceptrons **cannot** solve XOR
### Part 2: The Solution
- **Accuracy:** 75-100% (problem solved!)
- **Loss:** Decreases to ~0.35 or lower
- **Weights:** Learn meaningful features
- **Conclusion:** Multi-layer networks **can** solve XOR
## What You Learn
1. **Why depth matters** - Hidden layers enable non-linear functions
2. **Historical context** - The XOR crisis that stopped AI research
3. **The breakthrough** - Backpropagation through hidden layers
4. **Your autograd works!** - Multi-layer gradients flow correctly
## Files in This Milestone
- `xor_crisis.py` - Single-layer perceptron **failing** on XOR (1969 crisis)
- `xor_solved.py` - Multi-layer network **solving** XOR (1986 breakthrough)
- `README.md` - This file
## Historical Timeline
- **1969:** Minsky & Papert prove single-layer networks can't solve XOR
- **1970-1986:** AI Winter - 17 years of minimal neural network research
- **1986:** Rumelhart, Hinton, Williams publish backpropagation for multi-layer nets
- **1986+:** AI Renaissance begins
- **TODAY:** Deep learning powers GPT, AlphaGo, autonomous vehicles, etc.
## Next Steps
After completing this milestone:
- **Milestone 03:** MLP Revival (1986) - Train deeper networks on real data
- **Module 08:** DataLoaders for batch processing
- **Module 09:** CNNs for image recognition
Every modern AI architecture builds on what you just learned - hidden layers + backpropagation!


@@ -1,424 +0,0 @@
#!/usr/bin/env python3
"""
The XOR Problem (1969) - Minsky & Papert
========================================
📚 HISTORICAL CONTEXT:
In 1969, Marvin Minsky and Seymour Papert published "Perceptrons," proving that
single-layer perceptrons CANNOT solve the XOR problem. This killed neural network
research for a decade (the "AI Winter") until multi-layer networks solved it!
🎯 WHAT YOU'RE BUILDING:
Using YOUR TinyTorch implementations, you'll solve the "impossible" XOR problem
that stumped AI for years - proving that YOUR hidden layers enable non-linear learning!
✅ REQUIRED MODULES (Run after Module 6):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Module 02 (Tensor) : YOUR data structure with autodiff
Module 03 (Activations) : YOUR ReLU for non-linearity (the key!)
Module 04 (Layers) : YOUR Linear layers for transformations
Module 06 (Autograd) : YOUR gradient computation for learning
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🏗️ ARCHITECTURE (Multi-Layer Solution):
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Input │ │ Linear │ │ ReLU │ │ Linear │ │ Binary │
│ (x1,x2) │───▶│ 2→4 │───▶│ Hidden │───▶│ 4→1 │───▶│ Output │
│ 2 dims │ │ YOUR M4 │ │ YOUR M3 │ │ YOUR M4 │ │ 0 or 1 │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
Hidden Layer Non-linearity Output Layer
🔍 WHY XOR IS SPECIAL - THE NON-LINEAR SEPARABILITY PROBLEM:
The XOR (exclusive OR) problem outputs 1 when inputs differ, 0 when they match:
Input Space: XOR Truth Table:
1 │ (0,1)→1 (1,1)→0 │ x1 │ x2 │ XOR │
│ RED BLUE ├────┼────┼─────┤
│ │ 0 │ 0 │ 0 │ (same → 0)
0 │ (0,0)→0 (1,0)→1 │ 0 │ 1 │ 1 │ (diff → 1)
│ BLUE RED │ 1 │ 0 │ 1 │ (diff → 1)
└──────────────────── │ 1 │ 1 │ 0 │ (same → 0)
0 1 └────┴────┴─────┘
🚫 IMPOSSIBLE with single line: ✅ POSSIBLE with hidden layer:
No single line can separate Hidden units learn features:
RED from BLUE points! - Unit 1: (x1 AND NOT x2)
- Unit 2: (x2 AND NOT x1)
1 │ R B Then combine: Unit1 OR Unit2
0 │ B R The hidden layer creates a new
└──────────── feature space where XOR becomes
0 1 linearly separable!
This is why neural networks need DEPTH - hidden layers create new representations!
📊 EXPECTED PERFORMANCE:
- Dataset: 1,000 XOR samples with slight noise
- Training time: 1 minute
- Expected accuracy: 95%+ (non-linear problem solved!)
- Key insight: Hidden layer enables non-linear decision boundary
"""
import sys
import os
import argparse
import numpy as np

# Add project root to path
if __name__ == "__main__":
    # When run as script
    project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    sys.path.insert(0, project_root)
else:
    # When imported, assume we're already in right location
    sys.path.insert(0, os.getcwd())

# Import TinyTorch components YOU BUILT!
from tinytorch import Tensor, Linear, ReLU, Sigmoid, BinaryCrossEntropyLoss, SGD
class XORNetwork:
    """
    Multi-layer network that solves XOR using YOUR TinyTorch implementations!
    The hidden layer is the KEY - it learns features that make XOR separable.
    """
    def __init__(self, input_size=2, hidden_size=4, output_size=1):
        print("🧠 Building XOR Network with YOUR TinyTorch modules...")
        # Hidden layer - this is what Minsky said was needed!
        self.hidden = Linear(input_size, hidden_size)   # Module 04: YOUR Linear layer!
        self.activation = ReLU()                        # Module 03: YOUR ReLU (key to non-linearity!)
        self.output = Linear(hidden_size, output_size)  # Module 04: YOUR output layer!
        self.sigmoid = Sigmoid()                        # Module 03: YOUR final activation!
        print(f"   Input → Hidden: {input_size} → {hidden_size} (YOUR Linear layer)")
        print(f"   Hidden activation: ReLU (YOUR non-linearity - this solves XOR!)")
        print(f"   Hidden → Output: {hidden_size} → {output_size} (YOUR Linear layer)")
        print(f"   Output activation: Sigmoid (YOUR Module 03)")

    def forward(self, x):
        """Forward pass through YOUR multi-layer network."""
        # Hidden layer with non-linearity (the SECRET to solving XOR!)
        x = self.hidden(x)       # Module 04: YOUR Linear transformation!
        x = self.activation(x)   # Module 03: YOUR ReLU - creates non-linear features!
        # Output layer
        x = self.output(x)       # Module 04: YOUR final transformation!
        x = self.sigmoid(x)      # Module 03: YOUR sigmoid for probability!
        return x

    def parameters(self):
        """Get all trainable parameters from YOUR layers."""
        return [
            self.hidden.weights, self.hidden.bias,   # Module 04: YOUR hidden parameters!
            self.output.weights, self.output.bias    # Module 04: YOUR output parameters!
        ]
def visualize_xor_problem():
    """Show why XOR is non-linearly separable using ASCII art."""
    print("\n" + "=" * 70)
    print("🎨 VISUALIZING THE XOR PROBLEM - Why Single Layers Fail:")
    print("=" * 70)
    print("""
    XOR DATA POINTS:                 SINGLE LAYER ATTEMPT:
    1.0 │ ○(0,1)=1   ●(1,1)=0        1.0 │ ○         ●
        │   RED        BLUE              │  ╲
        │                                │   ╲  ← No single line
    0.5 │                            0.5 │    ╲   can separate!
        │                                │     ╲
        │                                │      ╲
    0.0 │ ●(0,0)=0   ○(1,0)=1        0.0 │ ●     ╲   ○
        └─────────────────────           └─────────────────
        0.0    0.5    1.0                0.0    0.5    1.0

    Legend: ○ = Output 1 (RED)       Problem: RED and BLUE points
            ● = Output 0 (BLUE)      are diagonally mixed!
    """)
    print("🔄 THE MULTI-LAYER SOLUTION:")
    print("""
    Hidden Layer Features:           New Feature Space:
    Hidden Unit 1: x1 AND NOT x2     In hidden space, XOR becomes
    Hidden Unit 2: x2 AND NOT x1     linearly separable!

    Original → Hidden Transform:     Now a single line works:
    (0,0) → [0,0] → 0 ✓
    (0,1) → [0,1] → 1 ✓              H2 │ ○(0,1)
    (1,0) → [1,0] → 1 ✓                 │
    (1,1) → [0,0] → 0 ✓                 │          ○(1,0)
                                      0 │ ●────────────
    YOUR hidden layer learned           0          H1
    to transform the problem!
    """)
    print("=" * 70)
def train_xor_network(model, X, y, learning_rate=0.1, epochs=100):
    """
    Train XOR network using YOUR autograd system with efficient monitoring!
    This uses a simplified but effective approach with progress tracking.
    """
    print("\n🚀 Training XOR Network with YOUR TinyTorch autograd!")
    print(f"   Learning rate: {learning_rate}")
    print(f"   Max epochs: {epochs}")
    print(f"   Using validation split and progress monitoring!")

    # Split data manually for monitoring
    n_samples = len(X)
    n_val = int(n_samples * 0.2)
    indices = np.random.permutation(n_samples)
    val_indices = indices[:n_val]
    train_indices = indices[n_val:]
    X_train, X_val = X[train_indices], X[val_indices]
    y_train, y_val = y[train_indices], y[val_indices]
    print(f"   Split: {len(X_train)} training, {len(X_val)} validation samples")

    # Convert to YOUR Tensor format
    X_train_tensor = Tensor(X_train)
    y_train_tensor = Tensor(y_train.reshape(-1, 1))
    X_val_tensor = Tensor(X_val)
    y_val_tensor = Tensor(y_val.reshape(-1, 1))

    # Track metrics
    train_losses, val_losses = [], []
    train_accs, val_accs = [], []
    best_val_loss = float('inf')
    patience = 20
    epochs_no_improve = 0

    for epoch in range(epochs):
        # Training step
        predictions = model.forward(X_train_tensor)
        # Simple MSE loss that maintains computational graph
        diff = predictions - y_train_tensor
        squared_diff = diff * diff
        # Backward pass with proper graph maintenance
        n_train = squared_diff.data.shape[0]
        grad_output = Tensor(np.ones_like(squared_diff.data) / n_train)
        squared_diff.backward(grad_output)

        # Update parameters
        for param in model.parameters():
            if param.grad is not None:
                grad_data = param.grad.data if hasattr(param.grad, 'data') else param.grad
                grad_np = np.array(grad_data.data if hasattr(grad_data, 'data') else grad_data)
                param.data = param.data - learning_rate * grad_np
                param.grad = None

        # Calculate metrics
        pred_np = np.array(predictions.data.data if hasattr(predictions.data, 'data') else predictions.data)
        y_train_np = np.array(y_train_tensor.data.data if hasattr(y_train_tensor.data, 'data') else y_train_tensor.data)
        train_loss = np.mean((pred_np - y_train_np) ** 2)
        train_acc = np.mean((pred_np > 0.5) == y_train_np) * 100

        # Validation step
        val_predictions = model.forward(X_val_tensor)
        val_pred_np = np.array(val_predictions.data.data if hasattr(val_predictions.data, 'data') else val_predictions.data)
        y_val_np = np.array(y_val_tensor.data.data if hasattr(y_val_tensor.data, 'data') else y_val_tensor.data)
        val_loss = np.mean((val_pred_np - y_val_np) ** 2)
        val_acc = np.mean((val_pred_np > 0.5) == y_val_np) * 100

        # Track metrics
        train_losses.append(train_loss)
        val_losses.append(val_loss)
        train_accs.append(train_acc)
        val_accs.append(val_acc)

        # Early stopping check
        if val_loss < best_val_loss - 1e-4:
            best_val_loss = val_loss
            epochs_no_improve = 0
            status = "📈"
        else:
            epochs_no_improve += 1
            status = "⚠️" if epochs_no_improve > patience // 2 else "📊"

        # Progress updates
        if epoch % 5 == 0 or epoch == epochs - 1:
            print(f"   {status} Epoch {epoch+1:3d}: Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}, "
                  f"Train Acc: {train_acc:.1f}%, Val Acc: {val_acc:.1f}%")
            if val_loss == best_val_loss:
                print(f"      ✅ New best validation loss: {val_loss:.4f}")

        # Early stopping
        if epochs_no_improve >= patience:
            print(f"   Early stopping triggered after {patience} epochs without improvement")
            break

    # Create monitor-like object for compatibility
    class SimpleMonitor:
        def __init__(self):
            self.train_losses = train_losses
            self.val_losses = val_losses
            self.train_accuracies = train_accs
            self.val_accuracies = val_accs
            self.best_val_loss = best_val_loss
            self.should_stop = epochs_no_improve >= patience

        def get_summary(self):
            return {
                'total_epochs': len(train_losses),
                'best_val_loss': self.best_val_loss,
                'final_train_acc': train_accs[-1] if train_accs else 0,
                'best_val_acc': max(val_accs) if val_accs else 0,
                'early_stopped': self.should_stop,
                'epochs_no_improve': epochs_no_improve,
                'total_time': 0.1  # Placeholder
            }

    monitor = SimpleMonitor()
    print(f"\n🏁 Training Complete!")
    print(f"   • Total epochs: {len(train_losses)}")
    print(f"   • Best validation loss: {best_val_loss:.4f}")
    print(f"   • Best validation accuracy: {max(val_accs):.1f}%")
    print(f"   • Final training accuracy: {train_accs[-1]:.1f}%")
    return model, monitor
def test_xor_solution(model, show_examples=True):
    """Test YOUR XOR solution on the classic 4 points."""
    print("\n🧪 Testing YOUR XOR Network on Classic Examples:")
    print("   " + "─" * 45)
    # The classic XOR test cases
    test_cases = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
    expected = np.array([0, 1, 1, 0])
    # Test with YOUR network
    X_test = Tensor(test_cases)           # Module 02: YOUR Tensor!
    predictions = model.forward(X_test)   # YOUR forward pass!
    pred_np = np.array(predictions.data.data if hasattr(predictions.data, 'data') else predictions.data)
    predicted_classes = (pred_np > 0.5).astype(int).flatten()
    # Display results
    print("   │ x1 │ x2 │ Expected │ YOUR Output │ ✓/✗ │")
    print("   ├────┼────┼──────────┼─────────────┼─────┤")
    all_correct = True
    for i in range(4):
        x1, x2 = test_cases[i]
        exp = expected[i]
        pred = predicted_classes[i]
        prob = pred_np[i, 0]
        status = "✓" if pred == exp else "✗"
        if pred != exp:
            all_correct = False
        print(f"   │ {x1:.0f}  │ {x2:.0f}  │    {exp}     │  {pred} ({prob:.3f})  │  {status}  │")
    print("   " + "─" * 45)
    if all_correct:
        print("   🎉 SUCCESS! YOUR network solved XOR perfectly!")
        print("   Hidden layers enabled non-linear learning!")
    else:
        print("   🔄 Network still training... (try more epochs)")
    return all_correct
def analyze_xor_systems(model, monitor=None):
    """Analyze YOUR XOR solution from an ML systems perspective."""
    print("\n🔬 SYSTEMS ANALYSIS of YOUR XOR Network:")
    # Parameter count
    total_params = sum(p.data.size for p in model.parameters())
    print(f"   Parameters: {total_params} weights (YOUR Linear layers)")
    print(f"   Architecture: 2 → 4 → 1 (minimal for XOR)")
    print(f"   Key innovation: Hidden layer creates non-linear features")
    print(f"   Memory: {total_params * 4} bytes (float32)")
    # Training efficiency analysis
    if monitor:
        summary = monitor.get_summary()
        print(f"\n   🚀 Training Efficiency:")
        print(f"      • Epochs to convergence: {summary['total_epochs']}")
        print(f"      • Training time: {summary['total_time']:.1f}s")
        print(f"      • Validation-based early stopping: {'Yes' if summary['early_stopped'] else 'No'}")
        print(f"      • Best validation loss: {summary['best_val_loss']:.4f}")
    print("\n   🏛️ Historical Impact:")
    print("      • 1969: Minsky showed single layers CAN'T solve XOR")
    print("      • 1970s: 'AI Winter' - neural networks abandoned")
    print("      • 1980s: Backprop + hidden layers solved it (YOUR approach!)")
    print("      • Today: Deep networks with many hidden layers power AI")
    print("\n   💡 Why This Matters:")
    print("      • YOUR hidden layer transforms the feature space")
    print("      • Non-linear activation (ReLU) is ESSENTIAL")
    print("      • This principle scales to ImageNet, GPT, etc.")
    print("      • Modern AI = deeper versions of YOUR XOR network!")
def main():
    """Demonstrate the XOR solution using YOUR TinyTorch system!"""
    parser = argparse.ArgumentParser(description='XOR Problem 1969')
    parser.add_argument('--test-only', action='store_true',
                        help='Test architecture without training')
    parser.add_argument('--epochs', type=int, default=100,
                        help='Number of training epochs (with early stopping)')
    parser.add_argument('--visualize', action='store_true', default=True,
                        help='Show XOR visualization')
    args = parser.parse_args()

    print("🎯 XOR PROBLEM 1969 - Breaking the Linear Barrier!")
    print("   Historical significance: Proved need for hidden layers")
    print("   YOUR achievement: Solving 'impossible' problem with YOUR network")
    print("   Components used: YOUR Tensor + Linear + ReLU + Autograd")

    # Show why XOR is special
    if args.visualize:
        visualize_xor_problem()

    # Step 1: Get XOR data
    print("\n📊 Generating XOR dataset...")
    # NOTE: the original script called an undefined DatasetManager here;
    # generating the described dataset (1,000 noisy XOR samples) inline instead.
    rng = np.random.default_rng()
    X = rng.integers(0, 2, size=(1000, 2)).astype(np.float32)
    y = (X[:, 0] != X[:, 1]).astype(np.float32)
    X = X + rng.normal(0.0, 0.1, X.shape).astype(np.float32)
    print(f"   Generated {len(X)} XOR samples with noise")

    # Step 2: Create network with YOUR components
    model = XORNetwork(input_size=2, hidden_size=4, output_size=1)
    if args.test_only:
        print("\n🧪 ARCHITECTURE TEST MODE")
        test_input = Tensor(X[:4])               # Module 02: YOUR Tensor!
        test_output = model.forward(test_input)  # YOUR architecture!
        print(f"✅ Forward pass successful! Output shape: {test_output.data.shape}")
        print("✅ YOUR multi-layer network works!")
        return

    # Step 3: Train using YOUR autograd with modern infrastructure
    model, monitor = train_xor_network(model, X, y, epochs=args.epochs)
    # Step 4: Test on classic XOR cases
    solved = test_xor_solution(model)
    # Step 5: Systems analysis
    analyze_xor_systems(model, monitor)

    print("\n✅ SUCCESS! XOR Milestone Complete!")
    print("\n🎓 What YOU Accomplished:")
    print("   • YOU solved the 'impossible' XOR problem")
    print("   • YOUR hidden layer creates non-linear decision boundaries")
    print("   • YOUR ReLU activation enables feature learning")
    print("   • YOUR autograd trains multi-layer networks")
    print("\n🚀 Next Steps:")
    print("   • Continue to MNIST MLP after Module 08 (Training)")
    print("   • YOUR XOR solution scales to real vision problems!")
    print("   • Hidden layers principle powers all modern deep learning!")


if __name__ == "__main__":
    main()

milestones/02_xor_crisis_1969/xor_crisis.py Normal file → Executable file

milestones/02_xor_crisis_1969/xor_solved.py Normal file → Executable file


@@ -0,0 +1,95 @@
#!/usr/bin/env python3
"""
Original 1986 XOR Solution - Rumelhart, Hinton, Williams
Testing the MINIMAL architecture that solved the XOR crisis.
"""
import sys
sys.path.insert(0, '.')
import numpy as np
from tinytorch import Tensor, Linear, Sigmoid, BinaryCrossEntropyLoss, SGD
print("=" * 70)
print("🏛️ ORIGINAL 1986 XOR SOLUTION")
print("Rumelhart, Hinton, Williams - 'Learning representations by back-propagating errors'")
print("=" * 70)
# Pure XOR
X_data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], dtype=np.float32)
y_data = np.array([[0.0], [1.0], [1.0], [0.0]], dtype=np.float32)
X = Tensor(X_data)
y = Tensor(y_data)
print("\n🏗️ Architecture (1986 style):")
print(" Input: 2 neurons")
print(" Hidden: 2 neurons (MINIMAL!)")
print(" Output: 1 neuron")
print(" Activation: Sigmoid (ReLU didn't exist yet!)")
print(" Total params: 9 (2×2 weights + 2 bias + 2×1 weights + 1 bias)")
# Original architecture: 2-2-1 with Sigmoid
hidden = Linear(2, 2) # Only 2 hidden neurons!
sigmoid_hidden = Sigmoid()
output = Linear(2, 1)
sigmoid_output = Sigmoid()
loss_fn = BinaryCrossEntropyLoss()
optimizer = SGD([p for p in hidden.parameters()] + [p for p in output.parameters()], lr=1.0)
print("\n🔥 Training with original 1986 architecture...")
epochs = 2000 # May need more epochs with only 2 hidden units
for epoch in range(epochs):
    # Forward (all sigmoid, like 1986!)
    h = hidden(X)
    h_act = sigmoid_hidden(h)     # Sigmoid in hidden layer
    out = output(h_act)
    pred = sigmoid_output(out)    # Sigmoid in output layer
    loss = loss_fn(pred, y)
    # Backward
    loss.backward()
    # Update
    optimizer.step()
    optimizer.zero_grad()
    if (epoch + 1) % 400 == 0:
        accuracy = ((pred.data > 0.5).astype(float) == y.data).mean()
        print(f"Epoch {epoch+1:4d}/{epochs}  Loss: {loss.data:.4f}  Accuracy: {accuracy:.1%}")
# Final evaluation
print("\n✅ Final Results:")
final_accuracy = ((pred.data > 0.5).astype(float) == y.data).mean()
for i in range(4):
    x_in = X_data[i]
    y_true = int(y_data[i, 0])
    y_pred_prob = pred.data[i, 0]
    y_pred = int(y_pred_prob > 0.5)
    status = "✓" if y_pred == y_true else "✗"
    print(f"  Input: {x_in} → Pred: {y_pred} (prob: {y_pred_prob:.3f})  True: {y_true}  {status}")
print(f"\n📊 Final Accuracy: {final_accuracy:.1%}")
print(f"📊 Final Loss: {loss.data:.4f}")
if final_accuracy == 1.0:
    print("\n🎉 SUCCESS! XOR solved with MINIMAL 1986 architecture!")
    print("   This is exactly what ended the AI Winter!")
else:
    print(f"\n⚠️ Accuracy: {final_accuracy:.1%} - may need more training")
# Show what the hidden units learned
print("\n🧠 What the 2 hidden neurons learned:")
print(" (Examining activation patterns)")
h_activations = sigmoid_hidden(hidden(X)).data
print(f"\n Hidden unit activations for each input:")
for i, x_in in enumerate(X_data):
    print(f"   {x_in}: h1={h_activations[i,0]:.3f}, h2={h_activations[i,1]:.3f}")
print("\n" + "=" * 70)
print("💡 Historical Note:")
print(" This 2-2-1 architecture ended the 17-year AI Winter!")
print(" Proved that backprop + hidden layers solve 'impossible' problems")
print("=" * 70)