🏆 TinyTorch Milestone Examples
Proof-of-mastery demonstrations showcasing what students can build after completing modules.
These examples demonstrate the evolutionary progression of neural networks from 1957 to 2018, showing how each innovation built upon previous foundations. Students experience the same journey that created modern AI.
🎯 Milestone Philosophy
Why These Specific Examples?
- Historical Progression: Experience the actual evolution of neural networks
- Capability Showcasing: Demonstrate specific breakthroughs at each stage
- Systems Thinking: Understand WHY each innovation mattered for ML systems
- Motivation: See real-world impact of concepts you're learning
- Integration: Prove mastery by combining multiple modules into working systems
What Makes This Educational?
- Not Just Algorithms: Focus on systems engineering and architectural insights
- Progressive Complexity: Each milestone builds capabilities from previous ones
- Real Implementations: Use actual TinyTorch modules students built
- Historical Context: Understand the engineering decisions that shaped modern ML
- Production Relevance: Connect to how these patterns appear in PyTorch/TensorFlow
📅 Historical Timeline & Module Mapping
🧠 Perceptron 1957 - perceptron_1957/
After Modules 2-4 • Foundation Building
Input → Linear → Sigmoid → Binary Output
Historical Significance: Frank Rosenblatt's perceptron launched the first AI wave.
What It Showcases:
- First trainable neural network
- Linear classification boundaries
- Gradient-based learning foundation
- Why single layers have limitations
Systems Insights:
- Memory: O(n) parameters, minimal storage
- Compute: O(n) operations per forward pass
- Limitations: Only linearly separable problems
Run After: Module 04 (Layers) ✅
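To make the Input → Linear → Sigmoid pipeline above concrete, here is a minimal NumPy sketch of the same idea (illustrative only; the milestone script uses the TinyTorch layers you built, and the helper names here are purely local):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linearly separable data: 2 features, binary labels ("is feature 2 large?")
X = np.array([[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]])
y = np.array([0, 1, 0, 1])

rng = np.random.default_rng(0)
w = rng.normal(size=2)               # O(n) parameters: one weight per input feature
b = 0.0
lr = 1.0

for _ in range(500):
    p = sigmoid(X @ w + b)           # Input -> Linear -> Sigmoid
    grad = p - y                     # gradient of binary cross-entropy w.r.t. the logits
    w -= lr * (X.T @ grad) / len(y)  # gradient-based learning on a single layer
    b -= lr * grad.mean()

print((sigmoid(X @ w + b) > 0.5).astype(int))  # should print [0 1 0 1] once converged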
⚡ XOR Problem 1969 - xor_1969/
After Modules 2-6 • Breaking Limitations
Input → Linear → ReLU → Linear → Output
Historical Significance: Minsky & Papert showed perceptron limitations; multi-layer networks solved them.
What It Showcases:
- Non-linear problem solving
- Hidden layer representations
- Why depth enables complexity
- Foundation for deep learning
Systems Insights:
- Memory: O(n²) parameters with hidden layers
- Compute: O(n²) operations, but enables non-linear solutions
- Architecture: Hidden representations crucial for complex patterns
Run After: Module 06 (Autograd) ✅
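To see why the hidden layer is the breakthrough, here is a tiny NumPy sketch with hand-picked weights that solve XOR exactly (the milestone learns such weights with your autograd implementation; this construction is purely illustrative):

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked weights for Input -> Linear -> ReLU -> Linear
W1 = np.array([[1.0, 1.0],      # both hidden units compute x1 + x2 ...
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])      # ... but with different thresholds
W2 = np.array([1.0, -2.0])      # output = h1 - 2*h2
b2 = 0.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
hidden = relu(X @ W1 + b1)      # the hidden representation a single layer cannot form
print(hidden @ W2 + b2)         # -> [0. 1. 1. 0.]: XOR, unreachable by any single linear layer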
🔢 MNIST MLP 1986 - mnist_mlp_1986/
After Modules 2-8 • Real Vision Problems
Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes
Historical Significance: Backpropagation enabled training deep networks on real datasets.
What It Showcases:
- Multi-class classification
- Real vision datasets
- Multi-layer feature learning
- Complete training pipelines
Systems Insights:
- Memory: ~100K parameters for MNIST (manageable)
- Compute: Dense matrix operations, vectorization critical
- Scaling: 95%+ accuracy demonstrates effectiveness
Run After: Module 08 (Training) ✅
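The "~100K parameters" figure is easy to check by hand. Assuming a hypothetical 784 → 128 → 64 → 10 MLP (the exact layer sizes in the example may differ):

# Dense layer parameters = inputs * outputs + outputs (weights + biases)
layers = [(784, 128), (128, 64), (64, 10)]            # hypothetical layer sizes
params = sum(n_in * n_out + n_out for n_in, n_out in layers)
print(params)                                          # 109386 -> on the order of 100K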
🖼️ CIFAR CNN Modern - cifar_cnn_modern/
After Modules 2-10 • Spatial Understanding
Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes
Historical Significance: CNNs revolutionized computer vision by exploiting spatial structure.
What It Showcases:
- Spatial feature extraction
- Hierarchical pattern recognition
- Translation invariance
- Natural image classification
Systems Insights:
- Memory: ~1M parameters, but shared weights reduce memory vs dense layers
- Compute: Convolution is compute-intensive but highly parallelizable
- Architecture: Local connectivity + weight sharing = spatial intelligence
Run After: Module 10 (DataLoader) + Module 09 (Spatial) ✅
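The weight-sharing point is worth quantifying. A rough, illustrative comparison for a 32×32×3 CIFAR input (the layer sizes here are assumptions, not the example's exact configuration):

# A 3x3 conv layer with 32 output channels over a 32x32x3 image:
conv_params = 3 * 3 * 3 * 32 + 32                  # kernel weights + biases
# A dense layer producing the same 32x32x32 output volume from the flattened input:
dense_params = (32 * 32 * 3) * (32 * 32 * 32) + (32 * 32 * 32)
print(conv_params)                                  # 896
print(dense_params)                                 # ~100.7 million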
🤖 TinyGPT 2018 - gpt_2018/
After Modules 2-14 • Language Understanding
Tokens → Embeddings → Attention → FFN → ... → Attention → Output
Historical Significance: Transformers + attention revolutionized NLP and launched the LLM era.
What It Showcases:
- Sequence modeling
- Attention mechanisms
- Autoregressive generation
- Foundation for ChatGPT/GPT-4
Systems Insights:
- Memory: O(n²) attention requires careful memory management
- Compute: Attention is compute-intensive but highly parallelizable
- Architecture: Self-attention enables long-range dependencies
Run After: Module 14 (Transformers) ✅
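The O(n²) memory term comes directly from the attention score matrix. A minimal single-head scaled dot-product attention sketch in NumPy (illustrative; the milestone uses your own TinyTorch attention module):

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

n, d = 128, 64                                   # sequence length, model dimension
rng = np.random.default_rng(0)
Q, K, V = [rng.normal(size=(n, d)) for _ in range(3)]

scores = Q @ K.T / np.sqrt(d)                    # (n, n): the O(n^2) memory term
weights = softmax(scores, axis=-1)               # every token attends to every other token
out = weights @ V                                # (n, d): long-range mixing in one step
print(scores.shape, out.shape)                   # (128, 128) (128, 64)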
🎯 Learning Progression Design
Capability Building Sequence
| Stage | Capability Unlocked | Architectural Innovation | Real-World Impact |
|---|---|---|---|
| Stage 1 | Binary classification | Single-layer networks | Basic pattern recognition |
| Stage 2 | Non-linear problems | Hidden layers + activation | Complex decision boundaries |
| Stage 3 | Multi-class vision | Deep feedforward networks | Handwritten digit recognition |
| Stage 4 | Spatial understanding | Convolutional networks | Natural image classification |
| Stage 5 | Sequence modeling | Attention mechanisms | Language understanding |
Systems Engineering Progression
- Memory Management: From O(n) perceptron parameters to O(n²) attention memory, plus the optimizations that keep the latter tractable
- Computational Complexity: Understanding trade-offs between accuracy and efficiency
- Architectural Patterns: How structure enables capability
- Production Deployment: What it takes to scale these in practice
🔧 Systems Analysis in Each Example
Each milestone includes:
Memory Profiling
import tracemalloc
tracemalloc.start()
# ... run model ...
current, peak = tracemalloc.get_traced_memory()
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
Performance Measurement
# Parameter counting
total_params = sum(p.data.size for p in model.parameters())
print(f"Parameters: {total_params:,}")
# FLOP estimation
flops = estimate_flops(model, input_shape)
print(f"FLOPs per forward pass: {flops:,}")
Scaling Analysis
# Show how performance scales with model size
for hidden_size in [64, 128, 256, 512]:
    model = create_model(hidden_size)
    time_per_epoch = benchmark_training(model)
    print(f"Hidden={hidden_size}: {time_per_epoch:.2f}s/epoch")
📂 File Structure
examples/
├── README.md # This file - milestone overview
├── perceptron_1957/
│ └── rosenblatt_perceptron.py # First trainable neural network
├── xor_1969/
│ └── minsky_xor_problem.py # Non-linear problem solving
├── mnist_mlp_1986/
│ └── train_mlp.py # Real vision with multi-layer networks
├── cifar_cnn_modern/
│ ├── train_cnn.py # Spatial feature extraction with CNNs
│ └── data/ # CIFAR-10 dataset
├── gpt_2018/
│ └── train_gpt.py # Language modeling with transformers
└── pretrained/
├── mnist_mlp_weights.npz # Pre-trained weights for quick demos
├── cifar10_cnn_weights.npz
└── xor_weights.npz
🚀 How to Run These Examples
Prerequisites Check
# 1. Verify your TinyTorch installation
tito system doctor
# 2. Check which modules you've completed
tito checkpoint status
# 3. Ensure you're in the project root
cd /path/to/TinyTorch
Dataset Management (Automatic)
Don't worry about data logistics! Each example automatically handles dataset downloading:
- MNIST: Downloads from official LeCun server (~60MB)
- CIFAR-10: Downloads from University of Toronto (~170MB)
- XOR/Perceptron: Generates synthetic data instantly
First run will download data, subsequent runs use cached data.
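"Cached" simply means the downloader checks for a local copy before touching the network. A simplified sketch of that logic (the real implementation lives in examples/data_manager.py; the paths and filenames here are assumptions):

from pathlib import Path
from urllib.request import urlretrieve

def fetch(url: str, cache_dir: str = "examples/data") -> Path:
    """Download url into cache_dir unless a cached copy already exists."""
    target = Path(cache_dir) / Path(url).name
    if not target.exists():                  # first run: download
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, target)
    return target                            # later runs: reuse the cached file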
Running Examples by Module Completion
📱 Quick Test (No Training)
Test architecture and imports without waiting for downloads:
# Test what you've built so far
python examples/perceptron_1957/rosenblatt_perceptron.py --test-only
python examples/xor_1969/minsky_xor_problem.py --test-only
🎯 Full Milestone Demonstrations
# After Module 04 - Foundation (30 seconds)
python examples/perceptron_1957/rosenblatt_perceptron.py
# Demonstrates: YOU built Linear layers + activation functions
# After Module 06 - Autograd (1 minute)
python examples/xor_1969/minsky_xor_problem.py
# Demonstrates: YOU built gradient computation + training loops
# After Module 08 - Training (2-3 minutes + MNIST download)
python examples/mnist_mlp_1986/train_mlp.py
# Demonstrates: YOU built complete vision pipeline
# After Module 10 - DataLoader + Spatial (3-5 minutes + CIFAR download)
python examples/cifar_cnn_modern/train_cnn.py
# Demonstrates: YOU built convolutional networks
# After Module 14 - Transformers (5-10 minutes)
python examples/gpt_2018/train_gpt.py
# Demonstrates: YOU built attention mechanisms + language models
🚫 Troubleshooting Common Issues
Import Errors
# If you see "ModuleNotFoundError: No module named 'tinytorch'"
cd /path/to/TinyTorch
python -m pip install -e .
# Or run with explicit path
PYTHONPATH=/path/to/TinyTorch python examples/perceptron_1957/rosenblatt_perceptron.py
Dataset Download Issues
# Manual dataset download if automatic fails
python examples/data_manager.py # Test all datasets
# Or download specific datasets
python -c "from examples.data_manager import DatasetManager; DatasetManager().get_mnist()"
Memory Issues
# Reduce batch size for limited memory
python examples/cifar_cnn_modern/train_cnn.py --batch-size 16
# Use test mode for architecture validation only
python examples/mnist_mlp_1986/train_mlp.py --test-only
Slow Training
# Quick demo mode (reduced epochs)
python examples/mnist_mlp_1986/train_mlp.py --demo-mode
# Use pre-trained weights for instant results
python examples/mnist_mlp_1986/train_mlp.py --use-pretrained
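Under the hood, --use-pretrained amounts to loading one of the .npz archives under examples/pretrained/. A hedged sketch of what that involves (the archive's key names and the example's actual loading code may differ):

import numpy as np

weights = np.load("examples/pretrained/mnist_mlp_weights.npz")
print(list(weights.files))                   # archive keys, e.g. per-layer weight/bias arrays
# for param, key in zip(model.parameters(), weights.files):
#     param.data = weights[key]              # copy saved arrays into your model (illustrative)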
📊 Expected Performance & Timing
| Example | Dataset Size | Download Time | Training Time | Expected Accuracy |
|---|---|---|---|---|
| Perceptron 1957 | 1K synthetic | 0s | 30s | 95%+ (linearly separable) |
| XOR 1969 | 1K synthetic | 0s | 1min | 90%+ (non-linear) |
| MNIST MLP 1986 | 60K images | 2-5min | 2-3min | 85%+ (real vision) |
| CIFAR CNN Modern | 50K images | 5-10min | 3-5min | 65%+ (natural images) |
| TinyGPT 2018 | Text corpus | 1-2min | 5-10min | Coherent generation |
Note: First run includes dataset download time. Subsequent runs are much faster.
🤔 ML Systems Thinking Questions
After Each Milestone, Consider:
- Memory Implications:
  - How much memory does this architecture require?
  - What happens when you scale to larger inputs/models?
- Computational Complexity:
  - Where are the computational bottlenecks?
  - How does training time scale with model size?
- Production Deployment:
  - How would you serve this model to millions of users?
  - What optimizations would you apply for real-time inference?
- Historical Context:
  - Why was this innovation important for the field?
  - How does this relate to modern architectures (ResNet, BERT, GPT)?
- Engineering Trade-offs:
  - What are the memory vs accuracy trade-offs?
  - When would you choose this architecture over alternatives?
🎓 Educational Outcomes
By completing all milestone examples, students will:
Technical Mastery
- ✅ Understand the evolution of neural network architectures
- ✅ Build complete ML systems from scratch using their own implementations
- ✅ Analyze memory and computational trade-offs in different architectures
- ✅ Connect historical innovations to modern production systems
Systems Engineering Mindset
- ✅ Think about scalability and production deployment from day one
- ✅ Understand the engineering decisions that shaped modern ML frameworks
- ✅ Develop intuition for when to use different architectural patterns
- ✅ Build confidence in ML systems engineering roles
Real-World Preparation
- ✅ Experience working with the same patterns used in PyTorch/TensorFlow
- ✅ Understand the systems thinking behind modern ML engineering
- ✅ Develop portfolio projects demonstrating deep technical understanding
- ✅ Build foundation for advanced ML systems engineering roles
Remember: These aren't just coding exercises; they're journeys through the history of AI that prepare you for the future of ML systems engineering.
🚀 Start your journey: python examples/perceptron_1957/rosenblatt_perceptron.py