Educational improvements to milestone examples: NAMING FIXES (historically accurate): - Rename lenet_1998 → mnist_mlp_1986 (LeNet was CNN, not MLP) - Rename alexnet_2012 → cifar_cnn_modern (not actual AlexNet architecture) - Update all Dense → Linear for PyTorch consistency COMPREHENSIVE MILESTONE STRUCTURE: - Add detailed examples/README.md explaining historical progression - Map each milestone to specific module completion points: * Perceptron 1957: After Modules 2-4 (Foundation) * XOR 1969: After Modules 2-6 (Non-linear problems) * MNIST MLP 1986: After Modules 2-8 (Real vision) * CIFAR CNN Modern: After Modules 2-10 (Spatial understanding) * TinyGPT 2018: After Modules 2-14 (Language modeling) EDUCATIONAL VALUE: - Clear capability progression from basic to advanced - Systems analysis focus (memory, performance, scaling) - Production context connections to real PyTorch patterns - Historical significance explanations for each innovation All examples validated and working with current TinyTorch implementation. Students now have clear "proof of mastery" demonstrations at each stage.
🏆 TinyTorch Milestone Examples
Proof-of-mastery demonstrations showcasing what students can build after completing modules.
These examples demonstrate the evolutionary progression of neural networks from 1957 to 2018, showing how each innovation built upon previous foundations. Students experience the same journey that created modern AI.
🎯 Milestone Philosophy
Why These Specific Examples?
- Historical Progression: Experience the actual evolution of neural networks
- Capability Showcasing: Demonstrate specific breakthroughs at each stage
- Systems Thinking: Understand WHY each innovation mattered for ML systems
- Motivation: See real-world impact of concepts you're learning
- Integration: Prove mastery by combining multiple modules into working systems
What Makes This Educational?
- Not Just Algorithms: Focus on systems engineering and architectural insights
- Progressive Complexity: Each milestone builds capabilities from previous ones
- Real Implementations: Use actual TinyTorch modules students built
- Historical Context: Understand the engineering decisions that shaped modern ML
- Production Relevance: Connect to how these patterns appear in PyTorch/TensorFlow
📅 Historical Timeline & Module Mapping
🧠 Perceptron 1957 - perceptron_1957/
After Modules 2-4 • Foundation Building
Input → Linear → Sigmoid → Binary Output
Historical Significance: Frank Rosenblatt's perceptron launched the first AI wave What It Showcases:
- First trainable neural network
- Linear classification boundaries
- Gradient-based learning foundation
- Why single layers have limitations
Systems Insights:
- Memory: O(n) parameters, minimal storage
- Compute: O(n) operations per forward pass
- Limitations: Only linearly separable problems
Run After: Module 04 (Layers) ✅
⚡ XOR Problem 1969 - xor_1969/
After Modules 2-6 • Breaking Limitations
Input → Linear → ReLU → Linear → Output
Historical Significance: Minsky & Papert showed perceptron limitations; multi-layer networks solved them What It Showcases:
- Non-linear problem solving
- Hidden layer representations
- Why depth enables complexity
- Foundation for deep learning
Systems Insights:
- Memory: O(n²) parameters with hidden layers
- Compute: O(n²) operations, but enables non-linear solutions
- Architecture: Hidden representations crucial for complex patterns
Run After: Module 06 (Autograd) ✅
🔢 MNIST MLP 1986 - mnist_mlp_1986/
After Modules 2-8 • Real Vision Problems
Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes
Historical Significance: Backpropagation enabled training deep networks on real datasets What It Showcases:
- Multi-class classification
- Real vision datasets
- Multi-layer feature learning
- Complete training pipelines
Systems Insights:
- Memory: ~100K parameters for MNIST (manageable)
- Compute: Dense matrix operations, vectorization critical
- Scaling: 95%+ accuracy demonstrates effectiveness
Run After: Module 08 (Training) ✅
🖼️ CIFAR CNN Modern - cifar_cnn_modern/
After Modules 2-10 • Spatial Understanding
Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes
Historical Significance: CNNs revolutionized computer vision by exploiting spatial structure What It Showcases:
- Spatial feature extraction
- Hierarchical pattern recognition
- Translation invariance
- Natural image classification
Systems Insights:
- Memory: ~1M parameters, but shared weights reduce memory vs dense layers
- Compute: Convolution is compute-intensive but highly parallelizable
- Architecture: Local connectivity + weight sharing = spatial intelligence
Run After: Module 10 (DataLoader) + Module 09 (Spatial) ✅
🤖 TinyGPT 2018 - gpt_2018/
After Modules 2-14 • Language Understanding
Tokens → Embeddings → Attention → FFN → ... → Attention → Output
Historical Significance: Transformers + attention revolutionized NLP and launched the LLM era What It Showcases:
- Sequence modeling
- Attention mechanisms
- Autoregressive generation
- Foundation for ChatGPT/GPT-4
Systems Insights:
- Memory: O(n²) attention requires careful memory management
- Compute: Attention is compute-intensive but highly parallelizable
- Architecture: Self-attention enables long-range dependencies
Run After: Module 14 (Transformers) ✅
🎯 Learning Progression Design
Capability Building Sequence
| Stage | Capability Unlocked | Architectural Innovation | Real-World Impact |
|---|---|---|---|
| Stage 1 | Binary classification | Single-layer networks | Basic pattern recognition |
| Stage 2 | Non-linear problems | Hidden layers + activation | Complex decision boundaries |
| Stage 3 | Multi-class vision | Deep feedforward networks | Handwritten digit recognition |
| Stage 4 | Spatial understanding | Convolutional networks | Natural image classification |
| Stage 5 | Sequence modeling | Attention mechanisms | Language understanding |
Systems Engineering Progression
- Memory Management: From O(n) → O(n²) → O(n²) with optimizations
- Computational Complexity: Understanding trade-offs between accuracy and efficiency
- Architectural Patterns: How structure enables capability
- Production Deployment: What it takes to scale these in practice
🔧 Systems Analysis in Each Example
Each milestone includes:
Memory Profiling
import tracemalloc
tracemalloc.start()
# ... run model ...
current, peak = tracemalloc.get_traced_memory()
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
Performance Measurement
# Parameter counting
total_params = sum(p.data.size for p in model.parameters())
print(f"Parameters: {total_params:,}")
# FLOP estimation
flops = estimate_flops(model, input_shape)
print(f"FLOPs per forward pass: {flops:,}")
Scaling Analysis
# Show how performance scales with model size
for hidden_size in [64, 128, 256, 512]:
model = create_model(hidden_size)
time_per_epoch = benchmark_training(model)
print(f"Hidden={hidden_size}: {time_per_epoch:.2f}s/epoch")
📂 File Structure
examples/
├── README.md # This file - milestone overview
├── perceptron_1957/
│ └── rosenblatt_perceptron.py # First trainable neural network
├── xor_1969/
│ └── minsky_xor_problem.py # Non-linear problem solving
├── mnist_mlp_1986/
│ └── train_mlp.py # Real vision with multi-layer networks
├── cifar_cnn_modern/
│ ├── train_cnn.py # Spatial feature extraction with CNNs
│ └── data/ # CIFAR-10 dataset
├── gpt_2018/
│ └── train_gpt.py # Language modeling with transformers
└── pretrained/
├── mnist_mlp_weights.npz # Pre-trained weights for quick demos
├── cifar10_cnn_weights.npz
└── xor_weights.npz
🚀 Running the Examples
Prerequisites Check
# Verify your TinyTorch installation
tito system doctor
# Check which modules you've completed
tito checkpoint status
Run Examples by Module Completion
# After Module 04 - Basic networks
python examples/perceptron_1957/rosenblatt_perceptron.py
# After Module 06 - Autograd
python examples/xor_1969/minsky_xor_problem.py
# After Module 08 - Training
python examples/mnist_mlp_1986/train_mlp.py
# After Module 10 - DataLoader + Spatial
python examples/cifar_cnn_modern/train_cnn.py
# After Module 14 - Transformers
python examples/gpt_2018/train_gpt.py
Quick Demo with Pre-trained Weights
# Use pre-trained weights for instant results
python examples/mnist_mlp_1986/train_mlp.py --use-pretrained
python examples/cifar_cnn_modern/train_cnn.py --use-pretrained
🤔 ML Systems Thinking Questions
After Each Milestone, Consider:
-
Memory Implications:
- How much memory does this architecture require?
- What happens when you scale to larger inputs/models?
-
Computational Complexity:
- Where are the computational bottlenecks?
- How does training time scale with model size?
-
Production Deployment:
- How would you serve this model to millions of users?
- What optimizations would you apply for real-time inference?
-
Historical Context:
- Why was this innovation important for the field?
- How does this relate to modern architectures (ResNet, BERT, GPT)?
-
Engineering Trade-offs:
- What are the memory vs accuracy trade-offs?
- When would you choose this architecture over alternatives?
🎓 Educational Outcomes
By completing all milestone examples, students will:
Technical Mastery
- ✅ Understand the evolution of neural network architectures
- ✅ Build complete ML systems from scratch using their own implementations
- ✅ Analyze memory and computational trade-offs in different architectures
- ✅ Connect historical innovations to modern production systems
Systems Engineering Mindset
- ✅ Think about scalability and production deployment from day one
- ✅ Understand the engineering decisions that shaped modern ML frameworks
- ✅ Develop intuition for when to use different architectural patterns
- ✅ Build confidence in ML systems engineering roles
Real-World Preparation
- ✅ Experience working with the same patterns used in PyTorch/TensorFlow
- ✅ Understand the systems thinking behind modern ML engineering
- ✅ Develop portfolio projects demonstrating deep technical understanding
- ✅ Build foundation for advanced ML systems engineering roles
Remember: These aren't just coding exercises - they're journeys through the history of AI that prepare you for the future of ML systems engineering.
🚀 Start your journey: python examples/perceptron_1957/rosenblatt_perceptron.py