TinyTorch/examples/README.md
Vijay Janapa Reddi ecdc879dda LOGISTICS: Add comprehensive milestone example infrastructure
Address practical concerns about running milestone examples:

DATASET MANAGEMENT:
- Add data_manager.py for automatic dataset downloading
- Support MNIST, CIFAR-10, XOR, and Perceptron datasets
- Handle download with progress bars and caching
- Clear error handling and fallback options

STANDARDIZED TEMPLATE:
- Create MILESTONE_TEMPLATE.py showing standard structure
- Emphasize "YOU BUILT THIS" throughout code comments
- Include historical context and educational rationale
- Add systems analysis (memory, performance, scaling)
- Clear module prerequisite mapping

RUNNING INSTRUCTIONS:
- Comprehensive troubleshooting section in README
- Performance expectations and timing estimates
- Command-line options (--test-only, --demo-mode)
- Clear dataset logistics explanation

EXAMPLE IMPLEMENTATION:
- Update perceptron_1957 to follow new template
- Demonstrate "YOUR TinyTorch" emphasis throughout
- Show proper dataset integration and systems analysis
- Include command-line interface for different modes

Students now have clear, practical milestone examples that:
- Handle all dataset logistics automatically
- Emphasize their own implementations throughout
- Provide historical context and educational value
- Include troubleshooting and performance guidance
2025-09-26 13:00:48 -04:00


# 🏆 TinyTorch Milestone Examples
**Proof-of-mastery demonstrations showcasing what students can build after completing modules.**
These examples demonstrate the **evolutionary progression of neural networks** from 1957 to 2018, showing how each innovation built upon previous foundations. Students experience the same journey that created modern AI.
---
## 🎯 **Milestone Philosophy**
### **Why These Specific Examples?**
1. **Historical Progression**: Experience the actual evolution of neural networks
2. **Capability Showcasing**: Demonstrate specific breakthroughs at each stage
3. **Systems Thinking**: Understand WHY each innovation mattered for ML systems
4. **Motivation**: See real-world impact of concepts you're learning
5. **Integration**: Prove mastery by combining multiple modules into working systems
### **What Makes This Educational?**
- **Not Just Algorithms**: Focus on systems engineering and architectural insights
- **Progressive Complexity**: Each milestone builds capabilities from previous ones
- **Real Implementations**: Use actual TinyTorch modules students built
- **Historical Context**: Understand the engineering decisions that shaped modern ML
- **Production Relevance**: Connect to how these patterns appear in PyTorch/TensorFlow
---
## 📅 **Historical Timeline & Module Mapping**
### **🧠 Perceptron 1957** - `perceptron_1957/`
**After Modules 2-4** • *Foundation Building*
```
Input → Linear → Sigmoid → Binary Output
```
**Historical Significance**: Frank Rosenblatt's perceptron launched the first AI wave
**What It Showcases**:
- First trainable neural network
- Linear classification boundaries
- Gradient-based learning foundation
- Why single layers have limitations
**Systems Insights**:
- Memory: O(n) parameters for n input features, minimal storage
- Compute: O(n) operations per forward pass (one multiply-add per feature)
- Limitations: Only linearly separable problems
**Run After**: Module 04 (Layers) ✅
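The `Input → Linear → Sigmoid → Binary Output` pipeline above can be sketched in plain Python. This is an illustrative stand-in, not the code in `perceptron_1957/`: the function names, learning rate, epoch count, and the AND dataset are all assumptions chosen to keep the example self-contained.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, x):
    """Input -> Linear -> Sigmoid: returns a probability in (0, 1)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def train_perceptron(data, epochs=200, lr=0.5):
    """Per-sample gradient descent on binary cross-entropy.

    For a sigmoid output the gradient with respect to z is (p - y).
    epochs and lr are assumed values, not taken from TinyTorch.
    """
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in data:
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Logical AND is linearly separable, so a single layer is enough.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_data)
and_predictions = [round(predict(weights, bias, x)) for x, _ in and_data]
print(and_predictions)
```

The model stores one weight per input feature plus a bias, which is where the O(n) memory and compute figures come from.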
---
### **⚡ XOR Problem 1969** - `xor_1969/`
**After Modules 2-6** • *Breaking Limitations*
```
Input → Linear → ReLU → Linear → Output
```
**Historical Significance**: Minsky & Papert showed perceptron limitations; multi-layer networks solved them
**What It Showcases**:
- Non-linear problem solving
- Hidden layer representations
- Why depth enables complexity
- Foundation for deep learning
**Systems Insights**:
- Memory: O(n²) parameters when hidden layers are about as wide as the n inputs
- Compute: O(n²) operations per forward pass, but enables non-linear solutions
- Architecture: Hidden representations crucial for complex patterns
**Run After**: Module 06 (Autograd) ✅
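To see why one hidden layer is enough for XOR, here is a small sketch of `Input → Linear → ReLU → Linear → Output` with hand-picked weights. The weights are an assumption chosen for illustration (the real example learns its own through training), and the helper names are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def linear(x, weights, biases):
    """Dense layer: one output per (weight row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

# Hand-picked (not trained) weights.
# Hidden unit 1 computes relu(x1 + x2); hidden unit 2 computes relu(x1 + x2 - 1).
W1, b1 = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]
# The output subtracts twice the second unit, cancelling the (1, 1) case.
W2, b2 = [[1.0, -2.0]], [0.0]

def xor_net(x):
    hidden = [relu(h) for h in linear(x, W1, b1)]
    return linear(hidden, W2, b2)[0]

xor_outputs = [xor_net(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)  # [0.0, 1.0, 1.0, 0.0]
```

No single linear layer can produce this output pattern, because the two positive cases cannot be separated from the two negative cases by one straight line.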
---
### **🔢 MNIST MLP 1986** - `mnist_mlp_1986/`
**After Modules 2-8** • *Real Vision Problems*
```
Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes
```
**Historical Significance**: Backpropagation enabled training deep networks on real datasets
**What It Showcases**:
- Multi-class classification
- Real vision datasets
- Multi-layer feature learning
- Complete training pipelines
**Systems Insights**:
- Memory: ~100K parameters for MNIST (manageable)
- Compute: Dense matrix operations, vectorization critical
- Scaling: 95%+ accuracy demonstrates effectiveness
**Run After**: Module 08 (Training) ✅
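The "~100K parameters" figure can be checked with a quick count. The layer sizes below are an assumption (the README does not state the hidden widths used in `train_mlp.py`), and `mlp_parameter_count` is a hypothetical helper.

```python
def mlp_parameter_count(layer_sizes):
    """Weights (fan_in * fan_out) plus biases (fan_out) for each dense layer."""
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))

# Assumed sizes: 28x28 flattened inputs, two hidden layers, 10 digit classes.
assumed_sizes = [784, 128, 64, 10]
mnist_params = mlp_parameter_count(assumed_sizes)
print(f"Parameters: {mnist_params:,}")  # Parameters: 109,386
print(f"float32 storage: {mnist_params * 4 / 1024:.0f} KiB")  # float32 storage: 427 KiB
```

Almost all of the parameters sit in the first layer, because the flattened image is by far the widest input.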
---
### **🖼️ CIFAR CNN Modern** - `cifar_cnn_modern/`
**After Modules 2-10** • *Spatial Understanding*
```
Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes
```
**Historical Significance**: CNNs revolutionized computer vision by exploiting spatial structure
**What It Showcases**:
- Spatial feature extraction
- Hierarchical pattern recognition
- Translation invariance
- Natural image classification
**Systems Insights**:
- Memory: ~1M parameters, but shared weights reduce memory vs dense layers
- Compute: Convolution is compute-intensive but highly parallelizable
- Architecture: Local connectivity + weight sharing = spatial intelligence
**Run After**: Module 10 (DataLoader) + Module 09 (Spatial) ✅
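A rough count shows how much weight sharing saves. The layer shape here (32 filters of size 3×3, no padding) is an assumption for illustration, not necessarily what `train_cnn.py` uses, and both helpers are hypothetical.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """One kernel per (input channel, output channel) pair, plus one bias per filter."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def dense_params(n_inputs, n_outputs):
    """A full weight matrix plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

# CIFAR-10 images are 3x32x32. An unpadded 3x3 convolution with 32 filters
# produces a 32x30x30 feature map.
conv_count = conv2d_params(in_channels=3, out_channels=32, kernel_size=3)
dense_count = dense_params(n_inputs=3 * 32 * 32, n_outputs=32 * 30 * 30)

print(f"Conv layer parameters:  {conv_count:,}")   # Conv layer parameters:  896
print(f"Dense layer parameters: {dense_count:,}")  # Dense layer parameters: 88,502,400
```

A dense layer producing the same number of outputs needs a separate weight for every input-output pair, while the convolution reuses the same small kernels at every spatial position.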
---
### **🤖 TinyGPT 2018** - `gpt_2018/`
**After Modules 2-14** • *Language Understanding*
```
Tokens → Embeddings → Attention → FFN → ... → Attention → Output
```
**Historical Significance**: Transformers + attention revolutionized NLP and launched the LLM era
**What It Showcases**:
- Sequence modeling
- Attention mechanisms
- Autoregressive generation
- Foundation for ChatGPT/GPT-4
**Systems Insights**:
- Memory: Attention scores grow as O(n²) in sequence length n, so memory needs careful management
- Compute: Attention is compute-intensive but highly parallelizable
- Architecture: Self-attention enables long-range dependencies
**Run After**: Module 14 (Transformers) ✅
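The quadratic memory cost comes from the attention score matrix, which holds one value per pair of tokens. A minimal sketch of that estimate, assuming float32 scores and a hypothetical helper name:

```python
def attention_scores_bytes(seq_len, num_heads=1, batch_size=1, bytes_per_value=4):
    """Bytes needed for the (seq_len x seq_len) score matrix of each head.

    Assumes float32 (4 bytes) by default and ignores every other activation.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value

for seq_len in (128, 256, 512, 1024):
    mib = attention_scores_bytes(seq_len) / 2**20
    print(f"{seq_len:>5} tokens: {mib:.4f} MiB per head")
```

Doubling the sequence length quadruples the score matrix, which is why long contexts dominate transformer memory budgets.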
---
## 🎯 **Learning Progression Design**
### **Capability Building Sequence**
| Stage | Capability Unlocked | Architectural Innovation | Real-World Impact |
|-------|-------------------|------------------------|------------------|
| **Stage 1** | Binary classification | Single-layer networks | Basic pattern recognition |
| **Stage 2** | Non-linear problems | Hidden layers + activation | Complex decision boundaries |
| **Stage 3** | Multi-class vision | Deep feedforward networks | Handwritten digit recognition |
| **Stage 4** | Spatial understanding | Convolutional networks | Natural image classification |
| **Stage 5** | Sequence modeling | Attention mechanisms | Language understanding |
### **Systems Engineering Progression**
- **Memory Management**: From O(n) parameters (perceptron) → O(n²) parameters (hidden layers) → O(n²) attention memory that needs optimizations
- **Computational Complexity**: Understanding trade-offs between accuracy and efficiency
- **Architectural Patterns**: How structure enables capability
- **Production Deployment**: What it takes to scale these in practice
---
## 🔧 **Systems Analysis in Each Example**
Each milestone includes:
### **Memory Profiling**
```python
import tracemalloc
tracemalloc.start()
# ... run model ...
current, peak = tracemalloc.get_traced_memory()
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
```
### **Performance Measurement**
```python
# Parameter counting
total_params = sum(p.data.size for p in model.parameters())
print(f"Parameters: {total_params:,}")
# FLOP estimation
flops = estimate_flops(model, input_shape)
print(f"FLOPs per forward pass: {flops:,}")
```
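`estimate_flops` is not defined in this README. As a rough idea of what such a helper might compute for dense layers, here is a sketch under stated assumptions: the name `dense_forward_flops` is hypothetical, only multiply-adds in the weight matrices are counted, and the layer sizes are illustrative.

```python
def dense_forward_flops(layer_sizes, batch_size=1):
    """Approximate forward-pass FLOPs for a stack of dense layers.

    Counts one multiply and one add per weight (2 * fan_in * fan_out).
    Bias adds and activation functions are ignored as lower-order terms.
    """
    per_sample = sum(2 * fan_in * fan_out
                     for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))
    return batch_size * per_sample

flops = dense_forward_flops([784, 128, 64, 10])
print(f"FLOPs per forward pass: {flops:,}")  # FLOPs per forward pass: 218,368
```

Conventions differ on whether a multiply-add counts as one or two operations, so treat the number as an order-of-magnitude estimate.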
### **Scaling Analysis**
```python
# Show how performance scales with model size
for hidden_size in [64, 128, 256, 512]:
    model = create_model(hidden_size)
    time_per_epoch = benchmark_training(model)
    print(f"Hidden={hidden_size}: {time_per_epoch:.2f}s/epoch")
```
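`create_model` and `benchmark_training` are placeholders in the snippet above. To try the same kind of scaling loop without TinyTorch, one possible stand-in is sketched below. Everything here is an assumption: the model is a pair of nested-list weight matrices, and the benchmark times forward passes only, not real training.

```python
import random
import time

def create_model(hidden_size, n_inputs=32, n_outputs=4, seed=0):
    """Hypothetical stand-in: two dense weight matrices as nested lists."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(hidden_size)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(n_outputs)]
    return w1, w2

def forward(model, x):
    """Linear -> ReLU -> Linear, without biases to keep the sketch short."""
    w1, w2 = model
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def benchmark_forward(model, n_samples=50):
    """Wall-clock seconds for n_samples forward passes."""
    x = [0.5] * len(model[0][0])
    start = time.perf_counter()
    for _ in range(n_samples):
        forward(model, x)
    return time.perf_counter() - start

timings = {h: benchmark_forward(create_model(h)) for h in (64, 128, 256, 512)}
for hidden_size, seconds in timings.items():
    print(f"Hidden={hidden_size}: {seconds * 1000:.1f} ms for 50 forward passes")
```

Timings vary between machines and runs, so compare the trend across hidden sizes rather than the absolute numbers.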
---
## 📂 **File Structure**
```
examples/
├── README.md                          # This file - milestone overview
├── perceptron_1957/
│   └── rosenblatt_perceptron.py       # First trainable neural network
├── xor_1969/
│   └── minsky_xor_problem.py          # Non-linear problem solving
├── mnist_mlp_1986/
│   └── train_mlp.py                   # Real vision with multi-layer networks
├── cifar_cnn_modern/
│   ├── train_cnn.py                   # Spatial feature extraction with CNNs
│   └── data/                          # CIFAR-10 dataset
├── gpt_2018/
│   └── train_gpt.py                   # Language modeling with transformers
└── pretrained/
    ├── mnist_mlp_weights.npz          # Pre-trained weights for quick demos
    ├── cifar10_cnn_weights.npz
    └── xor_weights.npz
```
---
## 🚀 **How to Run These Examples**
### **Prerequisites Check**
```bash
# 1. Verify your TinyTorch installation
tito system doctor
# 2. Check which modules you've completed
tito checkpoint status
# 3. Ensure you're in the project root
cd /path/to/TinyTorch
```
### **Dataset Management (Automatic)**
**Don't worry about data logistics!** Each example automatically handles dataset downloading:
- **MNIST**: Downloads from official LeCun server (~60MB)
- **CIFAR-10**: Downloads from University of Toronto (~170MB)
- **XOR/Perceptron**: Generates synthetic data instantly
**The first run downloads the data; subsequent runs use the cached copy.**
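The download-once, reuse-afterwards behavior can be sketched with the standard library. This is not the implementation in `data_manager.py`: `fetch_cached` is a hypothetical helper, and the cache layout is an assumption.

```python
import shutil
import urllib.request
from pathlib import Path

def fetch_cached(url, cache_dir, filename=None):
    """Download url into cache_dir once; later calls reuse the local file.

    Returns (path, cache_hit). The file is written under a temporary
    ".part" name first so an interrupted download is never mistaken
    for a complete one.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / (filename or url.rsplit("/", 1)[-1])
    if target.exists():
        return target, True  # cache hit: no network access needed
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)  # rename into place once the download completes
    return target, False
```

Calling the function twice with the same URL and cache directory downloads on the first call and returns the cached path on the second.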
### **Running Examples by Module Completion**
#### **📱 Quick Test (No Training)**
Test architecture and imports without waiting for downloads:
```bash
# Test what you've built so far
python examples/perceptron_1957/rosenblatt_perceptron.py --test-only
python examples/xor_1969/minsky_xor_problem.py --test-only
```
#### **🎯 Full Milestone Demonstrations**
```bash
# After Module 04 - Foundation (30 seconds)
python examples/perceptron_1957/rosenblatt_perceptron.py
# Demonstrates: YOU built Linear layers + activation functions
# After Module 06 - Autograd (1 minute)
python examples/xor_1969/minsky_xor_problem.py
# Demonstrates: YOU built gradient computation + training loops
# After Module 08 - Training (2-3 minutes + MNIST download)
python examples/mnist_mlp_1986/train_mlp.py
# Demonstrates: YOU built complete vision pipeline
# After Module 10 - DataLoader + Spatial (3-5 minutes + CIFAR download)
python examples/cifar_cnn_modern/train_cnn.py
# Demonstrates: YOU built convolutional networks
# After Module 14 - Transformers (5-10 minutes)
python examples/gpt_2018/train_gpt.py
# Demonstrates: YOU built attention mechanisms + language models
```
### **🚫 Troubleshooting Common Issues**
#### **Import Errors**
```bash
# If you see "ModuleNotFoundError: No module named 'tinytorch'"
cd /path/to/TinyTorch
python -m pip install -e .
# Or run with explicit path
PYTHONPATH=/path/to/TinyTorch python examples/perceptron_1957/rosenblatt_perceptron.py
```
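Before reinstalling, it can help to confirm whether Python can see the package at all. A small check using only the standard library (the helper name `is_importable` is hypothetical):

```python
import importlib.util

def is_importable(module_name):
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None

if is_importable("tinytorch"):
    print("tinytorch is visible to this interpreter")
else:
    print("tinytorch not found: install with 'python -m pip install -e .' or set PYTHONPATH")
```

Run it with the same interpreter you use for the examples, since a different virtual environment will give a different answer.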
#### **Dataset Download Issues**
```bash
# Manual dataset download if automatic fails
python examples/data_manager.py # Test all datasets
# Or download specific datasets
python -c "from examples.data_manager import DatasetManager; DatasetManager().get_mnist()"
```
#### **Memory Issues**
```bash
# Reduce batch size for limited memory
python examples/cifar_cnn_modern/train_cnn.py --batch-size 16
# Use test mode for architecture validation only
python examples/mnist_mlp_1986/train_mlp.py --test-only
```
#### **Slow Training**
```bash
# Quick demo mode (reduced epochs)
python examples/mnist_mlp_1986/train_mlp.py --demo-mode
# Use pre-trained weights for instant results
python examples/mnist_mlp_1986/train_mlp.py --use-pretrained
```
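The flags used in this troubleshooting section (`--test-only`, `--demo-mode`, `--use-pretrained`, `--batch-size`) could be wired up with `argparse` roughly as follows. This is a sketch, not the parser from the example scripts: the help strings and the default batch size of 32 are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser covering the flags mentioned in this README."""
    parser = argparse.ArgumentParser(description="Run a TinyTorch milestone example")
    parser.add_argument("--test-only", action="store_true",
                        help="check imports and architecture, skip training")
    parser.add_argument("--demo-mode", action="store_true",
                        help="train for a reduced number of epochs")
    parser.add_argument("--use-pretrained", action="store_true",
                        help="load saved weights instead of training")
    parser.add_argument("--batch-size", type=int, default=32,  # assumed default
                        help="samples per training batch")
    return parser

args = build_parser().parse_args(["--demo-mode", "--batch-size", "16"])
print(args.demo_mode, args.test_only, args.batch_size)  # True False 16
```

`argparse` converts the dashes in flag names to underscores, so `--test-only` is read as `args.test_only`.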
### **📊 Expected Performance & Timing**
| Example | Dataset Size | Download Time | Training Time | Expected Accuracy |
|---------|-------------|---------------|---------------|------------------|
| **Perceptron 1957** | 1K synthetic | 0s | 30s | 95%+ (linearly separable) |
| **XOR 1969** | 1K synthetic | 0s | 1min | 90%+ (non-linear) |
| **MNIST MLP 1986** | 60K images | 2-5min | 2-3min | 85%+ (real vision) |
| **CIFAR CNN Modern** | 50K images | 5-10min | 3-5min | 65%+ (natural images) |
| **TinyGPT 2018** | Text corpus | 1-2min | 5-10min | Coherent generation |
**Note**: First run includes dataset download time. Subsequent runs are much faster.
---
## 🤔 **ML Systems Thinking Questions**
### **After Each Milestone, Consider:**
1. **Memory Implications**:
   - How much memory does this architecture require?
   - What happens when you scale to larger inputs/models?
2. **Computational Complexity**:
   - Where are the computational bottlenecks?
   - How does training time scale with model size?
3. **Production Deployment**:
   - How would you serve this model to millions of users?
   - What optimizations would you apply for real-time inference?
4. **Historical Context**:
   - Why was this innovation important for the field?
   - How does this relate to modern architectures (ResNet, BERT, GPT)?
5. **Engineering Trade-offs**:
   - What are the memory vs accuracy trade-offs?
   - When would you choose this architecture over alternatives?
---
## 🎓 **Educational Outcomes**
By completing all milestone examples, students will:
### **Technical Mastery**
- ✅ Understand the evolution of neural network architectures
- ✅ Build complete ML systems from scratch using their own implementations
- ✅ Analyze memory and computational trade-offs in different architectures
- ✅ Connect historical innovations to modern production systems
### **Systems Engineering Mindset**
- ✅ Think about scalability and production deployment from day one
- ✅ Understand the engineering decisions that shaped modern ML frameworks
- ✅ Develop intuition for when to use different architectural patterns
- ✅ Build confidence in ML systems engineering roles
### **Real-World Preparation**
- ✅ Experience working with the same patterns used in PyTorch/TensorFlow
- ✅ Understand the systems thinking behind modern ML engineering
- ✅ Develop portfolio projects demonstrating deep technical understanding
- ✅ Build foundation for advanced ML systems engineering roles
---
**Remember**: These aren't just coding exercises - they're journeys through the history of AI that prepare you for the future of ML systems engineering.
🚀 **Start your journey**: `python examples/perceptron_1957/rosenblatt_perceptron.py`