# 🏆 TinyTorch Milestone Examples

**Proof-of-mastery demonstrations showcasing what students can build after completing modules.**

These examples demonstrate the **evolutionary progression of neural networks** from 1957 to 2018, showing how each innovation built upon previous foundations. Students experience the same journey that created modern AI.

---

## 🎯 **Milestone Philosophy**

### **Why These Specific Examples?**

1. **Historical Progression**: Experience the actual evolution of neural networks
2. **Capability Showcasing**: Demonstrate specific breakthroughs at each stage
3. **Systems Thinking**: Understand WHY each innovation mattered for ML systems
4. **Motivation**: See real-world impact of concepts you're learning
5. **Integration**: Prove mastery by combining multiple modules into working systems

### **What Makes This Educational?**

- **Not Just Algorithms**: Focus on systems engineering and architectural insights
- **Progressive Complexity**: Each milestone builds capabilities from previous ones
- **Real Implementations**: Use actual TinyTorch modules students built
- **Historical Context**: Understand the engineering decisions that shaped modern ML
- **Production Relevance**: Connect to how these patterns appear in PyTorch/TensorFlow

---

## 📅 **Historical Timeline & Module Mapping**

### **🧠 Perceptron 1957** - `perceptron_1957/`
**After Modules 2-4** • *Foundation Building*

```
Input → Linear → Sigmoid → Binary Output
```

**Historical Significance**: Frank Rosenblatt's perceptron launched the first AI wave

**What It Showcases**:
- First trainable neural network
- Linear classification boundaries
- Gradient-based learning foundation
- Why single layers have limitations

**Systems Insights**:
- Memory: O(n) parameters for n input features (n weights plus one bias), minimal storage
- Compute: O(n) operations per forward pass
- Limitations: Only linearly separable problems

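To make the `Input → Linear → Sigmoid → Binary Output` pipeline concrete, here is a minimal NumPy sketch. It is not the code in `rosenblatt_perceptron.py`; the function names and the hand-picked weights (which implement logical AND, a linearly separable problem) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Squash raw scores into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_forward(x, weights, bias):
    """Input -> Linear -> Sigmoid -> Binary Output (threshold at 0.5)."""
    probs = sigmoid(x @ weights + bias)
    return (probs >= 0.5).astype(int)

# Hypothetical hand-picked parameters that implement logical AND.
weights = np.array([1.0, 1.0])
bias = -1.5

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
and_outputs = perceptron_forward(inputs, weights, bias)

# O(n) storage: one weight per input feature plus a single bias.
n_params = weights.size + 1
print(and_outputs, n_params)
```

No choice of `weights` and `bias` makes this single layer output XOR, which is the limitation the next milestone addresses.
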
**Run After**: Module 04 (Layers) ✅

---

### **⚡ XOR Problem 1969** - `xor_1969/`
**After Modules 2-6** • *Breaking Limitations*

```
Input → Linear → ReLU → Linear → Output
```

**Historical Significance**: Minsky & Papert (1969) showed that a single-layer perceptron cannot represent XOR; multi-layer networks later solved it

**What It Showcases**:
- Non-linear problem solving
- Hidden layer representations
- Why depth enables complexity
- Foundation for deep learning

**Systems Insights**:
- Memory: O(n·h) parameters for n inputs and h hidden units (O(n²) when h ≈ n)
- Compute: O(n·h) operations per forward pass, but enables non-linear solutions
- Architecture: Hidden representations crucial for complex patterns

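The `Input → Linear → ReLU → Linear → Output` architecture can represent XOR with just two hidden units. The sketch below uses hand-set weights rather than training, and it is an illustration only: the weight values and names are assumptions, not taken from `minsky_xor_problem.py`.

```python
import numpy as np

def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)

# Hypothetical hand-set parameters: 2 inputs -> 2 hidden units -> 1 output.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # shape (inputs, hidden)
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0],
               [-2.0]])       # shape (hidden, outputs)
b2 = np.array([0.0])

def xor_forward(x):
    """Linear -> ReLU -> Linear."""
    hidden = relu(x @ W1 + b1)
    return hidden @ W2 + b2

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
xor_outputs = xor_forward(inputs).ravel()

# Parameter count: n*h weights + h biases, then h*1 weights + 1 bias.
n_params = W1.size + b1.size + W2.size + b2.size
print(xor_outputs, n_params)
```

The first hidden unit computes `x1 + x2` and the second fires only when both inputs are 1, so subtracting twice the second from the first cancels the `(1, 1)` case.
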
**Run After**: Module 06 (Autograd) ✅

---

### **🔢 MNIST MLP 1986** - `mnist_mlp_1986/`
**After Modules 2-8** • *Real Vision Problems*

```
Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes
```

**Historical Significance**: Backpropagation, popularized in 1986 by Rumelhart, Hinton, and Williams, made it practical to train multi-layer networks on real datasets

**What It Showcases**:
- Multi-class classification
- Real vision datasets
- Multi-layer feature learning
- Complete training pipelines

**Systems Insights**:
- Memory: ~100K parameters for MNIST (manageable)
- Compute: Dense matrix operations, vectorization critical
- Scaling: 95%+ accuracy demonstrates effectiveness

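The "~100K parameters" figure can be checked with a few lines of arithmetic. The sketch below assumes layer widths of 784 → 128 → 64 → 10; the input size (28×28 pixels) and the 10 classes follow from MNIST, but the hidden widths are an assumption and may differ from `train_mlp.py`.

```python
def count_dense_params(layer_sizes):
    """Weights (fan_in * fan_out) plus biases (fan_out) for each Linear layer."""
    return sum(
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
    )

# Assumed architecture: 28*28 inputs, two hidden layers, 10 digit classes.
layer_sizes = [784, 128, 64, 10]
total_params = count_dense_params(layer_sizes)

# Each float32 parameter takes 4 bytes.
float32_mb = total_params * 4 / 1024 / 1024
print(f"Parameters: {total_params:,} ({float32_mb:.2f} MB as float32)")
```

Almost all of the parameters sit in the first layer, because the flattened image is by far the widest input in the network.
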
**Run After**: Module 08 (Training) ✅

---

### **🖼️ CIFAR CNN Modern** - `cifar_cnn_modern/`
**After Modules 2-10** • *Spatial Understanding*

```
Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes
```

**Historical Significance**: CNNs revolutionized computer vision by exploiting spatial structure

**What It Showcases**:
- Spatial feature extraction
- Hierarchical pattern recognition
- Translation equivariance from convolution, with approximate invariance from pooling
- Natural image classification

**Systems Insights**:
- Memory: ~1M parameters, but shared weights reduce memory vs dense layers
- Compute: Convolution is compute-intensive but highly parallelizable
- Architecture: Local connectivity + weight sharing = spatial intelligence

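To see how much weight sharing saves, compare a convolution with a dense layer that maps the same input to the same output size. The numbers below assume a 32×32×3 CIFAR-10 image, a 3×3 kernel, and 32 output channels with the spatial size preserved; the kernel size and channel count are illustrative assumptions, not the values in `train_cnn.py`.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """One (in_channels x k x k) filter plus one bias per output channel."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)

def dense_params(in_features, out_features):
    """A full weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features

height = width = 32                    # CIFAR-10 image size
in_channels, out_channels = 3, 32      # out_channels is an assumed value

conv_count = conv2d_params(in_channels, out_channels, kernel_size=3)
dense_count = dense_params(
    height * width * in_channels,      # 3,072 input values
    height * width * out_channels,     # 32,768 output values
)

print(f"Conv:  {conv_count:,} parameters")
print(f"Dense: {dense_count:,} parameters")
print(f"Ratio: {dense_count / conv_count:,.0f}x")
```

The convolution reuses the same small filters at every spatial position, so its parameter count does not depend on the image size at all.
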
**Run After**: Module 10 (DataLoader) + Module 09 (Spatial) ✅

---

### **🤖 TinyGPT 2018** - `gpt_2018/`
**After Modules 2-14** • *Language Understanding*

```
Tokens → Embeddings → Attention → FFN → ... → Attention → Output
```

**Historical Significance**: Transformers + attention revolutionized NLP and launched the LLM era

**What It Showcases**:
- Sequence modeling
- Attention mechanisms
- Autoregressive generation
- Foundation for ChatGPT/GPT-4

**Systems Insights**:
- Memory: Attention scores grow as O(n²) in sequence length n, which requires careful memory management
- Compute: Attention is compute-intensive but highly parallelizable
- Architecture: Self-attention enables long-range dependencies

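The O(n²) memory cost comes from the attention weight matrix, which holds one score for every pair of positions. The following is a simplified single-head sketch in NumPy, assuming no learned query/key/value projections (the input is used directly for all three); it is not the TinyTorch attention module, and all names are illustrative.

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Scaled dot-product attention where position i only sees positions <= i."""
    seq_len, d_k = q.shape
    scores = q @ k.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)           # hide future tokens
    scores = scores - scores.max(axis=-1, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
x = rng.standard_normal((seq_len, d_model))

# Simplification: reuse x as queries, keys, and values (no projections).
output, attn = causal_self_attention(x, x, x)

print(attn.shape)                       # one weight per (query, key) pair
print(f"float32 bytes for scores: {seq_len * seq_len * 4}")
```

Doubling `seq_len` quadruples the size of `attn`, while the causal mask is what makes autoregressive generation possible: the first token can only attend to itself.
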
**Run After**: Module 14 (Transformers) ✅

---

## 🎯 **Learning Progression Design**

### **Capability Building Sequence**

| Stage | Capability Unlocked | Architectural Innovation | Real-World Impact |
|-------|-------------------|------------------------|------------------|
| **Stage 1** | Binary classification | Single-layer networks | Basic pattern recognition |
| **Stage 2** | Non-linear problems | Hidden layers + activation | Complex decision boundaries |
| **Stage 3** | Multi-class vision | Deep feedforward networks | Handwritten digit recognition |
| **Stage 4** | Spatial understanding | Convolutional networks | Natural image classification |
| **Stage 5** | Sequence modeling | Attention mechanisms | Language understanding |

### **Systems Engineering Progression**

- **Memory Management**: From O(n) (perceptron) → O(n²) (dense hidden layers) → O(n²) attention with optimizations
- **Computational Complexity**: Understanding trade-offs between accuracy and efficiency
- **Architectural Patterns**: How structure enables capability
- **Production Deployment**: What it takes to scale these in practice

---

## 🔧 **Systems Analysis in Each Example**

Each milestone includes:

### **Memory Profiling**
```python
import tracemalloc

tracemalloc.start()
# ... run model ...
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()  # stop tracing once the measurement is taken
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
```

### **Performance Measurement**
```python
# `model`, `input_shape`, and `estimate_flops` are assumed to be provided
# by the milestone script.

# Parameter counting
total_params = sum(p.data.size for p in model.parameters())
print(f"Parameters: {total_params:,}")

# FLOP estimation
flops = estimate_flops(model, input_shape)
print(f"FLOPs per forward pass: {flops:,}")
```

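For dense layers, a FLOP estimate needs nothing more than the layer widths. The helper below is a hypothetical stand-in for `estimate_flops`, not its actual implementation: it assumes fully connected layers only, counts one multiply and one add per weight, and ignores biases and activations.

```python
def estimate_dense_flops(layer_sizes, batch_size=1):
    """Approximate forward-pass FLOPs for a stack of Linear layers.

    Each weight contributes one multiply and one add per sample,
    so a layer costs about 2 * fan_in * fan_out FLOPs.
    """
    per_sample = sum(
        2 * fan_in * fan_out
        for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
    )
    return batch_size * per_sample

# Assumed MLP widths, for illustration only.
layer_sizes = [784, 128, 64, 10]
flops = estimate_dense_flops(layer_sizes)
batch_flops = estimate_dense_flops(layer_sizes, batch_size=32)

print(f"FLOPs per forward pass: {flops:,}")
print(f"FLOPs per batch of 32:  {batch_flops:,}")
```

For dense networks the FLOP count is roughly twice the weight count, which is why parameter counting and FLOP estimation are usually reported together.
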
### **Scaling Analysis**
```python
# `create_model` and `benchmark_training` are assumed to be provided
# by the milestone script.

# Show how performance scales with model size
for hidden_size in [64, 128, 256, 512]:
    model = create_model(hidden_size)
    time_per_epoch = benchmark_training(model)
    print(f"Hidden={hidden_size}: {time_per_epoch:.2f}s/epoch")
```

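A self-contained version of the same experiment can be run with plain NumPy. This sketch times only the forward pass of a two-layer network, not a full training epoch, and `time_forward` along with its default sizes is a hypothetical helper rather than part of TinyTorch.

```python
import time
import numpy as np

def time_forward(hidden_size, batch_size=64, input_dim=784, repeats=5):
    """Best-of-N wall-clock time for Linear -> ReLU -> Linear on random data."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal((batch_size, input_dim))
    w1 = rng.standard_normal((input_dim, hidden_size))
    w2 = rng.standard_normal((hidden_size, 10))

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hidden = np.maximum(x @ w1, 0.0)   # Linear + ReLU
        _ = hidden @ w2                    # output layer
        best = min(best, time.perf_counter() - start)
    return best

# Measure how forward-pass time changes with hidden width.
timings = {h: time_forward(h) for h in [64, 128, 256, 512]}
for hidden_size, seconds in timings.items():
    print(f"Hidden={hidden_size}: {seconds * 1e6:.1f} µs/forward")
```
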

---

## 📂 **File Structure**

```
examples/
├── README.md                      # This file - milestone overview
├── perceptron_1957/
│   └── rosenblatt_perceptron.py   # First trainable neural network
├── xor_1969/
│   └── minsky_xor_problem.py      # Non-linear problem solving
├── mnist_mlp_1986/
│   └── train_mlp.py               # Real vision with multi-layer networks
├── cifar_cnn_modern/
│   ├── train_cnn.py               # Spatial feature extraction with CNNs
│   └── data/                      # CIFAR-10 dataset
├── gpt_2018/
│   └── train_gpt.py               # Language modeling with transformers
└── pretrained/
    ├── mnist_mlp_weights.npz      # Pre-trained weights for quick demos
    ├── cifar10_cnn_weights.npz
    └── xor_weights.npz
```

---

## 🚀 **How to Run These Examples**

### **Prerequisites Check**
```bash
# 1. Verify your TinyTorch installation
tito system doctor

# 2. Check which modules you've completed
tito checkpoint status

# 3. Ensure you're in the project root
cd /path/to/TinyTorch
```

### **Dataset Management (Automatic)**
**Don't worry about data logistics!** Each example automatically handles dataset downloading:

- **MNIST**: Downloads from official LeCun server (~60MB)
- **CIFAR-10**: Downloads from University of Toronto (~170MB)
- **XOR/Perceptron**: Generates synthetic data instantly

**The first run downloads the data; subsequent runs reuse the cached copy.**

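The download-once, reuse-afterwards behavior follows a simple pattern: check for a local file, and only fetch when it is missing. The sketch below illustrates that pattern under stated assumptions; `load_cached`, the cache path, and the fake download function are hypothetical and do not reflect the actual `data_manager.py` API.

```python
import tempfile
from pathlib import Path

def load_cached(cache_path, fetch):
    """Return (data, from_cache). Calls `fetch()` only when no cached file exists."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return cache_path.read_bytes(), True

    data = fetch()                                   # e.g. an HTTP download
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data, False

download_calls = []

def fake_download():
    """Stand-in for a real network download, so the sketch runs offline."""
    download_calls.append(1)
    return b"pretend dataset bytes"

with tempfile.TemporaryDirectory() as cache_dir:
    target = Path(cache_dir) / "mnist" / "train-images.bin"
    first, first_from_cache = load_cached(target, fake_download)
    second, second_from_cache = load_cached(target, fake_download)

print(first_from_cache, second_from_cache, len(download_calls))
```

Passing the download step in as a function keeps the caching logic testable without network access.
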
### **Running Examples by Module Completion**

#### **📱 Quick Test (No Training)**
Test architecture and imports without waiting for downloads:
```bash
# Test what you've built so far
python examples/perceptron_1957/rosenblatt_perceptron.py --test-only
python examples/xor_1969/minsky_xor_problem.py --test-only
```

#### **🎯 Full Milestone Demonstrations**

```bash
# After Module 04 - Foundation (30 seconds)
python examples/perceptron_1957/rosenblatt_perceptron.py
# Demonstrates: YOU built Linear layers + activation functions

# After Module 06 - Autograd (1 minute)
python examples/xor_1969/minsky_xor_problem.py
# Demonstrates: YOU built gradient computation + training loops

# After Module 08 - Training (2-3 minutes + MNIST download)
python examples/mnist_mlp_1986/train_mlp.py
# Demonstrates: YOU built complete vision pipeline

# After Module 10 - DataLoader + Spatial (3-5 minutes + CIFAR download)
python examples/cifar_cnn_modern/train_cnn.py
# Demonstrates: YOU built convolutional networks

# After Module 14 - Transformers (5-10 minutes)
python examples/gpt_2018/train_gpt.py
# Demonstrates: YOU built attention mechanisms + language models
```

### **🚫 Troubleshooting Common Issues**

#### **Import Errors**
```bash
# If you see "ModuleNotFoundError: No module named 'tinytorch'"
cd /path/to/TinyTorch
python -m pip install -e .

# Or run with explicit path
PYTHONPATH=/path/to/TinyTorch python examples/perceptron_1957/rosenblatt_perceptron.py
```

#### **Dataset Download Issues**
```bash
# Manual dataset download if automatic fails
python examples/data_manager.py  # Test all datasets

# Or download specific datasets
python -c "from examples.data_manager import DatasetManager; DatasetManager().get_mnist()"
```

#### **Memory Issues**
```bash
# Reduce batch size for limited memory
python examples/cifar_cnn_modern/train_cnn.py --batch-size 16

# Use test mode for architecture validation only
python examples/mnist_mlp_1986/train_mlp.py --test-only
```

#### **Slow Training**
```bash
# Quick demo mode (reduced epochs)
python examples/mnist_mlp_1986/train_mlp.py --demo-mode

# Use pre-trained weights for instant results
python examples/mnist_mlp_1986/train_mlp.py --use-pretrained
```

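The flags used above (`--test-only`, `--demo-mode`, `--use-pretrained`, `--batch-size`) map naturally onto `argparse`. The sketch below shows one plausible way a milestone script could wire them up; the defaults, the epoch counts, and the `choose_epochs` helper are assumptions for illustration, not the scripts' actual behavior.

```python
import argparse

def build_parser():
    """Command-line interface mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Milestone example (illustrative)")
    parser.add_argument("--test-only", action="store_true",
                        help="check architecture and imports, skip training")
    parser.add_argument("--demo-mode", action="store_true",
                        help="train for a reduced number of epochs")
    parser.add_argument("--use-pretrained", action="store_true",
                        help="load saved weights instead of training")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="samples per batch (default value is assumed)")
    return parser

def choose_epochs(args, full_epochs=10, demo_epochs=1):
    """Pick an epoch count from the parsed flags (epoch values are assumed)."""
    if args.test_only or args.use_pretrained:
        return 0
    return demo_epochs if args.demo_mode else full_epochs

# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--demo-mode", "--batch-size", "16"])
print(args.batch_size, choose_epochs(args))
```

`argparse` converts dashes in flag names to underscores, so `--test-only` is read as `args.test_only`.
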
### **📊 Expected Performance & Timing**

| Example | Dataset Size | Download Time | Training Time | Expected Accuracy |
|---------|-------------|---------------|---------------|------------------|
| **Perceptron 1957** | 1K synthetic | 0s | 30s | 95%+ (linearly separable) |
| **XOR 1969** | 1K synthetic | 0s | 1min | 90%+ (non-linear) |
| **MNIST MLP 1986** | 60K images | 2-5min | 2-3min | 85%+ (real vision) |
| **CIFAR CNN Modern** | 50K images | 5-10min | 3-5min | 65%+ (natural images) |
| **TinyGPT 2018** | Text corpus | 1-2min | 5-10min | Coherent generation |

**Note**: First run includes dataset download time. Subsequent runs are much faster.

---

## 🤔 **ML Systems Thinking Questions**

### **After Each Milestone, Consider:**

1. **Memory Implications**:
   - How much memory does this architecture require?
   - What happens when you scale to larger inputs/models?

2. **Computational Complexity**:
   - Where are the computational bottlenecks?
   - How does training time scale with model size?

3. **Production Deployment**:
   - How would you serve this model to millions of users?
   - What optimizations would you apply for real-time inference?

4. **Historical Context**:
   - Why was this innovation important for the field?
   - How does this relate to modern architectures (ResNet, BERT, GPT)?

5. **Engineering Trade-offs**:
   - What are the memory vs accuracy trade-offs?
   - When would you choose this architecture over alternatives?

---

## 🎓 **Educational Outcomes**

By completing all milestone examples, students will:

### **Technical Mastery**
- ✅ Understand the evolution of neural network architectures
- ✅ Build complete ML systems from scratch using their own implementations
- ✅ Analyze memory and computational trade-offs in different architectures
- ✅ Connect historical innovations to modern production systems

### **Systems Engineering Mindset**
- ✅ Think about scalability and production deployment from day one
- ✅ Understand the engineering decisions that shaped modern ML frameworks
- ✅ Develop intuition for when to use different architectural patterns
- ✅ Build confidence in ML systems engineering roles

### **Real-World Preparation**
- ✅ Experience working with the same patterns used in PyTorch/TensorFlow
- ✅ Understand the systems thinking behind modern ML engineering
- ✅ Develop portfolio projects demonstrating deep technical understanding
- ✅ Build foundation for advanced ML systems engineering roles

---

**Remember**: These aren't just coding exercises - they're journeys through the history of AI that prepare you for the future of ML systems engineering.

🚀 **Start your journey**: `python examples/perceptron_1957/rosenblatt_perceptron.py`