# 🏆 TinyTorch Milestone Examples

**Proof-of-mastery demonstrations showcasing what students can build after completing modules.**

These examples demonstrate the **evolutionary progression of neural networks** from 1957 to 2018, showing how each innovation built upon previous foundations. Students experience the same journey that created modern AI.

---

## 🎯 **Milestone Philosophy**

### **Why These Specific Examples?**

1. **Historical Progression**: Experience the actual evolution of neural networks
2. **Capability Showcasing**: Demonstrate specific breakthroughs at each stage
3. **Systems Thinking**: Understand WHY each innovation mattered for ML systems
4. **Motivation**: See real-world impact of concepts you're learning
5. **Integration**: Prove mastery by combining multiple modules into working systems

### **What Makes This Educational?**

- **Not Just Algorithms**: Focus on systems engineering and architectural insights
- **Progressive Complexity**: Each milestone builds capabilities from previous ones
- **Real Implementations**: Use actual TinyTorch modules students built
- **Historical Context**: Understand the engineering decisions that shaped modern ML
- **Production Relevance**: Connect to how these patterns appear in PyTorch/TensorFlow

---

## 📅 **Historical Timeline & Module Mapping**

### **🧠 Perceptron 1957** - `perceptron_1957/`
**After Modules 2-4** • *Foundation Building*

```
Input → Linear → Sigmoid → Binary Output
```

**Historical Significance**: Frank Rosenblatt's perceptron launched the first AI wave

**What It Showcases**:
- First trainable neural network
- Linear classification boundaries
- Gradient-based learning foundation
- Why single layers have limitations

**Systems Insights**:
- Memory: O(n) parameters, minimal storage
- Compute: O(n) operations per forward pass
- Limitations: Only linearly separable problems

**Run After**: Module 04 (Layers) ✅

---

### **⚔ XOR Problem 1969** - `xor_1969/`
**After Modules 2-6** • *Breaking
Limitations*

```
Input → Linear → ReLU → Linear → Output
```

**Historical Significance**: Minsky & Papert showed perceptron limitations; multi-layer networks solved them

**What It Showcases**:
- Non-linear problem solving
- Hidden layer representations
- Why depth enables complexity
- Foundation for deep learning

**Systems Insights**:
- Memory: O(n²) parameters with hidden layers
- Compute: O(n²) operations, but enables non-linear solutions
- Architecture: Hidden representations crucial for complex patterns

**Run After**: Module 06 (Autograd) ✅

---

### **🔢 MNIST MLP 1986** - `mnist_mlp_1986/`
**After Modules 2-8** • *Real Vision Problems*

```
Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes
```

**Historical Significance**: Backpropagation enabled training deep networks on real datasets

**What It Showcases**:
- Multi-class classification
- Real vision datasets
- Multi-layer feature learning
- Complete training pipelines

**Systems Insights**:
- Memory: ~100K parameters for MNIST (manageable)
- Compute: Dense matrix operations, vectorization critical
- Scaling: 95%+ accuracy demonstrates effectiveness

**Run After**: Module 08 (Training) ✅

---

### **🖼️ CIFAR CNN Modern** - `cifar_cnn_modern/`
**After Modules 2-10** • *Spatial Understanding*

```
Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes
```

**Historical Significance**: CNNs revolutionized computer vision by exploiting spatial structure

**What It Showcases**:
- Spatial feature extraction
- Hierarchical pattern recognition
- Translation invariance
- Natural image classification

**Systems Insights**:
- Memory: ~1M parameters, but shared weights reduce memory vs dense layers
- Compute: Convolution is compute-intensive but highly parallelizable
- Architecture: Local connectivity + weight sharing = spatial intelligence

**Run After**: Module 10 (DataLoader) + Module 09 (Spatial) ✅

---

### **🤖 TinyGPT 2018** - `gpt_2018/`
**After Modules 2-14** • *Language
Understanding*

```
Tokens → Embeddings → Attention → FFN → ... → Attention → Output
```

**Historical Significance**: Transformers + attention revolutionized NLP and launched the LLM era

**What It Showcases**:
- Sequence modeling
- Attention mechanisms
- Autoregressive generation
- Foundation for ChatGPT/GPT-4

**Systems Insights**:
- Memory: O(n²) attention requires careful memory management
- Compute: Attention is compute-intensive but highly parallelizable
- Architecture: Self-attention enables long-range dependencies

**Run After**: Module 14 (Transformers) ✅

---

## 🎯 **Learning Progression Design**

### **Capability Building Sequence**

| Stage | Capability Unlocked | Architectural Innovation | Real-World Impact |
|-------|-------------------|------------------------|------------------|
| **Stage 1** | Binary classification | Single-layer networks | Basic pattern recognition |
| **Stage 2** | Non-linear problems | Hidden layers + activation | Complex decision boundaries |
| **Stage 3** | Multi-class vision | Deep feedforward networks | Handwritten digit recognition |
| **Stage 4** | Spatial understanding | Convolutional networks | Natural image classification |
| **Stage 5** | Sequence modeling | Attention mechanisms | Language understanding |

### **Systems Engineering Progression**

- **Memory Management**: From O(n) → O(n²) → O(n²) with optimizations
- **Computational Complexity**: Understanding trade-offs between accuracy and efficiency
- **Architectural Patterns**: How structure enables capability
- **Production Deployment**: What it takes to scale these in practice

---

## 🔧 **Systems Analysis in Each Example**

Each milestone includes:

### **Memory Profiling**
```python
import tracemalloc
tracemalloc.start()
# ... run model ...
current, peak = tracemalloc.get_traced_memory()
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
```

### **Performance Measurement**
```python
# `model`, `input_shape`, and `estimate_flops` are placeholders for
# objects and helpers supplied by the example being analyzed.

# Parameter counting
total_params = sum(p.data.size for p in model.parameters())
print(f"Parameters: {total_params:,}")

# FLOP estimation
flops = estimate_flops(model, input_shape)
print(f"FLOPs per forward pass: {flops:,}")
```

### **Scaling Analysis**
```python
# `create_model` and `benchmark_training` are placeholders for
# example-specific helpers.

# Show how performance scales with model size
for hidden_size in [64, 128, 256, 512]:
    model = create_model(hidden_size)
    time_per_epoch = benchmark_training(model)
    print(f"Hidden={hidden_size}: {time_per_epoch:.2f}s/epoch")
```

---

## 📂 **File Structure**

```
examples/
├── README.md                     # This file - milestone overview
├── perceptron_1957/
│   └── rosenblatt_perceptron.py  # First trainable neural network
├── xor_1969/
│   └── minsky_xor_problem.py     # Non-linear problem solving
├── mnist_mlp_1986/
│   └── train_mlp.py              # Real vision with multi-layer networks
├── cifar_cnn_modern/
│   ├── train_cnn.py              # Spatial feature extraction with CNNs
│   └── data/                     # CIFAR-10 dataset
├── gpt_2018/
│   └── train_gpt.py              # Language modeling with transformers
└── pretrained/
    ├── mnist_mlp_weights.npz     # Pre-trained weights for quick demos
    ├── cifar10_cnn_weights.npz
    └── xor_weights.npz
```

---

## 🚀 **How to Run These Examples**

### **Prerequisites Check**
```bash
# 1. Verify your TinyTorch installation
tito system doctor

# 2. Check which modules you've completed
tito checkpoint status

# 3. Ensure you're in the project root
cd /path/to/TinyTorch
```

### **Dataset Management (Automatic)**

**Don't worry about data logistics!** Each example automatically handles dataset downloading:

- **MNIST**: Downloads from official LeCun server (~60MB)
- **CIFAR-10**: Downloads from University of Toronto (~170MB)
- **XOR/Perceptron**: Generates synthetic data instantly

**The first run downloads the data; subsequent runs use the cached copy.**

### **Running Examples by Module Completion**

#### **📱 Quick Test (No Training)**
Test architecture and imports without waiting for downloads:
```bash
# Test what you've built so far
python examples/perceptron_1957/rosenblatt_perceptron.py --test-only
python examples/xor_1969/minsky_xor_problem.py --test-only
```

#### **🎯 Full Milestone Demonstrations**
```bash
# After Module 04 - Foundation (30 seconds)
python examples/perceptron_1957/rosenblatt_perceptron.py
# Demonstrates: YOU built Linear layers + activation functions

# After Module 06 - Autograd (1 minute)
python examples/xor_1969/minsky_xor_problem.py
# Demonstrates: YOU built gradient computation + training loops

# After Module 08 - Training (2-3 minutes + MNIST download)
python examples/mnist_mlp_1986/train_mlp.py
# Demonstrates: YOU built complete vision pipeline

# After Module 10 - DataLoader + Spatial (3-5 minutes + CIFAR download)
python examples/cifar_cnn_modern/train_cnn.py
# Demonstrates: YOU built convolutional networks

# After Module 14 - Transformers (5-10 minutes)
python examples/gpt_2018/train_gpt.py
# Demonstrates: YOU built attention mechanisms + language models
```

### **🚫 Troubleshooting Common Issues**

#### **Import Errors**
```bash
# If you see "ModuleNotFoundError: No module named 'tinytorch'"
cd /path/to/TinyTorch
python -m pip install -e .

# Or run with explicit path
PYTHONPATH=/path/to/TinyTorch python examples/perceptron_1957/rosenblatt_perceptron.py
```

#### **Dataset Download Issues**
```bash
# Manual dataset download if automatic fails
python examples/data_manager.py  # Test all datasets

# Or download specific datasets
python -c "from examples.data_manager import DatasetManager; DatasetManager().get_mnist()"
```

#### **Memory Issues**
```bash
# Reduce batch size for limited memory
python examples/cifar_cnn_modern/train_cnn.py --batch-size 16

# Use test mode for architecture validation only
python examples/mnist_mlp_1986/train_mlp.py --test-only
```

#### **Slow Training**
```bash
# Quick demo mode (reduced epochs)
python examples/mnist_mlp_1986/train_mlp.py --demo-mode

# Use pre-trained weights for instant results
python examples/mnist_mlp_1986/train_mlp.py --use-pretrained
```

### **📊 Expected Performance & Timing**

| Example | Dataset Size | Download Time | Training Time | Expected Accuracy |
|---------|-------------|---------------|---------------|------------------|
| **Perceptron 1957** | 1K synthetic | 0s | 30s | 95%+ (linearly separable) |
| **XOR 1969** | 1K synthetic | 0s | 1min | 90%+ (non-linear) |
| **MNIST MLP 1986** | 60K images | 2-5min | 2-3min | 85%+ (real vision) |
| **CIFAR CNN Modern** | 50K images | 5-10min | 3-5min | 65%+ (natural images) |
| **TinyGPT 2018** | Text corpus | 1-2min | 5-10min | Coherent generation |

**Note**: First run includes dataset download time. Subsequent runs are much faster.

---

## 🤔 **ML Systems Thinking Questions**

### **After Each Milestone, Consider:**

1. **Memory Implications**:
   - How much memory does this architecture require?
   - What happens when you scale to larger inputs/models?

2. **Computational Complexity**:
   - Where are the computational bottlenecks?
   - How does training time scale with model size?

3. **Production Deployment**:
   - How would you serve this model to millions of users?
   - What optimizations would you apply for real-time inference?

4. **Historical Context**:
   - Why was this innovation important for the field?
   - How does this relate to modern architectures (ResNet, BERT, GPT)?

5. **Engineering Trade-offs**:
   - What are the memory vs accuracy trade-offs?
   - When would you choose this architecture over alternatives?

---

## 🎓 **Educational Outcomes**

By completing all milestone examples, students will:

### **Technical Mastery**
- ✅ Understand the evolution of neural network architectures
- ✅ Build complete ML systems from scratch using their own implementations
- ✅ Analyze memory and computational trade-offs in different architectures
- ✅ Connect historical innovations to modern production systems

### **Systems Engineering Mindset**
- ✅ Think about scalability and production deployment from day one
- ✅ Understand the engineering decisions that shaped modern ML frameworks
- ✅ Develop intuition for when to use different architectural patterns
- ✅ Build confidence in ML systems engineering roles

### **Real-World Preparation**
- ✅ Experience working with the same patterns used in PyTorch/TensorFlow
- ✅ Understand the systems thinking behind modern ML engineering
- ✅ Develop portfolio projects demonstrating deep technical understanding
- ✅ Build foundation for advanced ML systems engineering roles

---

**Remember**: These aren't just coding exercises - they're journeys through the history of AI that prepare you for the future of ML systems engineering.

🚀 **Start your journey**: `python examples/perceptron_1957/rosenblatt_perceptron.py`
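
The parameter counts and memory figures quoted under **Systems Insights** can be sanity-checked with plain arithmetic. The sketch below is a rough, standalone calculator rather than part of TinyTorch: the helper names (`dense_params`, `mlp_params`, `conv2d_params`, `attention_score_bytes`) and every layer size are assumptions chosen for illustration, and the actual example scripts may use different configurations.

```python
from typing import Sequence

# Back-of-envelope sizing helpers. All layer sizes used below are
# illustrative assumptions, not the configurations of the example scripts.


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes: Sequence[int]) -> int:
    """Total parameters of an MLP given [input, hidden..., output] widths."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


def conv2d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights plus biases of one square-kernel convolution layer."""
    return c_in * c_out * kernel * kernel + c_out


def attention_score_bytes(seq_len: int, n_heads: int, bytes_per_value: int = 4) -> int:
    """Memory for one layer's attention score matrices, for a single sequence."""
    return n_heads * seq_len * seq_len * bytes_per_value


print(f"Perceptron, 2 inputs:      {dense_params(2, 1):,} parameters")
print(f"MLP 784-128-64-10:         {mlp_params([784, 128, 64, 10]):,} parameters")
print(f"Conv 3->32 channels, 3x3:  {conv2d_params(3, 32, 3):,} parameters")
print(f"Dense 32x32x3 -> 32x32x32: {dense_params(32 * 32 * 3, 32 * 32 * 32):,} parameters")
for n in (128, 256, 512):
    kib = attention_score_bytes(n, n_heads=4) / 1024
    print(f"Attention scores, seq_len={n}: {kib:,.0f} KiB")
```

With these assumed sizes, the 784-128-64-10 MLP lands near the ~100K parameters mentioned for MNIST, a 3×3 convolution needs far fewer parameters than a dense layer mapping the same image to the same number of outputs, and doubling the sequence length quadruples the attention score memory.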