🏆 TinyTorch Milestone Examples

Proof-of-mastery demonstrations showcasing what students can build after completing modules.

These examples demonstrate the evolutionary progression of neural networks from 1957 to 2018, showing how each innovation built upon previous foundations. Students experience the same journey that created modern AI.


🎯 Milestone Philosophy

Why These Specific Examples?

  1. Historical Progression: Experience the actual evolution of neural networks
  2. Capability Showcasing: Demonstrate specific breakthroughs at each stage
  3. Systems Thinking: Understand WHY each innovation mattered for ML systems
  4. Motivation: See real-world impact of concepts you're learning
  5. Integration: Prove mastery by combining multiple modules into working systems

What Makes This Educational?

  • Not Just Algorithms: Focus on systems engineering and architectural insights
  • Progressive Complexity: Each milestone builds capabilities from previous ones
  • Real Implementations: Use actual TinyTorch modules students built
  • Historical Context: Understand the engineering decisions that shaped modern ML
  • Production Relevance: Connect to how these patterns appear in PyTorch/TensorFlow

📅 Historical Timeline & Module Mapping

🧠 Perceptron 1957 - perceptron_1957/

After Modules 2-4 | Foundation Building

Input → Linear → Sigmoid → Binary Output

Historical Significance: Frank Rosenblatt's perceptron launched the first AI wave.

What It Showcases:

  • First trainable neural network
  • Linear classification boundaries
  • Gradient-based learning foundation
  • Why single layers have limitations

Systems Insights:

  • Memory: O(n) parameters, minimal storage
  • Compute: O(n) operations per forward pass
  • Limitations: Only linearly separable problems

Run After: Module 04 (Layers)
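
The Input → Linear → Sigmoid → Binary Output pipeline can be sketched in plain NumPy. This is a minimal illustration, not the code in rosenblatt_perceptron.py: the function names are hypothetical, and the weights are hand-picked (not trained) so that the unit computes logical AND, a linearly separable problem.

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_forward(x, weights, bias):
    """Input -> Linear -> Sigmoid -> binary output (threshold at 0.5)."""
    z = x @ weights + bias            # linear step: n multiply-adds per sample
    prob = sigmoid(z)                 # activation
    return (prob >= 0.5).astype(int)  # binary decision

# Hand-picked parameters (an illustrative assumption) implementing logical AND.
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
weights = np.array([1.0, 1.0])
bias = -1.5

predictions = perceptron_forward(x, weights, bias)
param_count = weights.size + 1  # n weights plus one bias: O(n) storage
print(predictions, param_count)
```

No choice of `weights` and `bias` makes this single unit output XOR, which is the limitation the next milestone addresses.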


XOR Problem 1969 - xor_1969/

After Modules 2-6 | Breaking Limitations

Input → Linear → ReLU → Linear → Output

Historical Significance: Minsky & Papert showed perceptron limitations; multi-layer networks solved them.

What It Showcases:

  • Non-linear problem solving
  • Hidden layer representations
  • Why depth enables complexity
  • Foundation for deep learning

Systems Insights:

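  • Memory: O(n·h) parameters for n inputs and h hidden units, which becomes O(n²) when the hidden width grows with the input size
  • Compute: O(n·h) operations per forward pass, but enables non-linear solutions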
  • Architecture: Hidden representations crucial for complex patterns

Run After: Module 06 (Autograd)
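
To see why one hidden layer is enough for XOR, here is a small NumPy sketch of Input → Linear → ReLU → Linear → Output. The weights are chosen by hand for illustration; the actual example learns its weights through training, and every name below is hypothetical.

```python
import numpy as np

def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)

# Hand-picked weights (assumption: illustration only, not learned values).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])     # shape (inputs, hidden)
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0],
               [-2.0]])         # shape (hidden, outputs)
b2 = np.array([0.0])

def xor_forward(x):
    """Two-layer network: the hidden layer bends the decision boundary."""
    hidden = relu(x @ W1 + b1)  # hidden = [relu(x1 + x2), relu(x1 + x2 - 1)]
    return hidden @ W2 + b2     # output = hidden[0] - 2 * hidden[1]

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
outputs = xor_forward(x).ravel()
print(outputs)
```

The second hidden unit only activates when both inputs are 1, and the output layer subtracts it twice, cancelling the (1, 1) case.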


🔢 MNIST MLP 1986 - mnist_mlp_1986/

After Modules 2-8 | Real Vision Problems

Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes

Historical Significance: Backpropagation enabled training deep networks on real datasets.

What It Showcases:

  • Multi-class classification
  • Real vision datasets
  • Multi-layer feature learning
  • Complete training pipelines

Systems Insights:

  • Memory: ~100K parameters for MNIST (manageable)
  • Compute: Dense matrix operations, vectorization critical
  • Scaling: 95%+ accuracy demonstrates effectiveness

Run After: Module 08 (Training)
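
The "~100K parameters" figure can be sanity-checked with a few lines of arithmetic. The layer widths below (784 → 128 → 64 → 10) are an assumption for illustration; the widths used in train_mlp.py may differ.

```python
def count_mlp_parameters(layer_sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix
        total += fan_out           # bias vector
    return total

# Assumed widths: 28x28 flattened input, two hidden layers, 10 digit classes.
layer_sizes = [784, 128, 64, 10]
total_params = count_mlp_parameters(layer_sizes)

print(f"Parameters: {total_params:,}")
print(f"float32 storage: {total_params * 4 / 1024:.1f} KiB")
```

Almost all of the parameters sit in the first layer, because it multiplies the largest input dimension by the first hidden width.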


🖼️ CIFAR CNN Modern - cifar_cnn_modern/

After Modules 2-10 | Spatial Understanding

Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes

Historical Significance: CNNs revolutionized computer vision by exploiting spatial structure.

What It Showcases:

  • Spatial feature extraction
  • Hierarchical pattern recognition
  • Translation equivariance from convolution, with approximate invariance from pooling
  • Natural image classification

Systems Insights:

  • Memory: ~1M parameters, but shared weights reduce memory vs dense layers
  • Compute: Convolution is compute-intensive but highly parallelizable
  • Architecture: Local connectivity + weight sharing = spatial intelligence

Run After: Module 10 (DataLoader) + Module 09 (Spatial)
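
The memory benefit of weight sharing is easy to quantify. The sketch below compares one convolutional layer with a dense layer that produces an output of the same size. The layer shape (3 → 32 channels, 3×3 kernel, stride 1, no padding) is an assumption chosen for illustration, not necessarily what train_cnn.py uses.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """One small kernel per (input, output) channel pair, plus one bias per output channel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def dense_params(in_features, out_features):
    """One weight per (input, output) pair, plus one bias per output."""
    return in_features * out_features + out_features

# Assumed layer: a 32x32 RGB image through a 3x3 convolution with 32 filters.
in_ch, out_ch, k = 3, 32, 3
height = width = 32
out_h, out_w = height - k + 1, width - k + 1  # 30x30 with stride 1 and no padding

conv_count = conv2d_params(in_ch, out_ch, k)
dense_count = dense_params(in_ch * height * width, out_ch * out_h * out_w)

print(f"Conv layer:  {conv_count:,} parameters")
print(f"Dense layer: {dense_count:,} parameters")
```

The convolution reuses the same 3×3 kernels at every image position, so its parameter count does not depend on image size at all.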


🤖 TinyGPT 2018 - gpt_2018/

After Modules 2-14 | Language Understanding

Tokens → Embeddings → Attention → FFN → ... → Attention → Output

Historical Significance: Transformers + attention revolutionized NLP and launched the LLM era.

What It Showcases:

  • Sequence modeling
  • Attention mechanisms
  • Autoregressive generation
  • Foundation for ChatGPT/GPT-4

Systems Insights:

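  • Memory: The attention score matrix grows as O(n²) in sequence length n, so long sequences require careful memory management
  • Compute: Attention scores cost O(n²·d) for sequence length n and model width d, but the underlying matrix multiplies are highly parallelizable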
  • Architecture: Self-attention enables long-range dependencies

Run After: Module 14 (Transformers)
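
The O(n²) cost comes from the score matrix that compares every token with every other token. Below is a minimal single-head sketch of causal scaled dot-product attention in NumPy. It leaves out the learned query/key/value projections and multiple heads, and its function name is hypothetical rather than TinyTorch's API.

```python
import numpy as np

def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where each position sees only itself and the past."""
    seq_len, d_k = queries.shape
    scores = queries @ keys.T / np.sqrt(d_k)  # (seq_len, seq_len): the O(n^2) term
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)            # hide future positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ values, weights

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
x = rng.standard_normal((seq_len, d_model))  # stand-in for token embeddings

output, attn = causal_self_attention(x, x, x)
print(output.shape, attn.shape)
```

Doubling `seq_len` quadruples the size of `attn`, which is why long contexts put pressure on memory first.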


🎯 Learning Progression Design

Capability Building Sequence

| Stage | Capability Unlocked | Architectural Innovation | Real-World Impact |
|---|---|---|---|
| Stage 1 | Binary classification | Single-layer networks | Basic pattern recognition |
| Stage 2 | Non-linear problems | Hidden layers + activation | Complex decision boundaries |
| Stage 3 | Multi-class vision | Deep feedforward networks | Handwritten digit recognition |
| Stage 4 | Spatial understanding | Convolutional networks | Natural image classification |
| Stage 5 | Sequence modeling | Attention mechanisms | Language understanding |

Systems Engineering Progression

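  • Memory Management: From O(n) parameters (perceptron) → O(n²) parameters (dense hidden layers) → O(n²) attention memory in sequence length, kept in check with optimizations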
  • Computational Complexity: Understanding trade-offs between accuracy and efficiency
  • Architectural Patterns: How structure enables capability
  • Production Deployment: What it takes to scale these in practice

🔧 Systems Analysis in Each Example

Each milestone includes:

Memory Profiling

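import tracemalloc

tracemalloc.start()  # begin tracking Python memory allocations
# ... run model ...
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()  # stop tracking once the measurement is taken
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")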

Performance Measurement

# Parameter counting
total_params = sum(p.data.size for p in model.parameters())
print(f"Parameters: {total_params:,}")

# FLOP estimation (estimate_flops and input_shape are assumed to be defined by the example)
flops = estimate_flops(model, input_shape)
print(f"FLOPs per forward pass: {flops:,}")
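
This README does not define a FLOP estimator, so here is one way it could look for fully connected layers. It is a sketch under stated assumptions: the function name and layer widths are hypothetical, and it counts each multiply and each add as one FLOP while ignoring activation functions.

```python
def estimate_dense_flops(layer_sizes, batch_size=1):
    """Estimate forward-pass FLOPs for a stack of fully connected layers."""
    flops = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        flops += 2 * fan_in * fan_out  # one multiply and one add per weight
        flops += fan_out               # one add per bias
    return flops * batch_size

# Assumed widths for an MNIST-sized MLP (illustrative only).
layer_sizes = [784, 128, 64, 10]
flops_per_sample = estimate_dense_flops(layer_sizes)
flops_per_batch = estimate_dense_flops(layer_sizes, batch_size=32)

print(f"FLOPs per forward pass: {flops_per_sample:,}")
print(f"FLOPs per batch of 32:  {flops_per_batch:,}")
```

For dense layers the FLOP count is roughly twice the parameter count per sample, which makes the two numbers easy to cross-check.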

Scaling Analysis

# Show how performance scales with model size
for hidden_size in [64, 128, 256, 512]:
    model = create_model(hidden_size)
    time_per_epoch = benchmark_training(model)
    print(f"Hidden={hidden_size}: {time_per_epoch:.2f}s/epoch")
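
The loop above relies on `create_model` and `benchmark_training`, which this README does not define. As a rough stand-in, the sketch below times only the forward pass of a two-layer NumPy MLP, not a full training epoch. All names and sizes are assumptions for illustration.

```python
import time
import numpy as np

def benchmark_forward(hidden_size, batch_size=64, in_features=784, repeats=5):
    """Return the best wall-clock time in seconds for one forward pass."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal((batch_size, in_features))
    w1 = rng.standard_normal((in_features, hidden_size))
    w2 = rng.standard_normal((hidden_size, 10))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hidden = np.maximum(x @ w1, 0.0)  # Linear -> ReLU
        _ = hidden @ w2                   # Linear
        best = min(best, time.perf_counter() - start)
    return best

# Same hidden sizes as the scaling loop above.
timings = {h: benchmark_forward(h) for h in [64, 128, 256, 512]}
for hidden_size, seconds in timings.items():
    print(f"Hidden={hidden_size}: {seconds * 1e3:.3f} ms/forward")
```

Taking the best of several repeats reduces noise from other processes; absolute numbers will vary by machine.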

📂 File Structure

examples/
├── README.md                    # This file - milestone overview
├── perceptron_1957/
│   └── rosenblatt_perceptron.py # First trainable neural network
├── xor_1969/
│   └── minsky_xor_problem.py    # Non-linear problem solving
├── mnist_mlp_1986/
│   └── train_mlp.py             # Real vision with multi-layer networks
├── cifar_cnn_modern/
│   ├── train_cnn.py             # Spatial feature extraction with CNNs
│   └── data/                    # CIFAR-10 dataset
├── gpt_2018/
│   └── train_gpt.py             # Language modeling with transformers
└── pretrained/
    ├── mnist_mlp_weights.npz    # Pre-trained weights for quick demos
    ├── cifar10_cnn_weights.npz
    └── xor_weights.npz

🚀 How to Run These Examples

Prerequisites Check

# 1. Verify your TinyTorch installation
tito system doctor

# 2. Check which modules you've completed  
tito checkpoint status

# 3. Ensure you're in the project root
cd /path/to/TinyTorch

Dataset Management (Automatic)

Don't worry about data logistics! Each example automatically handles dataset downloading:

  • MNIST: Downloads from official LeCun server (~60MB)
  • CIFAR-10: Downloads from University of Toronto (~170MB)
  • XOR/Perceptron: Generates synthetic data instantly

The first run downloads the data; subsequent runs use the cached copy.
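
The download-once, reuse-afterwards behavior follows a common caching pattern, sketched below. This is not the examples' actual data manager: `get_cached` and `fake_fetch` are hypothetical names, and a fake fetch function stands in for the network download so the sketch runs offline.

```python
import tempfile
from pathlib import Path

def get_cached(name, fetch, cache_dir):
    """Return the cached file path, calling fetch() only if the file is missing."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name
    if not path.exists():
        path.write_bytes(fetch())  # first run: download and store
    return path                    # later runs: reuse the stored copy

calls = []

def fake_fetch():
    """Stand-in for a real download; records how often it is called."""
    calls.append(1)
    return b"pretend dataset bytes"

with tempfile.TemporaryDirectory() as tmp:
    first = get_cached("mnist.bin", fake_fetch, tmp)
    second = get_cached("mnist.bin", fake_fetch, tmp)
    same_path = first == second
    download_count = len(calls)

print(f"Downloads performed: {download_count}")
```

Deleting the cached file forces a fresh download on the next run, which is a useful first step when a download was interrupted.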

Running Examples by Module Completion

📱 Quick Test (No Training)

Test architecture and imports without waiting for downloads:

# Test what you've built so far
python examples/perceptron_1957/rosenblatt_perceptron.py --test-only
python examples/xor_1969/minsky_xor_problem.py --test-only

🎯 Full Milestone Demonstrations

# After Module 04 - Foundation (30 seconds)
python examples/perceptron_1957/rosenblatt_perceptron.py
# Demonstrates: YOU built Linear layers + activation functions

# After Module 06 - Autograd (1 minute)  
python examples/xor_1969/minsky_xor_problem.py
# Demonstrates: YOU built gradient computation + training loops

# After Module 08 - Training (2-3 minutes + MNIST download)
python examples/mnist_mlp_1986/train_mlp.py
# Demonstrates: YOU built complete vision pipeline

# After Module 10 - DataLoader + Spatial (3-5 minutes + CIFAR download)
python examples/cifar_cnn_modern/train_cnn.py  
# Demonstrates: YOU built convolutional networks

# After Module 14 - Transformers (5-10 minutes)
python examples/gpt_2018/train_gpt.py
# Demonstrates: YOU built attention mechanisms + language models

🚫 Troubleshooting Common Issues

Import Errors

# If you see "ModuleNotFoundError: No module named 'tinytorch'"
cd /path/to/TinyTorch
python -m pip install -e .

# Or run with explicit path
PYTHONPATH=/path/to/TinyTorch python examples/perceptron_1957/rosenblatt_perceptron.py

Dataset Download Issues

# Manual dataset download if automatic fails
python examples/data_manager.py  # Test all datasets

# Or download specific datasets
python -c "from examples.data_manager import DatasetManager; DatasetManager().get_mnist()"

Memory Issues

# Reduce batch size for limited memory
python examples/cifar_cnn_modern/train_cnn.py --batch-size 16

# Use test mode for architecture validation only
python examples/mnist_mlp_1986/train_mlp.py --test-only

Slow Training

# Quick demo mode (reduced epochs)
python examples/mnist_mlp_1986/train_mlp.py --demo-mode

# Use pre-trained weights for instant results
python examples/mnist_mlp_1986/train_mlp.py --use-pretrained

📊 Expected Performance & Timing

| Example | Dataset Size | Download Time | Training Time | Expected Accuracy |
|---|---|---|---|---|
| Perceptron 1957 | 1K synthetic | 0s | 30s | 95%+ (linearly separable) |
| XOR 1969 | 1K synthetic | 0s | 1min | 90%+ (non-linear) |
| MNIST MLP 1986 | 60K images | 2-5min | 2-3min | 85%+ (real vision) |
| CIFAR CNN Modern | 50K images | 5-10min | 3-5min | 65%+ (natural images) |
| TinyGPT 2018 | Text corpus | 1-2min | 5-10min | Coherent generation |

Note: First run includes dataset download time. Subsequent runs are much faster.


🤔 ML Systems Thinking Questions

After Each Milestone, Consider:

  1. Memory Implications:
    • How much memory does this architecture require?
    • What happens when you scale to larger inputs/models?

  2. Computational Complexity:
    • Where are the computational bottlenecks?
    • How does training time scale with model size?

  3. Production Deployment:
    • How would you serve this model to millions of users?
    • What optimizations would you apply for real-time inference?

  4. Historical Context:
    • Why was this innovation important for the field?
    • How does this relate to modern architectures (ResNet, BERT, GPT)?

  5. Engineering Trade-offs:
    • What are the memory vs accuracy trade-offs?
    • When would you choose this architecture over alternatives?

🎓 Educational Outcomes

By completing all milestone examples, students will:

Technical Mastery

  • Understand the evolution of neural network architectures
  • Build complete ML systems from scratch using their own implementations
  • Analyze memory and computational trade-offs in different architectures
  • Connect historical innovations to modern production systems

Systems Engineering Mindset

  • Think about scalability and production deployment from day one
  • Understand the engineering decisions that shaped modern ML frameworks
  • Develop intuition for when to use different architectural patterns
  • Build confidence in ML systems engineering roles

Real-World Preparation

  • Experience working with the same patterns used in PyTorch/TensorFlow
  • Understand the systems thinking behind modern ML engineering
  • Develop portfolio projects demonstrating deep technical understanding
  • Build foundation for advanced ML systems engineering roles

Remember: These aren't just coding exercises - they're journeys through the history of AI that prepare you for the future of ML systems engineering.

🚀 Start your journey: python examples/perceptron_1957/rosenblatt_perceptron.py