🏆 TinyTorch Milestone Examples
Proof-of-mastery demonstrations showcasing what students can build after completing modules.
These examples demonstrate the evolutionary progression of neural networks from 1957 to 2018, showing how each innovation built upon previous foundations. Students experience the same journey that created modern AI.
🎯 Milestone Philosophy
Why These Specific Examples?
- Historical Progression: Experience the actual evolution of neural networks
- Capability Showcasing: Demonstrate specific breakthroughs at each stage
- Systems Thinking: Understand WHY each innovation mattered for ML systems
- Motivation: See real-world impact of concepts you're learning
- Integration: Prove mastery by combining multiple modules into working systems
What Makes This Educational?
- Not Just Algorithms: Focus on systems engineering and architectural insights
- Progressive Complexity: Each milestone builds capabilities from previous ones
- Real Implementations: Use actual TinyTorch modules students built
- Historical Context: Understand the engineering decisions that shaped modern ML
- Production Relevance: Connect to how these patterns appear in PyTorch/TensorFlow
📅 Historical Timeline & Module Mapping
🧠 Perceptron 1957 - perceptron_1957/
After Modules 2-4 • Foundation Building
Input → Linear → Sigmoid → Binary Output
Historical Significance: Frank Rosenblatt's perceptron launched the first AI wave.
What It Showcases:
- First trainable neural network
- Linear classification boundaries
- Gradient-based learning foundation
- Why single layers have limitations
Systems Insights:
- Memory: O(n) parameters, minimal storage
- Compute: O(n) operations per forward pass
- Limitations: Only linearly separable problems
Run After: Module 04 (Layers) ✅
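To make the Input → Linear → Sigmoid pipeline above concrete, here is a minimal NumPy sketch of the same idea (illustrative only; the milestone script uses the TinyTorch layers you built, and the helper names here are purely local):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linearly separable data: 2 features, binary labels ("is feature 2 large?")
X = np.array([[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]])
y = np.array([0, 1, 0, 1])

rng = np.random.default_rng(0)
w = rng.normal(size=2)               # O(n) parameters: one weight per input feature
b = 0.0
lr = 1.0

for _ in range(500):
    p = sigmoid(X @ w + b)           # Input -> Linear -> Sigmoid
    grad = p - y                     # gradient of binary cross-entropy w.r.t. the logits
    w -= lr * (X.T @ grad) / len(y)  # gradient-based learning on a single layer
    b -= lr * grad.mean()

print((sigmoid(X @ w + b) > 0.5).astype(int))  # should print [0 1 0 1] once converged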
⚡ XOR Problem 1969 - xor_1969/
After Modules 2-6 • Breaking Limitations
Input → Linear → ReLU → Linear → Output
Historical Significance: Minsky & Papert showed perceptron limitations; multi-layer networks solved them.
What It Showcases:
- Non-linear problem solving
- Hidden layer representations
- Why depth enables complexity
- Foundation for deep learning
Systems Insights:
- Memory: O(n²) parameters with hidden layers
- Compute: O(n²) operations, but enables non-linear solutions
- Architecture: Hidden representations crucial for complex patterns
Run After: Module 06 (Autograd) ✅
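To see why the hidden layer is the breakthrough, here is a tiny NumPy sketch with hand-picked weights that solve XOR exactly (the milestone learns such weights with your autograd implementation; this construction is purely illustrative):

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked weights for Input -> Linear -> ReLU -> Linear
W1 = np.array([[1.0, 1.0],      # both hidden units compute x1 + x2 ...
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])      # ... but with different thresholds
W2 = np.array([1.0, -2.0])      # output = h1 - 2*h2
b2 = 0.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
hidden = relu(X @ W1 + b1)      # the hidden representation a single layer cannot form
print(hidden @ W2 + b2)         # -> [0. 1. 1. 0.]: XOR, unreachable by any single linear layer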
🔢 MNIST MLP 1986 - mnist_mlp_1986/
After Modules 2-8 • Real Vision Problems
Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes
Historical Significance: Backpropagation enabled training deep networks on real datasets.
What It Showcases:
- Multi-class classification
- Real vision datasets
- Multi-layer feature learning
- Complete training pipelines
Systems Insights:
- Memory: ~100K parameters for MNIST (manageable)
- Compute: Dense matrix operations, vectorization critical
- Scaling: 95%+ accuracy demonstrates effectiveness
Run After: Module 08 (Training) ✅
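The "~100K parameters" figure is easy to check by hand. Assuming a hypothetical 784 → 128 → 64 → 10 MLP (the exact layer sizes in the example may differ):

# Dense layer parameters = inputs * outputs + outputs (weights + biases)
layers = [(784, 128), (128, 64), (64, 10)]            # hypothetical layer sizes
params = sum(n_in * n_out + n_out for n_in, n_out in layers)
print(params)                                          # 109386 -> on the order of 100K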
🖼️ CIFAR CNN Modern - cifar_cnn_modern/
After Modules 2-10 • Spatial Understanding
Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes
Historical Significance: CNNs revolutionized computer vision by exploiting spatial structure.
What It Showcases:
- Spatial feature extraction
- Hierarchical pattern recognition
- Translation invariance
- Natural image classification
Systems Insights:
- Memory: ~1M parameters, but shared weights reduce memory vs dense layers
- Compute: Convolution is compute-intensive but highly parallelizable
- Architecture: Local connectivity + weight sharing = spatial intelligence
Run After: Module 10 (DataLoader) + Module 09 (Spatial) ✅
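The weight-sharing point is worth quantifying. A rough, illustrative comparison for a 32×32×3 CIFAR input (the layer sizes here are assumptions, not the example's exact configuration):

# A 3x3 conv layer with 32 output channels over a 32x32x3 image:
conv_params = 3 * 3 * 3 * 32 + 32                  # kernel weights + biases
# A dense layer producing the same 32x32x32 output volume from the flattened input:
dense_params = (32 * 32 * 3) * (32 * 32 * 32) + (32 * 32 * 32)
print(conv_params)                                  # 896
print(dense_params)                                 # ~100.7 million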
🤖 TinyGPT 2018 - gpt_2018/
After Modules 2-14 • Language Understanding
Tokens → Embeddings → Attention → FFN → ... → Attention → Output
Historical Significance: Transformers + attention revolutionized NLP and launched the LLM era.
What It Showcases:
- Sequence modeling
- Attention mechanisms
- Autoregressive generation
- Foundation for ChatGPT/GPT-4
Systems Insights:
- Memory: O(n²) attention requires careful memory management
- Compute: Attention is compute-intensive but highly parallelizable
- Architecture: Self-attention enables long-range dependencies
Run After: Module 14 (Transformers) ✅
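The O(n²) memory term comes directly from the attention score matrix. A minimal single-head scaled dot-product attention sketch in NumPy (illustrative; the milestone uses your own TinyTorch attention module):

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

n, d = 128, 64                                   # sequence length, model dimension
rng = np.random.default_rng(0)
Q, K, V = [rng.normal(size=(n, d)) for _ in range(3)]

scores = Q @ K.T / np.sqrt(d)                    # (n, n): the O(n^2) memory term
weights = softmax(scores, axis=-1)               # every token attends to every other token
out = weights @ V                                # (n, d): long-range mixing in one step
print(scores.shape, out.shape)                   # (128, 128) (128, 64)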
🎯 Learning Progression Design
Capability Building Sequence
| Stage | Capability Unlocked | Architectural Innovation | Real-World Impact |
|---|---|---|---|
| Stage 1 | Binary classification | Single-layer networks | Basic pattern recognition |
| Stage 2 | Non-linear problems | Hidden layers + activation | Complex decision boundaries |
| Stage 3 | Multi-class vision | Deep feedforward networks | Handwritten digit recognition |
| Stage 4 | Spatial understanding | Convolutional networks | Natural image classification |
| Stage 5 | Sequence modeling | Attention mechanisms | Language understanding |
Systems Engineering Progression
- Memory Management: From O(n) perceptron parameters to O(n²) attention memory, plus the optimizations that keep the latter tractable
- Computational Complexity: Understanding trade-offs between accuracy and efficiency
- Architectural Patterns: How structure enables capability
- Production Deployment: What it takes to scale these in practice
🔧 Systems Analysis in Each Example
Each milestone includes:
Memory Profiling
import tracemalloc
tracemalloc.start()
# ... run model ...
current, peak = tracemalloc.get_traced_memory()
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
Performance Measurement
# Parameter counting
total_params = sum(p.data.size for p in model.parameters())
print(f"Parameters: {total_params:,}")
# FLOP estimation
flops = estimate_flops(model, input_shape)
print(f"FLOPs per forward pass: {flops:,}")
Scaling Analysis
# Show how performance scales with model size
for hidden_size in [64, 128, 256, 512]:
    model = create_model(hidden_size)
    time_per_epoch = benchmark_training(model)
    print(f"Hidden={hidden_size}: {time_per_epoch:.2f}s/epoch")
📂 File Structure
examples/
├── README.md # This file - milestone overview
├── perceptron_1957/
│ └── rosenblatt_perceptron.py # First trainable neural network
├── xor_1969/
│ └── minsky_xor_problem.py # Non-linear problem solving
├── mnist_mlp_1986/
│ └── train_mlp.py # Real vision with multi-layer networks
├── cifar_cnn_modern/
│ ├── train_cnn.py # Spatial feature extraction with CNNs
│ └── data/ # CIFAR-10 dataset
├── gpt_2018/
│ └── train_gpt.py # Language modeling with transformers
└── pretrained/
├── mnist_mlp_weights.npz # Pre-trained weights for quick demos
├── cifar10_cnn_weights.npz
└── xor_weights.npz
🚀 How to Run These Examples
Prerequisites Check
# 1. Verify your TinyTorch installation
tito system doctor
# 2. Check which modules you've completed
tito checkpoint status
# 3. Ensure you're in the project root
cd /path/to/TinyTorch
Dataset Management (Automatic)
Don't worry about data logistics! Each example automatically handles dataset downloading:
- MNIST: Downloads from official LeCun server (~60MB)
- CIFAR-10: Downloads from University of Toronto (~170MB)
- XOR/Perceptron: Generates synthetic data instantly
First run will download data, subsequent runs use cached data.
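"Cached" simply means the downloader checks for a local copy before touching the network. A simplified sketch of that logic (the real implementation lives in examples/data_manager.py; the paths and filenames here are assumptions):

from pathlib import Path
from urllib.request import urlretrieve

def fetch(url: str, cache_dir: str = "examples/data") -> Path:
    """Download url into cache_dir unless a cached copy already exists."""
    target = Path(cache_dir) / Path(url).name
    if not target.exists():                  # first run: download
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, target)
    return target                            # later runs: reuse the cached file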
Running Examples by Module Completion
📱 Quick Test (No Training)
Test architecture and imports without waiting for downloads:
# Test what you've built so far
python examples/perceptron_1957/rosenblatt_perceptron.py --test-only
python examples/xor_1969/minsky_xor_problem.py --test-only
🎯 Full Milestone Demonstrations
# After Module 04 - Foundation (30 seconds)
python examples/perceptron_1957/rosenblatt_perceptron.py
# Demonstrates: YOU built Linear layers + activation functions
# After Module 06 - Autograd (1 minute)
python examples/xor_1969/minsky_xor_problem.py
# Demonstrates: YOU built gradient computation + training loops
# After Module 08 - Training (2-3 minutes + MNIST download)
python examples/mnist_mlp_1986/train_mlp.py
# Demonstrates: YOU built complete vision pipeline
# After Module 10 - DataLoader + Spatial (3-5 minutes + CIFAR download)
python examples/cifar_cnn_modern/train_cnn.py
# Demonstrates: YOU built convolutional networks
# After Module 14 - Transformers (5-10 minutes)
python examples/gpt_2018/train_gpt.py
# Demonstrates: YOU built attention mechanisms + language models
🚫 Troubleshooting Common Issues
Import Errors
# If you see "ModuleNotFoundError: No module named 'tinytorch'"
cd /path/to/TinyTorch
python -m pip install -e .
# Or run with explicit path
PYTHONPATH=/path/to/TinyTorch python examples/perceptron_1957/rosenblatt_perceptron.py
Dataset Download Issues
# Manual dataset download if automatic fails
python examples/data_manager.py # Test all datasets
# Or download specific datasets
python -c "from examples.data_manager import DatasetManager; DatasetManager().get_mnist()"
Memory Issues
# Reduce batch size for limited memory
python examples/cifar_cnn_modern/train_cnn.py --batch-size 16
# Use test mode for architecture validation only
python examples/mnist_mlp_1986/train_mlp.py --test-only
Slow Training
# Quick demo mode (reduced epochs)
python examples/mnist_mlp_1986/train_mlp.py --demo-mode
# Use pre-trained weights for instant results
python examples/mnist_mlp_1986/train_mlp.py --use-pretrained
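Under the hood, --use-pretrained amounts to loading one of the .npz archives under examples/pretrained/. A hedged sketch of what that involves (the archive's key names and the example's actual loading code may differ):

import numpy as np

weights = np.load("examples/pretrained/mnist_mlp_weights.npz")
print(list(weights.files))                   # archive keys, e.g. per-layer weight/bias arrays
# for param, key in zip(model.parameters(), weights.files):
#     param.data = weights[key]              # copy saved arrays into your model (illustrative)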
📊 Expected Performance & Timing
| Example | Dataset Size | Download Time | Training Time | Expected Accuracy |
|---|---|---|---|---|
| Perceptron 1957 | 1K synthetic | 0s | 30s | 95%+ (linearly separable) |
| XOR 1969 | 1K synthetic | 0s | 1min | 90%+ (non-linear) |
| MNIST MLP 1986 | 60K images | 2-5min | 2-3min | 85%+ (real vision) |
| CIFAR CNN Modern | 50K images | 5-10min | 3-5min | 65%+ (natural images) |
| TinyGPT 2018 | Text corpus | 1-2min | 5-10min | Coherent generation |
Note: First run includes dataset download time. Subsequent runs are much faster.
🤔 ML Systems Thinking Questions
After Each Milestone, Consider:
- Memory Implications:
  - How much memory does this architecture require?
  - What happens when you scale to larger inputs/models?
- Computational Complexity:
  - Where are the computational bottlenecks?
  - How does training time scale with model size?
- Production Deployment:
  - How would you serve this model to millions of users?
  - What optimizations would you apply for real-time inference?
- Historical Context:
  - Why was this innovation important for the field?
  - How does this relate to modern architectures (ResNet, BERT, GPT)?
- Engineering Trade-offs:
  - What are the memory vs accuracy trade-offs?
  - When would you choose this architecture over alternatives?
🎓 Educational Outcomes
By completing all milestone examples, students will:
Technical Mastery
- ✅ Understand the evolution of neural network architectures
- ✅ Build complete ML systems from scratch using their own implementations
- ✅ Analyze memory and computational trade-offs in different architectures
- ✅ Connect historical innovations to modern production systems
Systems Engineering Mindset
- ✅ Think about scalability and production deployment from day one
- ✅ Understand the engineering decisions that shaped modern ML frameworks
- ✅ Develop intuition for when to use different architectural patterns
- ✅ Build confidence in ML systems engineering roles
Real-World Preparation
- ✅ Experience working with the same patterns used in PyTorch/TensorFlow
- ✅ Understand the systems thinking behind modern ML engineering
- ✅ Develop portfolio projects demonstrating deep technical understanding
- ✅ Build foundation for advanced ML systems engineering roles
Remember: These aren't just coding exercises; they're journeys through the history of AI that prepare you for the future of ML systems engineering.
🚀 Start your journey: python examples/perceptron_1957/rosenblatt_perceptron.py