mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-06-03 09:50:52 -05:00

Files

Vijay Janapa Reddi 0d2d569002 MILESTONES: Fix misleading naming and add comprehensive milestone structure

Educational improvements to milestone examples:

NAMING FIXES (historically accurate):
- Rename lenet_1998 → mnist_mlp_1986 (LeNet was CNN, not MLP)
- Rename alexnet_2012 → cifar_cnn_modern (not actual AlexNet architecture)
- Update all Dense → Linear for PyTorch consistency

COMPREHENSIVE MILESTONE STRUCTURE:
- Add detailed examples/README.md explaining historical progression
- Map each milestone to specific module completion points:
  * Perceptron 1957: After Modules 2-4 (Foundation)
  * XOR 1969: After Modules 2-6 (Non-linear problems)
  * MNIST MLP 1986: After Modules 2-8 (Real vision)
  * CIFAR CNN Modern: After Modules 2-10 (Spatial understanding)
  * TinyGPT 2018: After Modules 2-14 (Language modeling)

EDUCATIONAL VALUE:
- Clear capability progression from basic to advanced
- Systems analysis focus (memory, performance, scaling)
- Production context connections to real PyTorch patterns
- Historical significance explanations for each innovation

All examples validated and working with current TinyTorch implementation.
Students now have clear "proof of mastery" demonstrations at each stage.

2025-09-26 12:08:31 -04:00

cifar_cnn_modern

MILESTONES: Fix misleading naming and add comprehensive milestone structure

2025-09-26 12:08:31 -04:00

gpt_2018

MILESTONES: Fix misleading naming and add comprehensive milestone structure

2025-09-26 12:08:31 -04:00

mnist_mlp_1986

MILESTONES: Fix misleading naming and add comprehensive milestone structure

2025-09-26 12:08:31 -04:00

perceptron_1957

MILESTONES: Fix misleading naming and add comprehensive milestone structure

2025-09-26 12:08:31 -04:00

pretrained

MILESTONE: Complete Phase 2 CNN training pipeline

2025-09-23 18:33:56 -04:00

xor_1969

MILESTONES: Fix misleading naming and add comprehensive milestone structure

2025-09-26 12:08:31 -04:00

optimization_pipeline_complete.py

FOUNDATION: Establish AI Engineering as a discipline through TinyTorch

2025-09-25 11:16:28 -04:00

profile_and_optimize_demo.py

FOUNDATION: Establish AI Engineering as a discipline through TinyTorch

2025-09-25 11:16:28 -04:00

quantize_and_compress_demo.py

FOUNDATION: Establish AI Engineering as a discipline through TinyTorch

2025-09-25 11:16:28 -04:00

README.md

MILESTONES: Fix misleading naming and add comprehensive milestone structure

2025-09-26 12:08:31 -04:00

README.md

🏆 TinyTorch Milestone Examples

Proof-of-mastery demonstrations showcasing what students can build after completing modules.

These examples demonstrate the evolutionary progression of neural networks from 1957 to 2018, showing how each innovation built upon previous foundations. Students experience the same journey that created modern AI.

🎯 Milestone Philosophy

Why These Specific Examples?

Historical Progression: Experience the actual evolution of neural networks
Capability Showcasing: Demonstrate specific breakthroughs at each stage
Systems Thinking: Understand WHY each innovation mattered for ML systems
Motivation: See real-world impact of concepts you're learning
Integration: Prove mastery by combining multiple modules into working systems

What Makes This Educational?

Not Just Algorithms: Focus on systems engineering and architectural insights
Progressive Complexity: Each milestone builds capabilities from previous ones
Real Implementations: Use actual TinyTorch modules students built
Historical Context: Understand the engineering decisions that shaped modern ML
Production Relevance: Connect to how these patterns appear in PyTorch/TensorFlow

📅 Historical Timeline & Module Mapping

🧠 Perceptron 1957 - `perceptron_1957/`

After Modules 2-4 • Foundation Building

Input → Linear → Sigmoid → Binary Output

Historical Significance: Frank Rosenblatt's perceptron launched the first AI wave What It Showcases:

First trainable neural network
Linear classification boundaries
Gradient-based learning foundation
Why single layers have limitations

Systems Insights:

Memory: O(n) parameters, minimal storage
Compute: O(n) operations per forward pass
Limitations: Only linearly separable problems

Run After: Module 04 (Layers) ✅

⚡ XOR Problem 1969 - `xor_1969/`

After Modules 2-6 • Breaking Limitations

Input → Linear → ReLU → Linear → Output

Historical Significance: Minsky & Papert showed perceptron limitations; multi-layer networks solved them What It Showcases:

Non-linear problem solving
Hidden layer representations
Why depth enables complexity
Foundation for deep learning

Systems Insights:

Memory: O(n²) parameters with hidden layers
Compute: O(n²) operations, but enables non-linear solutions
Architecture: Hidden representations crucial for complex patterns

Run After: Module 06 (Autograd) ✅

🔢 MNIST MLP 1986 - `mnist_mlp_1986/`

After Modules 2-8 • Real Vision Problems

Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes

Historical Significance: Backpropagation enabled training deep networks on real datasets What It Showcases:

Multi-class classification
Real vision datasets
Multi-layer feature learning
Complete training pipelines

Systems Insights:

Memory: ~100K parameters for MNIST (manageable)
Compute: Dense matrix operations, vectorization critical
Scaling: 95%+ accuracy demonstrates effectiveness

Run After: Module 08 (Training) ✅

🖼️ CIFAR CNN Modern - `cifar_cnn_modern/`

After Modules 2-10 • Spatial Understanding

Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes

Historical Significance: CNNs revolutionized computer vision by exploiting spatial structure What It Showcases:

Spatial feature extraction
Hierarchical pattern recognition
Translation invariance
Natural image classification

Systems Insights:

Memory: ~1M parameters, but shared weights reduce memory vs dense layers
Compute: Convolution is compute-intensive but highly parallelizable
Architecture: Local connectivity + weight sharing = spatial intelligence

Run After: Module 10 (DataLoader) + Module 09 (Spatial) ✅

🤖 TinyGPT 2018 - `gpt_2018/`

After Modules 2-14 • Language Understanding

Tokens → Embeddings → Attention → FFN → ... → Attention → Output

Historical Significance: Transformers + attention revolutionized NLP and launched the LLM era What It Showcases:

Sequence modeling
Attention mechanisms
Autoregressive generation
Foundation for ChatGPT/GPT-4

Systems Insights:

Memory: O(n²) attention requires careful memory management
Compute: Attention is compute-intensive but highly parallelizable
Architecture: Self-attention enables long-range dependencies

Run After: Module 14 (Transformers) ✅

🎯 Learning Progression Design

Capability Building Sequence

Stage	Capability Unlocked	Architectural Innovation	Real-World Impact
Stage 1	Binary classification	Single-layer networks	Basic pattern recognition
Stage 2	Non-linear problems	Hidden layers + activation	Complex decision boundaries
Stage 3	Multi-class vision	Deep feedforward networks	Handwritten digit recognition
Stage 4	Spatial understanding	Convolutional networks	Natural image classification
Stage 5	Sequence modeling	Attention mechanisms	Language understanding

Systems Engineering Progression

Memory Management: From O(n) → O(n²) → O(n²) with optimizations
Computational Complexity: Understanding trade-offs between accuracy and efficiency
Architectural Patterns: How structure enables capability
Production Deployment: What it takes to scale these in practice

🔧 Systems Analysis in Each Example

Each milestone includes:

Memory Profiling

import tracemalloc
tracemalloc.start()
# ... run model ...
current, peak = tracemalloc.get_traced_memory()
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")

Performance Measurement

# Parameter counting
total_params = sum(p.data.size for p in model.parameters())
print(f"Parameters: {total_params:,}")

# FLOP estimation  
flops = estimate_flops(model, input_shape)
print(f"FLOPs per forward pass: {flops:,}")

Scaling Analysis

# Show how performance scales with model size
for hidden_size in [64, 128, 256, 512]:
    model = create_model(hidden_size)
    time_per_epoch = benchmark_training(model)
    print(f"Hidden={hidden_size}: {time_per_epoch:.2f}s/epoch")

📂 File Structure

examples/
├── README.md                    # This file - milestone overview
├── perceptron_1957/
│   └── rosenblatt_perceptron.py # First trainable neural network
├── xor_1969/
│   └── minsky_xor_problem.py    # Non-linear problem solving
├── mnist_mlp_1986/
│   └── train_mlp.py             # Real vision with multi-layer networks
├── cifar_cnn_modern/
│   ├── train_cnn.py             # Spatial feature extraction with CNNs
│   └── data/                    # CIFAR-10 dataset
├── gpt_2018/
│   └── train_gpt.py             # Language modeling with transformers
└── pretrained/
    ├── mnist_mlp_weights.npz    # Pre-trained weights for quick demos
    ├── cifar10_cnn_weights.npz
    └── xor_weights.npz

🚀 Running the Examples

Prerequisites Check

# Verify your TinyTorch installation
tito system doctor

# Check which modules you've completed
tito checkpoint status

Run Examples by Module Completion

# After Module 04 - Basic networks
python examples/perceptron_1957/rosenblatt_perceptron.py

# After Module 06 - Autograd  
python examples/xor_1969/minsky_xor_problem.py

# After Module 08 - Training
python examples/mnist_mlp_1986/train_mlp.py

# After Module 10 - DataLoader + Spatial
python examples/cifar_cnn_modern/train_cnn.py

# After Module 14 - Transformers
python examples/gpt_2018/train_gpt.py

Quick Demo with Pre-trained Weights

# Use pre-trained weights for instant results
python examples/mnist_mlp_1986/train_mlp.py --use-pretrained
python examples/cifar_cnn_modern/train_cnn.py --use-pretrained

🤔 ML Systems Thinking Questions

After Each Milestone, Consider:

Memory Implications:
- How much memory does this architecture require?
- What happens when you scale to larger inputs/models?
Computational Complexity:
- Where are the computational bottlenecks?
- How does training time scale with model size?
Production Deployment:
- How would you serve this model to millions of users?
- What optimizations would you apply for real-time inference?
Historical Context:
- Why was this innovation important for the field?
- How does this relate to modern architectures (ResNet, BERT, GPT)?
Engineering Trade-offs:
- What are the memory vs accuracy trade-offs?
- When would you choose this architecture over alternatives?

🎓 Educational Outcomes

By completing all milestone examples, students will:

Technical Mastery

✅ Understand the evolution of neural network architectures
✅ Build complete ML systems from scratch using their own implementations
✅ Analyze memory and computational trade-offs in different architectures
✅ Connect historical innovations to modern production systems

Systems Engineering Mindset

✅ Think about scalability and production deployment from day one
✅ Understand the engineering decisions that shaped modern ML frameworks
✅ Develop intuition for when to use different architectural patterns
✅ Build confidence in ML systems engineering roles

Real-World Preparation

✅ Experience working with the same patterns used in PyTorch/TensorFlow
✅ Understand the systems thinking behind modern ML engineering
✅ Develop portfolio projects demonstrating deep technical understanding
✅ Build foundation for advanced ML systems engineering roles

Remember: These aren't just coding exercises - they're journeys through the history of AI that prepare you for the future of ML systems engineering.

🚀 Start your journey: python examples/perceptron_1957/rosenblatt_perceptron.py

README.md

🏆 TinyTorch Milestone Examples

🎯 Milestone Philosophy

Why These Specific Examples?

What Makes This Educational?

📅 Historical Timeline & Module Mapping

🧠 Perceptron 1957 - perceptron_1957/

⚡ XOR Problem 1969 - xor_1969/

🔢 MNIST MLP 1986 - mnist_mlp_1986/

🖼️ CIFAR CNN Modern - cifar_cnn_modern/

🤖 TinyGPT 2018 - gpt_2018/

🎯 Learning Progression Design

Capability Building Sequence

Systems Engineering Progression

🔧 Systems Analysis in Each Example

Memory Profiling

Performance Measurement

Scaling Analysis

📂 File Structure

🚀 Running the Examples

Prerequisites Check

Run Examples by Module Completion

Quick Demo with Pre-trained Weights

🤔 ML Systems Thinking Questions

After Each Milestone, Consider:

🎓 Educational Outcomes

Technical Mastery

Systems Engineering Mindset

Real-World Preparation

🧠 Perceptron 1957 - `perceptron_1957/`

⚡ XOR Problem 1969 - `xor_1969/`

🔢 MNIST MLP 1986 - `mnist_mlp_1986/`

🖼️ CIFAR CNN Modern - `cifar_cnn_modern/`

🤖 TinyGPT 2018 - `gpt_2018/`