🏆 Journey Through ML History
Experience the evolution of AI by rebuilding history's most important breakthroughs with YOUR TinyTorch implementations!
🎯 What Are Milestones?
Milestones are proof-of-mastery demonstrations that showcase what you can build after completing specific modules. Each milestone recreates a historically significant ML achievement using YOUR implementations.
Why This Approach?
- 🧠 Deep Understanding: Experience the actual challenges researchers faced
- 📈 Progressive Learning: Each milestone builds on previous foundations
- 🏆 Real Achievements: Not toy examples - these are historically significant breakthroughs
- 🔧 Systems Thinking: Understand WHY each innovation mattered for ML systems
📅 The Timeline
🧠 01. Perceptron (1957) - Rosenblatt
After Modules 02-04
Input → Linear → Sigmoid → Output
The Beginning: The first trainable neural network! Frank Rosenblatt proved machines could learn from data.
What You'll Build:
- Binary classification with gradient descent
- Simple but revolutionary architecture
- YOUR Linear layer recreates history
Systems Insights:
- Memory: O(n) parameters
- Compute: O(n) operations
- Limitation: Only linearly separable problems
```bash
cd milestones/01_1957_perceptron
python perceptron_trained.py
```
Expected Results: 95%+ accuracy on linearly separable data
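For intuition, here is a minimal NumPy sketch of the computation this milestone recreates: a linear layer, a sigmoid, and a gradient-descent update. It deliberately avoids the TinyTorch API, and the data, learning rate, and iteration count are illustrative assumptions rather than the milestone's actual settings:

```python
import numpy as np

# Illustrative linearly separable data: label is 1 when x0 + x1 > 1
rng = np.random.default_rng(0)
X = rng.uniform(-1, 2, size=(200, 2))
y = (X.sum(axis=1) > 1).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(500):
    z = X @ w + b                    # Linear: O(n) parameters, O(n) compute per sample
    p = 1.0 / (1.0 + np.exp(-z))     # Sigmoid squashes the score into a probability
    grad = p - y                     # dL/dz for binary cross-entropy
    w -= lr * (X.T @ grad) / len(X)  # gradient-descent update
    b -= lr * grad.mean()

print(f"accuracy: {((p > 0.5) == y).mean():.1%}")  # should be high on separable data
```

In the milestone itself, YOUR Linear layer and Sigmoid activation replace these hand-written lines.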
⚡ 02. XOR Crisis (1969) - Minsky & Papert
After Modules 02-06
Input → Linear → ReLU → Linear → Output
The Challenge: Minsky and Papert proved that single-layer perceptrons cannot solve XOR. The resulting crisis nearly ended neural network research!
What You'll Build:
- Hidden layers enable non-linear solutions
- Multi-layer networks break through limitations
- YOUR autograd makes it possible
Systems Insights:
- Memory: O(n²) with hidden layers
- Compute: O(n²) operations
- Breakthrough: Hidden representations
```bash
cd milestones/02_1969_xor_crisis
python xor_solved.py
```
Expected Results: 90%+ accuracy solving XOR
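As a framework-independent illustration of why the hidden layer matters, a two-layer network can fit XOR where no single linear layer can. The hidden width, learning rate, and iteration count below are assumptions for the sketch, not the milestone's settings, and the hand-written gradients are exactly what YOUR autograd automates:

```python
import numpy as np

# XOR is not linearly separable: no single Linear layer can fit it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)   # hidden layer: O(n^2) weights
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)
lr = 0.5

for _ in range(5000):
    h = np.maximum(0, X @ W1 + b1)               # Linear -> ReLU: hidden representation
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))         # Linear -> Sigmoid output
    dz2 = (p - y) / len(X)                       # backprop by hand, layer by layer
    dW2, db2 = h.T @ dz2, dz2.sum(0)
    dz1 = (dz2 @ W2.T) * (h > 0)
    dW1, db1 = X.T @ dz1, dz1.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p.ravel(), 2))                    # should approach [0, 1, 1, 0]
```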
🔢 03. MLP Revival (1986) - Backpropagation Era
After Modules 02-08
Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes
The Revolution: Backpropagation enabled training deep networks on real datasets like MNIST.
What You'll Build:
- Multi-class digit recognition
- Complete training pipelines
- YOUR optimizers achieve 95%+ accuracy
Systems Insights:
- Memory: ~100K parameters for MNIST
- Compute: Dense matrix operations
- Architecture: Multi-layer feature learning
```bash
cd milestones/03_1986_mlp_revival
python mlp_digits.py   # 8x8 digits (quick)
python mlp_mnist.py    # Full MNIST
```
Expected Results: 95%+ accuracy on MNIST
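The "~100K parameters" figure is easy to check. Assuming a 784 → 128 → 64 → 10 layer stack (illustrative widths; the milestone's exact sizes may differ), the count works out as follows:

```python
# Parameter count for an assumed 784 -> 128 -> 64 -> 10 MLP (illustrative widths)
layers = [(784, 128), (128, 64), (64, 10)]
total = sum(n_in * n_out + n_out for n_in, n_out in layers)  # weights + biases
print(total)  # 109,386 -> on the order of 100K parameters, ~0.4 MB in float32
```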
🖼️ 04. CNN Revolution (1998) - LeCun's Breakthrough
After Modules 02-09 • 🎯 North Star Achievement
Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes
The Game-Changer: CNNs exploit spatial structure for computer vision. This enabled modern AI!
What You'll Build:
- Convolutional feature extraction
- Natural image classification (CIFAR-10)
- YOUR Conv2d + MaxPool2d unlock spatial intelligence
Systems Insights:
- Memory: ~1M parameters (weight sharing keeps this far below an equivalent dense network)
- Compute: Convolution is intensive but parallelizable
- Architecture: Local connectivity + translation invariance
```bash
cd milestones/04_1998_cnn_revolution
python cnn_digits.py       # Spatial features on digits
python lecun_cifar10.py    # CIFAR-10 @ 75%+ accuracy
```
Expected Results: 75%+ accuracy on CIFAR-10 ✨
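To see why weight sharing matters, compare one conv layer to a dense layer connecting the same feature maps. The shapes below are illustrative (a CIFAR-10-sized input and 32 filters), not the milestone's exact configuration:

```python
# Conv parameters depend on kernel size and channel counts, not image size
in_ch, out_ch, k = 3, 32, 3              # RGB input, 32 filters, 3x3 kernels
H = W = 32                               # CIFAR-10 spatial resolution

conv_params = out_ch * in_ch * k * k + out_ch                        # 896
dense_params = (in_ch * H * W) * (out_ch * H * W) + out_ch * H * W   # ~100M

print(conv_params, dense_params)  # 896 vs ~100.7M: roughly a 100,000x difference
```

The same local connectivity also buys translation invariance: each 3×3 filter is reused at every spatial position.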
🤖 05. Transformer Era (2017) - Attention Revolution
After Modules 02-13
Tokens → Embeddings → Attention → FFN → ... → Attention → Output
The Modern Era: Transformers + attention launched the LLM revolution (GPT, BERT, ChatGPT).
What You'll Build:
- Self-attention mechanisms
- Autoregressive text generation
- YOUR attention implementation generates language
Systems Insights:
- Memory: O(n²) attention requires careful management
- Compute: Highly parallelizable
- Architecture: Long-range dependencies
```bash
cd milestones/05_2017_transformer_era
python vaswani_shakespeare.py
```
Expected Results: Coherent text generation
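The O(n²) memory cost noted above comes from the attention matrix itself. Here is a minimal NumPy sketch of scaled dot-product attention, the core operation of this milestone (a single head with no masking or projections, illustrative sizes; not the milestone's implementation):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n) matrix: the O(n^2) memory cost
    scores -= scores.max(axis=-1, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # every token mixes information from all tokens

n, d = 128, 64                                     # sequence length, head dimension (illustrative)
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(attention(Q, K, V).shape)                    # (128, 64); the attention matrix was (128, 128)
```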
⚡ 06. Systems Age (2024) - Modern ML Engineering
After Modules 02-19
Profile → Analyze → Optimize → Benchmark → Compete
The Present: Modern ML is systems engineering - profiling, optimization, and production deployment.
What You'll Build:
- Performance profiling tools
- Memory optimization techniques
- Competitive benchmarking
Systems Insights:
- Full ML systems pipeline
- Production optimization patterns
- Real-world engineering trade-offs
```bash
cd milestones/06_2024_systems_age
python optimize_models.py
```
Expected Results: Production-grade optimized models
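The milestone ships its own tooling; as a taste of the kind of measurement involved, here is a minimal profiling sketch using only the Python standard library and NumPy (the helper name and matrix size are illustrative assumptions):

```python
import time
import tracemalloc
import numpy as np

def profile(fn, *args):
    """Report wall-clock time and peak traced allocations for a single call."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{getattr(fn, '__name__', 'fn')}: {elapsed * 1e3:.2f} ms, peak {peak / 1e6:.1f} MB")
    return result

A = np.random.rand(512, 512).astype(np.float32)
profile(np.matmul, A, A)  # measure first, then optimize what the numbers say matters
```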
🎓 Learning Philosophy
Progressive Capability Building
| Stage | Era | Capability | Your Tools |
|---|---|---|---|
| 1957 | Foundation | Binary classification | Linear + Sigmoid |
| 1969 | Depth | Non-linear problems | Hidden layers + Autograd |
| 1986 | Scale | Multi-class vision | Optimizers + Training |
| 1998 | Structure | Spatial understanding | Conv2d + Pooling |
| 2017 | Attention | Sequence modeling | Transformers + Attention |
| 2024 | Systems | Production deployment | Profiling + Optimization |
Systems Engineering Progression
Each milestone teaches critical systems thinking:
- Memory Management: from O(n) perceptron weights, to O(n²) hidden layers and attention, to cutting those costs back down with optimization
- Computational Trade-offs: Accuracy vs efficiency
- Architectural Patterns: How structure enables capability
- Production Deployment: What it takes to scale
🚀 How to Use Milestones
1. Complete Prerequisites
```bash
# Check which modules you've completed
tito checkpoint status

# Complete required modules
tito module complete 02_tensor
tito module complete 03_activations
# ... and so on
```
2. Run the Milestone
```bash
cd milestones/01_1957_perceptron
python perceptron_trained.py
```
3. Understand the Systems
Each milestone includes:
- 📊 Memory profiling: See actual memory usage
- ⚡ Performance metrics: FLOPs, parameters, timing
- 🧠 Architectural analysis: Why this design matters
- 📈 Scaling insights: How performance changes with size
4. Reflect and Compare
Questions to ask:
- How does this compare to modern architectures?
- What were the computational constraints in that era?
- How would you optimize this for production?
- What patterns appear in PyTorch/TensorFlow?
🎯 Quick Reference
Milestone Prerequisites
| Milestone | After Module | Key Requirements |
|---|---|---|
| 01. Perceptron (1957) | 04 | Tensor, Activations, Layers |
| 02. XOR (1969) | 06 | + Losses, Autograd |
| 03. MLP (1986) | 08 | + Optimizers, Training |
| 04. CNN (1998) | 09 | + Spatial, DataLoader |
| 05. Transformer (2017) | 13 | + Tokenization, Embeddings, Attention |
| 06. Systems (2024) | 19 | Full optimization suite |
What Each Milestone Proves
✅ Your implementations work - Not just toy code
✅ Historical significance - These breakthroughs shaped modern AI
✅ Systems understanding - You know memory, compute, scaling
✅ Production relevance - Patterns used in real ML frameworks
📚 Further Learning
After completing milestones, explore:
- TinyMLPerf Competition: Optimize your implementations
- Leaderboard: Compare with other students
- Capstone Projects: Build your own ML applications
- Research Papers: Read the original papers for each milestone
🌟 Why This Matters
Most courses teach you to USE frameworks.
TinyTorch teaches you to UNDERSTAND them.
By rebuilding ML history, you gain:
- 🧠 Deep intuition for how neural networks work
- 🔧 Systems thinking for production ML
- 🏆 Portfolio projects demonstrating mastery
- 💼 Preparation for ML systems engineering roles
Ready to start your journey through ML history?
```bash
cd milestones/01_1957_perceptron
python perceptron_trained.py
```
Build the future by understanding the past. 🚀