diff --git a/book/chapters/00-introduction.md b/book/chapters/00-introduction.md
index a5da6958..58e6f312 100644
--- a/book/chapters/00-introduction.md
+++ b/book/chapters/00-introduction.md
@@ -141,95 +141,119 @@ output = model(input) # YOU know exactly how this works

 ---

-## What You'll Achieve: Complete ML Systems Mastery
+## What You'll Achieve: Tier-by-Tier Mastery

-### Immediate Achievements (Modules 1-8)
-By Module 8, you'll have built a complete neural network framework from scratch:
+### 🏗️ After Foundation Tier (Modules 01-07)
+Build a complete neural network framework from mathematical first principles:

 ```python
 # YOUR implementation training real networks on real data
 model = Sequential([
-    Linear(784, 128),  # Your linear layer
-    ReLU(),  # Your activation function
-    Linear(128, 64),  # Your architecture design
+    Linear(784, 128),  # Your linear algebra implementation
+    ReLU(),  # Your activation function
+    Linear(128, 64),  # Your gradient-aware layers
     ReLU(),  # Your nonlinearity
-    Linear(64, 10)  # Your final classifier
+    Linear(64, 10)  # Your classification head
 ])

-# YOUR training loop using YOUR optimizer
-optimizer = Adam(model.parameters(), lr=0.001)  # Your Adam implementation
-for batch in dataloader:  # Your data loading
-    output = model(batch.x)  # Your forward pass
-    loss = CrossEntropyLoss()(output, batch.y)  # Your loss function
-    loss.backward()  # Your backpropagation
+# YOUR complete training system
+optimizer = Adam(model.parameters(), lr=0.001)  # Your optimization algorithm
+for batch in dataloader:  # Your data management
+    output = model(batch.x)  # Your forward computation
+    loss = CrossEntropyLoss()(output, batch.y)  # Your loss calculation
+    loss.backward()  # YOUR backpropagation engine
     optimizer.step()  # Your parameter updates
 ```

-**Result: 95%+ accuracy on MNIST using 100% your own code.**
+**🎯 Foundation Achievement**: 95%+ accuracy on MNIST using 100% your own mathematical implementations

-### Advanced Capabilities (Modules 9-14)
-- **Computer Vision**: CNNs achieving 75%+ accuracy on CIFAR-10
-- **Language Models**: TinyGPT built using 95% of your vision components
-- **Universal Architecture**: Same mathematical foundations power all modern AI
+### 🧠 After Intelligence Tier (Modules 08-13)
+- **Computer Vision Mastery**: CNNs achieving 75%+ accuracy on CIFAR-10 with YOUR convolution implementations
+- **Language Understanding**: Transformers generating coherent text using YOUR attention mechanisms
+- **Universal Architecture**: Discover why the SAME mathematical principles work for vision AND language
+- **AI Breakthrough Recreation**: Implement the architectures that created the modern AI revolution

-### Production Systems (Modules 15-20)
-- **Performance Engineering**: Profile, measure, and optimize ML systems
-- **Memory Optimization**: Understand and implement compression techniques
-- **Hardware Acceleration**: Build efficient kernels and vectorized operations
-- **TinyMLPerf Competition**: Compete with optimized implementations
+### ⚡ After Optimization Tier (Modules 14-20)
+- **Production Performance**: Systems optimized for <100ms inference latency using YOUR profiling tools
+- **Memory Efficiency**: Models compressed to 25% original size with YOUR quantization implementations
+- **Hardware Acceleration**: Kernels achieving 10x speedups through YOUR vectorization techniques
+- **Competition Ready**: TinyMLPerf submissions competitive with industry implementations

 ---

 ## The ML Evolution Story You'll Experience

-TinyTorch follows the actual historical progression of machine learning breakthroughs:
+TinyTorch's three-tier structure follows the actual historical progression of machine learning breakthroughs:

-### 🧠 Era 1: Foundation (1980s) - Modules 1-8
-**The Beginning**: Perceptrons and multi-layer networks
-- Build tensor operations and automatic differentiation
-- Implement gradient-based optimization (SGD, Adam)
-- **Achievement**: Train MLPs to 95%+ accuracy on MNIST
+### 🏗️ Foundation Era (1980s-1990s) → Foundation Tier
+**The Beginning**: Mathematical foundations that started it all
+- **1986 Breakthrough**: Backpropagation enables multi-layer networks
+- **Your Implementation**: Build automatic differentiation and gradient-based optimization
+- **Historical Milestone**: Train MLPs to 95%+ accuracy on MNIST using YOUR autograd engine

-### 👁️ Era 2: Spatial Intelligence (1989-2012) - Modules 9-10
-**The Revolution**: Convolutional neural networks
-- Add spatial processing with Conv2d and pooling operations
-- Build efficient data pipelines for real-world datasets
-- **Achievement**: Train CNNs to 75%+ accuracy on CIFAR-10
+### 🧠 Intelligence Era (1990s-2010s) → Intelligence Tier
+**The Revolution**: Specialized architectures for vision and language
+- **1998 Breakthrough**: CNNs revolutionize computer vision (LeCun's LeNet)
+- **2017 Breakthrough**: Transformers unify vision and language ("Attention is All You Need")
+- **Your Implementation**: Build CNNs achieving 75%+ on CIFAR-10, then transformers for text generation
+- **Historical Milestone**: Recreate both revolutions using YOUR spatial and attention implementations

-### 🗣️ Era 3: Universal Architecture (2017-Present) - Modules 11-14
-**The Unification**: Transformers for vision AND language
-- Implement attention mechanisms and positional embeddings
-- Build TinyGPT using your existing vision infrastructure
-- **Achievement**: Language generation with 95% component reuse
+### ⚡ Optimization Era (2010s-Present) → Optimization Tier
+**The Engineering**: Production systems that scale to billions of users
+- **2020s Breakthrough**: Efficient inference enables real-time LLMs (GPT, ChatGPT)
+- **Your Implementation**: Build KV-caching, quantization, and production optimizations
+- **Historical Milestone**: Deploy systems competitive in TinyMLPerf benchmarks

-### ⚡ Era 4: Production Systems (Present) - Modules 15-20
-**The Engineering**: Optimized, deployable ML systems
-- Profile performance and identify bottlenecks
-- Implement compression, quantization, and acceleration
-- **Achievement**: TinyMLPerf competition-ready implementations
+**Why This Progression Matters**: You'll understand not just modern AI, but WHY it evolved this way. Each tier builds essential capabilities that inform the next, just like ML history itself.

 ---

-## Systems Engineering Focus: Why It Matters
+## Systems Engineering Focus: Why Tiers Matter

-Traditional ML courses focus on **algorithms**. TinyTorch focuses on **systems**.
+Traditional ML courses teach algorithms in isolation. TinyTorch's tier structure teaches **systems thinking** - how components interact to create production ML systems.

-### What Traditional Courses Teach:
-- "Use `torch.optim.Adam` for optimization"
-- "Transformers use attention mechanisms"
-- "Larger models generally perform better"
+### Traditional Linear Approach:
+```
+Module 1: Tensors → Module 2: Layers → Module 3: Training → ...
+```
+**Problem**: Students learn components but miss system interactions

-### What TinyTorch Teaches:
-- "Why Adam consumes 3× more memory than SGD and when that matters in production"
-- "How attention scales O(N²) with sequence length and limits context windows"
-- "How to profile memory usage and identify training bottlenecks"
+### TinyTorch Tier Approach:
+```
+🏗️ Foundation Tier: Build mathematical infrastructure
+🧠 Intelligence Tier: Compose intelligent architectures
+⚡ Optimization Tier: Deploy at production scale
+```
+**Advantage**: Each tier builds complete, working systems with clear progression

-### Career Impact
-After TinyTorch, you become the team member who:
-- **Debugs performance issues**: "Your convolution is memory-bound, not compute-bound"
-- **Optimizes production systems**: "We can use gradient accumulation to train with less GPU memory"
-- **Implements custom operations**: "I'll write a custom kernel for this novel architecture"
-- **Designs system architecture**: "Here's why this model won't scale and how to fix it"
+### What Traditional Courses Teach vs. TinyTorch Tiers:
+
+**Traditional**: "Use `torch.optim.Adam` for optimization"
+**Foundation Tier**: "Why Adam needs 3× more memory than SGD and how to implement both from mathematical first principles"
+
+**Traditional**: "Transformers use attention mechanisms"
+**Intelligence Tier**: "How attention creates O(N²) scaling, why this limits context windows, and how to implement efficient attention yourself"
+
+**Traditional**: "Deploy models with TensorFlow Serving"
+**Optimization Tier**: "How to profile bottlenecks, implement KV-caching for 10× speedup, and compete in production benchmarks"
+
+### Career Impact by Tier
+After each tier, you become the team member who:
+
+**🏗️ Foundation Tier Graduate**:
+- Debugs gradient flow issues: "Your ReLU is causing dead neurons"
+- Implements custom optimizers: "I'll build a variant of Adam for this use case"
+- Understands memory patterns: "Batch size 64 hits your GPU memory limit here"
+
+**🧠 Intelligence Tier Graduate**:
+- Designs novel architectures: "We can adapt transformers for this computer vision task"
+- Optimizes attention patterns: "This attention bottleneck is why your model won't scale to longer sequences"
+- Bridges vision and language: "The same mathematical principles work for both domains"
+
+**⚡ Optimization Tier Graduate**:
+- Deploys production systems: "I can get us from 500ms to 50ms inference latency"
+- Leads performance optimization: "Here's our memory bottleneck and my 3-step plan to fix it"
+- Competes at industry scale: "Our optimizations achieve TinyMLPerf benchmark performance"

 ---

@@ -254,165 +278,159 @@ After TinyTorch, you become the team member who:

 ---

-## Ready to Begin?
+## 🚀 Start Your Journey

-You're about to embark on a journey that will transform how you think about machine learning systems. Instead of using black-box frameworks, you'll understand every component from the ground up.
+
Choose your starting point based on your goals and time commitment
+15-Minute Start →
+Foundation Tier →
+Implement tensor operations, understand memory layout, build arithmetic foundations. Core mathematical building blocks.
Linear transformations, activation functions, loss functions. Build the mathematical components of neural computation.
+| Module | Component | Core Capability | Real-World Connection |
+|--------|-----------|-----------------|----------------------|
+| **01** | **Tensor** | Data structures and operations | NumPy, PyTorch tensors |
+| **02** | **Activations** | Nonlinear functions | ReLU, attention activations |
+| **03** | **Layers** | Linear transformations | `nn.Linear`, dense layers |
+| **04** | **Losses** | Optimization objectives | CrossEntropy, MSE loss |
+| **05** | **Autograd** | Automatic differentiation | PyTorch autograd engine |
+| **06** | **Optimizers** | Parameter updates | Adam, SGD optimizers |
+| **07** | **Training** | Complete training loops | Model.fit(), training scripts |
+
+**🎯 Tier Milestone**: Train neural networks achieving **95%+ accuracy on MNIST** using 100% your own implementations!
+
+**Skills Gained**:
+- Understand memory layout and computational graphs
+- Debug gradient flow and numerical stability issues
+- Implement any optimization algorithm from research papers
+- Build custom neural network architectures from scratch
+
+---
+
+### 🧠 INTELLIGENCE TIER (Modules 08-13)
+**Modern AI Algorithms • 4-6 weeks • Vision + Language Architectures**
+
+Automatic differentiation, optimization algorithms, training procedures. Understand how neural networks learn.
+| Module | Component | Core Capability | Real-World Connection |
+|--------|-----------|-----------------|----------------------|
+| **08** | **Spatial** | Convolutions and regularization | CNNs, ResNet, computer vision |
+| **09** | **DataLoader** | Batch processing | PyTorch DataLoader, tf.data |
+| **10** | **Tokenization** | Text preprocessing | BERT tokenizer, GPT tokenizer |
+| **11** | **Embeddings** | Representation learning | Word2Vec, positional encodings |
+| **12** | **Attention** | Information routing | Multi-head attention, self-attention |
+| **13** | **Transformers** | Modern architectures | GPT, BERT, Vision Transformer |
+
+**🎯 Tier Milestone**: Achieve **75%+ accuracy on CIFAR-10** with CNNs AND generate coherent text with transformers!
+
+**Skills Gained**:
+- Understand why convolution works for spatial data
+- Implement attention mechanisms from scratch
+- Build transformer architectures for any domain
+- Debug sequence modeling and attention patterns
+
+---
+
+### ⚡ OPTIMIZATION TIER (Modules 14-20)
+**Production & Performance • 4-6 weeks • Deploy and Scale ML Systems**
+
+Performance analysis, computational kernels, benchmarking. Study the engineering principles behind ML systems.
+| Module | Component | Core Capability | Real-World Connection |
+|--------|-----------|-----------------|----------------------|
+| **14** | **Profiling** | Performance analysis | PyTorch Profiler, TensorBoard |
+| **15** | **Acceleration** | Speed improvements | CUDA kernels, vectorization |
+| **16** | **Quantization** | Memory efficiency | INT8 inference, model compression |
+| **17** | **Compression** | Model optimization | Pruning, distillation, ONNX |
+| **18** | **Caching** | Memory management | KV-cache for generation |
+| **19** | **Benchmarking** | Measurement systems | MLPerf, production monitoring |
+| **20** | **Capstone** | Full system integration | End-to-end ML pipeline |
+
+**🎯 Tier Milestone**: Build **production-ready systems** competitive in TinyMLPerf benchmarks!
+
+**Skills Gained**:
+- Profile memory usage and identify bottlenecks
+- Implement efficient inference optimizations
+- Deploy models with <100ms latency requirements
+- Design scalable ML system architectures
+
+---
+
+## 🎯 Learning Path Recommendations
+
+### Choose Your Learning Style
+
+Implement every component from scratch
+Time: 14-18 weeks
Ideal for: CS students, aspiring ML engineers
Pick one tier based on your goals
+Time: 4-8 weeks
Ideal for: Working professionals, specific skill gaps
Study implementations with hands-on exercises
+Time: 8-12 weeks
Ideal for: Self-directed learners, bootcamp graduates