# The TinyTorch Vision

**Training ML Systems Engineers: From Computer Vision to Language Models**

---

## The Problem We're Solving

The ML field has a critical gap: **most education teaches you to use frameworks, not build them.**

### Traditional ML Education:
```python
import torch
import torch.nn as nn
model = nn.Linear(784, 10)
optimizer = torch.optim.Adam(model.parameters())
```

**Questions students can't answer:**
- Why does Adam use 3× more memory than SGD?
- How does `loss.backward()` actually compute gradients?
- When should you use gradient accumulation vs larger batch sizes?
- Why do attention mechanisms limit context length?

### The TinyTorch Difference:
```python
class Linear:
    def __init__(self, in_features, out_features):
        self.weight = Tensor(np.random.randn(in_features, out_features))
        self.bias = Tensor(np.zeros(out_features))
    
    def forward(self, x):
        return x @ self.weight + self.bias  # YOU implemented @
    
    def backward(self, grad_output):
        # YOU understand exactly how gradients flow
        self.weight.grad = x.T @ grad_output
        return grad_output @ self.weight.T
```

**Questions students CAN answer:**
- Exactly how automatic differentiation works
- Why certain optimizers use more memory
- How to debug training instability
- When to make performance vs accuracy trade-offs

---

## What We Teach: Systems Thinking

### Beyond Algorithms: System-Level Understanding

**Memory Management:**
- Why Adam needs 3× parameter memory (parameters + momentum + variance)
- How attention matrices scale O(N²) with sequence length
- When gradient accumulation saves memory vs compute trade-offs

**Performance Analysis:**
- Why naive convolution is 100× slower than optimized versions
- How cache misses destroy performance in matrix operations
- When vectorization provides 10-100× speedups

**Production Trade-offs:**
- SGD vs Adam: convergence speed vs memory constraints
- Gradient checkpointing: trading compute for memory
- Mixed precision: 2× memory savings with accuracy considerations

**Hardware Awareness:**
- How memory bandwidth limits ML performance
- Why GPU utilization matters more than peak FLOPS
- When distributed training becomes necessary

---

## Target Audience: Future ML Systems Engineers

### Perfect For:

**Computer Science Students**
- Going beyond "use PyTorch" to "understand PyTorch"
- Building portfolio projects that demonstrate deep system knowledge
- Preparing for ML engineering roles (not just data science)

**Software Engineers → ML Engineers**
- Leveraging existing programming skills for ML systems
- Understanding performance, debugging, and optimization
- Learning production ML patterns and infrastructure

**ML Practitioners**
- Moving from model users to model builders
- Debugging training issues at the systems level  
- Optimizing models for production deployment

**Researchers & Advanced Users**
- Implementing custom operations and architectures
- Understanding framework limitations and workarounds
- Building specialized ML systems for unique domains

### Career Transformation:

**Before TinyTorch:** "I can train models with PyTorch"
**After TinyTorch:** "I can build and optimize ML systems"

You become the person your team asks:
- *"Why is our training bottlenecked?"* 
- *"Can we fit this model in memory?"*
- *"How do we implement this research paper?"*
- *"What's the best architecture for our constraints?"*

---

## Pedagogical Philosophy: Build → Use → Understand

### 1. Build First
Every component implemented from scratch:
- Tensors with broadcasting and memory management
- Automatic differentiation with computational graphs
- Optimizers with state management and memory profiling
- Complete training loops with checkpointing and monitoring

### 2. Use Immediately
No toy examples - recreate ML history with real results:
- **MLP Era**: Train MLPs to 52.7% CIFAR-10 accuracy (the baseline that motivated CNNs)
- **CNN Revolution**: Build LeNet-1 (39.4%) and LeNet-5 (47.5%) - witness the breakthrough
- **Modern CNNs**: Push beyond MLPs with optimized architectures (55%+ achievable)
- **Transformer Era**: Language models using 95% vision framework reuse

### 3. Understand Systems
Connect implementations to production reality:
- How your tensor maps to PyTorch's memory model
- Why your optimizer choices affect GPU utilization
- How your autograd compares to production frameworks
- When your implementations would need modification at scale

### 4. Reflect on Trade-offs
ML Systems Thinking sections in every module:
- Memory vs compute trade-offs in different architectures
- Accuracy vs efficiency considerations for deployment  
- Debugging strategies for common production issues
- Framework design principles and their implications

---

## Unique Value Proposition

### What Makes TinyTorch Different:

**Systems-First Approach**
- Not just "how does attention work" but "why does attention scale O(N²) and how do production systems handle this?"
- Not just "implement SGD" but "when do you choose SGD vs Adam in production?"

**Production Relevance**
- Memory profiling, performance optimization, deployment patterns
- Real datasets, realistic scale, professional development workflow
- Connection to industry practices and framework design decisions

**Framework Generalization**
- 16 modules that build ONE cohesive ML framework supporting vision AND language
- 95% component reuse from computer vision to language models
- Professional package structure with CLI tools and testing

**Proven Pedagogy**
- Build → Use → Understand cycle creates deep intuition
- Immediate testing and feedback for every component
- Progressive complexity with solid foundations
- NBGrader integration for classroom deployment

---

## Learning Outcomes: Becoming an ML Systems Engineer

### Technical Mastery
- **Implement any ML paper** from first principles
- **Debug training issues** at the systems level
- **Optimize models** for production deployment
- **Profile and improve** ML system performance
- **Design custom architectures** for specialized domains
- **Understand framework generalization** across vision and language

### Systems Understanding 
- **Memory management** in ML frameworks
- **Computational complexity** vs real-world performance
- **Hardware utilization** patterns and optimization
- **Distributed training** challenges and solutions
- **Production deployment** considerations and trade-offs

### Professional Skills
- **Test-driven development** for ML systems
- **Performance profiling** and optimization techniques
- **Code organization** and package development
- **Documentation** and API design
- **MLOps** and production monitoring

### Career Impact
- **Technical interviews**: Demonstrate deep ML systems knowledge
- **Job opportunities**: Qualify for ML engineer (not just data scientist) roles
- **Team leadership**: Become the go-to person for ML systems questions
- **Research ability**: Implement cutting-edge papers independently
- **Entrepreneurship**: Build ML products with full-stack understanding

---

## Success Stories: What Students Say

*"Finally understood what happens when I call `loss.backward()` - now I can debug gradient issues instead of just hoping they go away."*

*"Built my own attention mechanism from scratch, then extended my vision framework to language models with 95% component reuse. When GPT-4 came out, I actually understood both the technical details AND the framework unification."*

*"Got hired as an ML engineer specifically because I could explain how optimizers work at the memory level during the technical interview."*

*"Used TinyTorch concepts to optimize our production training pipeline for both vision and language models - saved 40% on cloud costs by understanding memory bottlenecks across modalities."*

*"Implemented a custom loss function for our research project in 30 minutes instead of spending days figuring out PyTorch internals."*

---

## Ready to Become an ML Systems Engineer?

**TinyTorch transforms ML users into ML builders.**

Stop wondering how frameworks work. Start building them.

**[Begin Your Journey →](chapters/00-introduction.md)**

---

*TinyTorch: Because understanding how to build ML systems makes you a more effective ML engineer.*