TinyTorch/book/vision.md
Vijay Janapa Reddi a21a006603 feat: Major book structure and content updates
- Reorganized chapter structure with new numbering system
- Added new chapters: introduction, tokenization, embeddings, profiling, quantization, caching
- Removed obsolete chapters (15-mlops) and consolidated content
- Updated table of contents and navigation structure
- Enhanced visual design with new logos and favicon
- Added comprehensive documentation (FAQ, user manual, command reference, competitions)
- Improved theme design and custom CSS styling
- Added QUICKSTART.md for rapid onboarding
- Updated all chapter cross-references and links
2025-09-27 01:36:16 -04:00


The TinyTorch Vision

Training ML Systems Engineers: From Computer Vision to Language Models


The Problem We're Solving

The ML field has a critical gap: most education teaches you to use frameworks, not build them.

Traditional ML Education:

import torch
import torch.nn as nn
model = nn.Linear(784, 10)
optimizer = torch.optim.Adam(model.parameters())

Questions students can't answer:

  • Why does Adam need roughly 3× the parameter memory of plain SGD?
  • How does loss.backward() actually compute gradients?
  • When should you use gradient accumulation vs larger batch sizes?
  • Why do attention mechanisms limit context length?

The TinyTorch Difference:

import numpy as np

class Linear:
    def __init__(self, in_features, out_features):
        self.weight = Tensor(np.random.randn(in_features, out_features))
        self.bias = Tensor(np.zeros(out_features))

    def forward(self, x):
        self.x = x  # cache the input; backward needs it
        return x @ self.weight + self.bias  # YOU implemented @

    def backward(self, grad_output):
        # YOU understand exactly how gradients flow
        self.weight.grad = self.x.T @ grad_output
        # Bias gradient: sum over the batch axis (assumes Tensor supports sum(axis=...))
        self.bias.grad = grad_output.sum(axis=0)
        return grad_output @ self.weight.T
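
The layer above can be exercised end to end with a minimal sketch that assumes plain NumPy arrays in place of TinyTorch's Tensor. The names `NumpyLinear` and `numerical_weight_grad` are hypothetical and exist only for this illustration. The sketch caches the input during the forward pass and compares the analytic weight gradient against a finite-difference estimate.

```python
import numpy as np

class NumpyLinear:
    """Illustrative stand-in for the Linear layer; raw NumPy arrays, not TinyTorch's Tensor."""

    def __init__(self, in_features, out_features, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((in_features, out_features))
        self.bias = np.zeros(out_features)

    def forward(self, x):
        self.x = x  # cache the input for backward
        return x @ self.weight + self.bias

    def backward(self, grad_output):
        self.weight_grad = self.x.T @ grad_output
        self.bias_grad = grad_output.sum(axis=0)
        return grad_output @ self.weight.T


def numerical_weight_grad(layer, x, eps=1e-6):
    """Central-difference gradient of loss = sum(forward(x)) w.r.t. the weights."""
    grad = np.zeros_like(layer.weight)
    for idx in np.ndindex(*layer.weight.shape):
        original = layer.weight[idx]
        layer.weight[idx] = original + eps
        loss_plus = layer.forward(x).sum()
        layer.weight[idx] = original - eps
        loss_minus = layer.forward(x).sum()
        layer.weight[idx] = original  # restore the weight
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad


layer = NumpyLinear(4, 3)
x = np.random.default_rng(1).standard_normal((5, 4))
out = layer.forward(x)
grad_input = layer.backward(np.ones_like(out))  # d(sum(out))/d(out) is all ones
max_error = np.abs(layer.weight_grad - numerical_weight_grad(layer, x)).max()
print(f"max |analytic - numerical| = {max_error:.2e}")
```

A small `max_error` suggests the hand-written backward pass agrees with the numerical estimate. This kind of gradient check is a common way to validate a layer before trusting it in training.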

Questions students CAN answer:

  • Exactly how automatic differentiation works
  • Why certain optimizers use more memory
  • How to debug training instability
  • When to make performance vs accuracy trade-offs

What We Teach: Systems Thinking

Beyond Algorithms: System-Level Understanding

Memory Management:

  • Why Adam needs 3× parameter memory (parameters + momentum + variance estimates, each the same size as the parameters)
  • How attention matrices scale O(N²) with sequence length
  • When gradient accumulation saves memory, and what it costs in training throughput
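
The memory claims above reduce to simple arithmetic. The following is a rough sketch, assuming 4-byte float32 values, plain SGD without momentum, and a single attention head on a single sequence. The function names are hypothetical, and gradients and activations are left out because both optimizers need them.

```python
def sgd_training_bytes(num_params, bytes_per_value=4):
    """Plain SGD (no momentum) keeps no optimizer state beyond the parameters."""
    return num_params * bytes_per_value


def adam_training_bytes(num_params, bytes_per_value=4):
    """Parameters plus Adam's two per-parameter state tensors (momentum and variance)."""
    return 3 * num_params * bytes_per_value


def attention_scores_bytes(seq_len, bytes_per_value=4):
    """One N x N attention score matrix for a single head and a single sequence."""
    return seq_len * seq_len * bytes_per_value


num_params = 784 * 10 + 10  # weights and biases of a 784 -> 10 linear layer
print("SGD  bytes:", sgd_training_bytes(num_params))
print("Adam bytes:", adam_training_bytes(num_params))

for n in (1024, 2048):
    print(f"attention scores, N={n}: {attention_scores_bytes(n)} bytes")
```

Doubling the sequence length quadruples the score matrix, which is the O(N²) growth in concrete terms.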

Performance Analysis:

  • Why naive convolution is 100× slower than optimized versions
  • How cache misses destroy performance in matrix operations
  • When vectorization provides 10-100× speedups
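
To see the effect of vectorization directly, here is a small sketch that compares a triple-loop matrix multiply with NumPy's `np.matmul`. The helper names are hypothetical, and the measured speedup depends on the machine, the matrix size, and the BLAS library NumPy is linked against, so no specific ratio is claimed.

```python
import time
import numpy as np


def matmul_naive(a, b):
    """Textbook triple-loop matrix multiply, one scalar operation at a time."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i, p] * b[p, j]
            out[i, j] = total
    return out


def time_call(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))

naive_result, naive_seconds = time_call(matmul_naive, a, b)
fast_result, fast_seconds = time_call(np.matmul, a, b)
print(f"naive: {naive_seconds:.4f}s  vectorized: {fast_seconds:.6f}s")
```

Both paths compute the same product. Only the way the work is scheduled onto the hardware differs.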

Production Trade-offs:

  • SGD vs Adam: convergence speed vs memory constraints
  • Gradient checkpointing: trading compute for memory
  • Mixed precision: up to 2× memory savings, at some cost in numerical precision
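
The storage side of the mixed-precision trade-off can be illustrated with NumPy alone. This is a sketch of the storage arithmetic only, not of a full mixed-precision training recipe (which typically also keeps a higher-precision copy of the weights). The array shape and variable names are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((1024, 1024)).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

bytes_fp32 = weights_fp32.nbytes  # 4 bytes per value
bytes_fp16 = weights_fp16.nbytes  # 2 bytes per value

# Round-trip error introduced by storing the values in half precision.
max_abs_error = np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max()

# float16 has a narrow range: its largest finite value is 65504.
with np.errstate(over="ignore"):
    overflowed = np.array([70000.0], dtype=np.float32).astype(np.float16)[0]

print(f"fp32: {bytes_fp32} bytes, fp16: {bytes_fp16} bytes")
print(f"max round-trip error: {max_abs_error:.5f}")
print(f"70000.0 stored as float16: {overflowed}")
```

Halving the bytes per value halves the storage, but the reduced precision and range are the accuracy considerations that have to be managed.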

Hardware Awareness:

  • How memory bandwidth limits ML performance
  • Why GPU utilization matters more than peak FLOPS
  • When distributed training becomes necessary

Target Audience: Future ML Systems Engineers

Perfect For:

Computer Science Students

  • Going beyond "use PyTorch" to "understand PyTorch"
  • Building portfolio projects that demonstrate deep system knowledge
  • Preparing for ML engineering roles (not just data science)

Software Engineers → ML Engineers

  • Leveraging existing programming skills for ML systems
  • Understanding performance, debugging, and optimization
  • Learning production ML patterns and infrastructure

ML Practitioners

  • Moving from model users to model builders
  • Debugging training issues at the systems level
  • Optimizing models for production deployment

Researchers & Advanced Users

  • Implementing custom operations and architectures
  • Understanding framework limitations and workarounds
  • Building specialized ML systems for unique domains

Career Transformation:

Before TinyTorch: "I can train models with PyTorch"

After TinyTorch: "I can build and optimize ML systems"

You become the person your team asks:

  • "Why is our training bottlenecked?"
  • "Can we fit this model in memory?"
  • "How do we implement this research paper?"
  • "What's the best architecture for our constraints?"

Pedagogical Philosophy: Build → Use → Understand

1. Build First

Every component implemented from scratch:

  • Tensors with broadcasting and memory management
  • Automatic differentiation with computational graphs
  • Optimizers with state management and memory profiling
  • Complete training loops with checkpointing and monitoring
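
As a taste of what "automatic differentiation with computational graphs" involves, here is a minimal scalar reverse-mode sketch. It is not TinyTorch's actual autograd. The `Value` class and its attributes are hypothetical names chosen for illustration, and only addition and multiplication are supported.

```python
class Value:
    """Scalar node in a computational graph (hypothetical, for illustration only)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # pushes self.grad to the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph so each node runs after all of its consumers.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(self)/d(self)
        for node in reversed(order):
            node._backward()


x = Value(3.0)
w = Value(2.0)
b = Value(1.0)
y = w * x + b  # y = 7
z = y * y      # z = (w*x + b)^2
z.backward()
print(z.data, w.grad, x.grad, b.grad)
```

Each operation records its inputs and a small closure for the local derivative. Calling `backward()` walks the graph in reverse and applies the chain rule, accumulating into `grad` so that a node used twice (like `y` here) receives both contributions.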

2. Use Immediately

No toy examples - recreate ML history with real results:

  • MLP Era: Train MLPs to 52.7% CIFAR-10 accuracy (the baseline that motivated CNNs)
  • CNN Revolution: Build LeNet-1 (39.4%) and LeNet-5 (47.5%) - witness the breakthrough
  • Modern CNNs: Push beyond MLPs with optimized architectures (75%+ achievable)
  • Transformer Era: Language models that reuse 95% of the vision framework

3. Understand Systems

Connect implementations to production reality:

  • How your tensor maps to PyTorch's memory model
  • Why your optimizer choices affect GPU utilization
  • How your autograd compares to production frameworks
  • When your implementations would need modification at scale

4. Reflect on Trade-offs

ML Systems Thinking sections in every module:

  • Memory vs compute trade-offs in different architectures
  • Accuracy vs efficiency considerations for deployment
  • Debugging strategies for common production issues
  • Framework design principles and their implications

Unique Value Proposition

What Makes TinyTorch Different:

Systems-First Approach

  • Not just "how does attention work" but "why does attention scale O(N²) and how do production systems handle this?"
  • Not just "implement SGD" but "when do you choose SGD vs Adam in production?"

Production Relevance

  • Memory profiling, performance optimization, deployment patterns
  • Real datasets, realistic scale, professional development workflow
  • Connection to industry practices and framework design decisions

Framework Generalization

  • 20 modules that build ONE cohesive ML framework supporting vision AND language
  • 95% component reuse from computer vision to language models
  • Professional package structure with CLI tools and testing

Proven Pedagogy

  • Build → Use → Understand cycle creates deep intuition
  • Immediate testing and feedback for every component
  • Progressive complexity with solid foundations
  • NBGrader integration for classroom deployment

Learning Outcomes: Becoming an ML Systems Engineer

Technical Mastery

  • Implement any ML paper from first principles
  • Debug training issues at the systems level
  • Optimize models for production deployment
  • Profile and improve ML system performance
  • Design custom architectures for specialized domains
  • Understand framework generalization across vision and language

Systems Understanding

  • Memory management in ML frameworks
  • Computational complexity vs real-world performance
  • Hardware utilization patterns and optimization
  • Distributed training challenges and solutions
  • Production deployment considerations and trade-offs

Professional Skills

  • Test-driven development for ML systems
  • Performance profiling and optimization techniques
  • Code organization and package development
  • Documentation and API design
  • MLOps and production monitoring

Career Impact

  • Technical interviews: Demonstrate deep ML systems knowledge
  • Job opportunities: Qualify for ML engineer (not just data scientist) roles
  • Team leadership: Become the go-to person for ML systems questions
  • Research ability: Implement cutting-edge papers independently
  • Entrepreneurship: Build ML products with full-stack understanding

Ready to Become an ML Systems Engineer?

TinyTorch transforms ML users into ML builders.

Stop wondering how frameworks work. Start building them.

Begin Your Journey →


TinyTorch: Because understanding how to build ML systems makes you a more effective ML engineer.