
Journey Through ML History

Experience the evolution of AI by rebuilding history's most important breakthroughs with YOUR TinyTorch implementations.


What Are Milestones?

Milestones are proof-of-mastery demonstrations that showcase what you can build after completing specific modules. Each milestone recreates a historically significant ML achievement using YOUR implementations.

Why This Approach?

  • Deep Understanding: Experience the actual challenges researchers faced
  • Progressive Learning: Each milestone builds on previous foundations
  • Real Achievements: Not toy examples - these are historically significant breakthroughs
  • Systems Thinking: Understand WHY each innovation mattered for ML systems

Two Dimensions of Your Progress

As you build TinyTorch, you're progressing along TWO dimensions simultaneously:

Pedagogical Dimension (Acts): What You're LEARNING

  • Act I (01-04): Building atomic components - mathematical foundations
  • Act II (05-07): The gradient revolution - systems that learn
  • Act III (08-09): Real-world complexity - data and scale
  • Act IV (10-13): Sequential intelligence - language understanding
  • Act V (14-19): Production systems - optimization and deployment
  • Act VI (20): Complete integration - unified AI systems

See The Learning Journey for the complete pedagogical narrative explaining WHY modules flow this way.

Historical Dimension (Milestones): What You CAN Build

  • 1957: Perceptron - Binary classification
  • 1969: XOR - Non-linear learning
  • 1986: MLP - Multi-class vision
  • 1998: CNN - Spatial intelligence
  • 2017: Transformers - Language generation
  • 2018: Torch Olympics - Production optimization

How They Connect

```mermaid
graph TB
    subgraph "Pedagogical Acts (What You're Learning)"
        A1["Act I: Foundation<br/>Modules 01-04<br/>Atomic Components"]
        A2["Act II: Learning<br/>Modules 05-07<br/>Gradient Revolution"]
        A3["Act III: Data & Scale<br/>Modules 08-09<br/>Real-World Complexity"]
        A4["Act IV: Language<br/>Modules 10-13<br/>Sequential Intelligence"]
        A5["Act V: Production<br/>Modules 14-19<br/>Optimization"]
        A6["Act VI: Integration<br/>Module 20<br/>Complete Systems"]
    end

    subgraph "Historical Milestones (What You Can Build)"
        M1["1957: Perceptron<br/>Binary Classification"]
        M2["1969: XOR Crisis<br/>Non-linear Learning"]
        M3["1986: MLP<br/>Multi-class Vision<br/>95%+ MNIST"]
        M4["1998: CNN<br/>Spatial Intelligence<br/>75%+ CIFAR-10"]
        M5["2017: Transformers<br/>Language Generation"]
        M6["2018: Torch Olympics<br/>Production Speed"]
    end

    A1 --> M1
    A2 --> M2
    A2 --> M3
    A3 --> M4
    A4 --> M5
    A5 --> M6

    style A1 fill:#e3f2fd
    style A2 fill:#fff8e1
    style A3 fill:#e8f5e9
    style A4 fill:#f3e5f5
    style A5 fill:#fce4ec
    style A6 fill:#fff3e0
    style M1 fill:#ffcdd2
    style M2 fill:#f8bbd0
    style M3 fill:#e1bee7
    style M4 fill:#d1c4e9
    style M5 fill:#c5cae9
    style M6 fill:#bbdefb
```

| Learning Act | Unlocked Milestone | Proof of Mastery |
|---|---|---|
| Act I: Foundation (01-04) | 1957 Perceptron | Your Linear layer recreates history |
| Act II: Learning (05-07) | 1969 XOR + 1986 MLP | Your autograd enables training (95%+ MNIST) |
| Act III: Data & Scale (08-09) | 1998 CNN | Your Conv2d achieves 75%+ on CIFAR-10 |
| Act IV: Language (10-13) | 2017 Transformers | Your attention generates coherent text |
| Act V: Production (14-18) | 2018 Torch Olympics | Your optimizations achieve production speed |
| Act VI: Integration (19-20) | Benchmarking + Capstone | Your complete framework competes |

Understanding Both Dimensions: The Acts explain WHY you're building each component (pedagogical progression). The Milestones prove WHAT you've built works (historical validation). Together, they show you're not just completing exercises - you're building something real.


The Timeline

```mermaid
timeline
    title Journey Through ML History
    1957 : Perceptron : Binary classification with gradient descent
    1969 : XOR Crisis : Hidden layers solve non-linear problems
    1986 : MLP Revival : Backpropagation enables deep learning
    1998 : CNN Era : Spatial intelligence for computer vision
    2017 : Transformers : Attention revolutionizes language AI
    2018 : Torch Olympics : Production benchmarking and optimization
```

01. Perceptron (1957) - Rosenblatt

After Modules 02-04

Input → Linear → Sigmoid → Output

The Beginning: The first trainable neural network. Frank Rosenblatt proved machines could learn from data.

What You'll Build:

  • Binary classification with gradient descent
  • Simple but revolutionary architecture
  • YOUR Linear layer recreates history

Systems Insights:

  • Memory: O(n) parameters
  • Compute: O(n) operations
  • Limitation: Only linearly separable problems

```bash
cd milestones/01_1957_perceptron
python 01_rosenblatt_forward.py   # See the problem (random weights)
python 02_rosenblatt_trained.py   # See the solution (trained)
```

Expected Results: ~50% (untrained) → 95%+ (trained) accuracy
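
The whole 1957 recipe fits in a few lines. Here is an illustrative plain-NumPy sketch (not the TinyTorch API) of a perceptron with a sigmoid output, trained by gradient descent on a hypothetical linearly separable toy problem:

```python
import numpy as np

# Toy data: classify points by the sign of x0 + x1 (linearly separable).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)   # O(n) parameters: one weight per input, plus a bias
b = 0.0
lr = 0.5
for _ in range(200):
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))   # sigmoid activation
    grad_z = (p - y) / len(y)      # gradient of binary cross-entropy w.r.t. z
    w -= lr * (X.T @ grad_z)       # O(n) update per step
    b -= lr * grad_z.sum()

acc = ((1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
print(f"accuracy: {acc:.2f}")      # untrained ~0.5; trained well above 0.9
```

The same update rule fails on anything that is not linearly separable, which is exactly the limitation the next milestone exposes.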


02. XOR Crisis (1969) - Minsky & Papert

After Modules 02-06

Input → Linear → ReLU → Linear → Output

The Challenge: Minsky and Papert proved that single-layer perceptrons couldn't solve XOR. This crisis nearly ended neural network research.

What You'll Build:

  • Hidden layers enable non-linear solutions
  • Multi-layer networks break through limitations
  • YOUR autograd makes it possible

Systems Insights:

  • Memory: O(n²) with hidden layers
  • Compute: O(n²) operations
  • Breakthrough: Hidden representations

```bash
cd milestones/02_1969_xor
python 01_xor_crisis.py   # Watch it fail (loss stuck at 0.69)
python 02_xor_solved.py   # Hidden layers solve it!
```

Expected Results: 50% (single layer) → 100% (multi-layer) on XOR
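
The breakthrough is visible in miniature. This illustrative NumPy sketch (not the TinyTorch API, and the hidden width of 8 is an arbitrary choice) adds one ReLU hidden layer and backpropagates through it by hand:

```python
import numpy as np

# The four XOR points: no single linear boundary separates them.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(1)
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # output layer
lr = 0.5
for _ in range(2000):
    h = np.maximum(0.0, X @ W1 + b1)              # Linear -> ReLU
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # Linear -> sigmoid
    grad_out = (p - y) / len(y)                   # BCE gradient at output
    grad_h = (grad_out @ W2.T) * (h > 0)          # backprop through ReLU
    W2 -= lr * h.T @ grad_out; b2 -= lr * grad_out.sum(0)
    W1 -= lr * X.T @ grad_h;   b1 -= lr * grad_h.sum(0)

preds = (p > 0.5).astype(float)
print("XOR accuracy:", (preds == y).mean())  # typically all four points correct
```

The hidden layer learns intermediate features (roughly OR-like and AND-like detectors) that make XOR linearly separable in the hidden space, which no single layer can do.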


03. MLP Revival (1986) - Backpropagation Era

After Modules 02-08

Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes

The Revolution: Backpropagation enabled training deep networks on real datasets like MNIST.

What You'll Build:

  • Multi-class digit recognition
  • Complete training pipelines
  • YOUR optimizers achieve 95%+ accuracy

Systems Insights:

  • Memory: ~100K parameters for MNIST
  • Compute: Dense matrix operations
  • Architecture: Multi-layer feature learning

```bash
cd milestones/03_1986_mlp
python 01_rumelhart_tinydigits.py  # 8x8 digits (quick)
python 02_rumelhart_mnist.py       # Full MNIST
```

Expected Results: 95%+ accuracy on MNIST
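
The "~100K parameters" figure is easy to sanity-check with arithmetic. Assuming a hypothetical 784 → 128 → 64 → 10 layout (these layer sizes are an illustration, not necessarily the milestone's exact architecture):

```python
# Parameter count for a hypothetical MNIST MLP: 784 inputs, two hidden
# layers (128 and 64 units), 10 output classes.
sizes = [784, 128, 64, 10]
params = sum(n_in * n_out + n_out          # weights + biases per Linear layer
             for n_in, n_out in zip(sizes, sizes[1:]))
print(params)  # 109386, i.e. roughly 100K parameters
```

The first Linear layer dominates (784 × 128 ≈ 100K of the total), which is why flattening images into dense layers scales poorly and motivates the convolutional milestone that follows.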


04. CNN Revolution (1998) - LeCun's Breakthrough

After Modules 02-09

🎯 North Star Achievement

Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes

The Game-Changer: CNNs exploit spatial structure for computer vision. This enabled modern AI.

What You'll Build:

  • Convolutional feature extraction
  • Natural image classification (CIFAR-10)
  • YOUR Conv2d + MaxPool2d unlock spatial intelligence

Systems Insights:

  • Memory: ~1M parameters (weight sharing keeps this far below a dense equivalent)
  • Compute: Convolution is intensive but parallelizable
  • Architecture: Local connectivity + translation invariance

```bash
cd milestones/04_1998_cnn
python 01_lecun_tinydigits.py  # Spatial features on digits
python 02_lecun_cifar10.py     # CIFAR-10 @ 75%+ accuracy
```

Expected Results: 75%+ accuracy on CIFAR-10
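
Weight sharing is the reason convolutions stay tractable. A back-of-the-envelope comparison, assuming a hypothetical first layer on 32×32×3 CIFAR-10 images (the kernel and channel sizes are illustrative, not the milestone's exact configuration):

```python
# A 3x3 convolution from 3 input channels to 32 output channels reuses the
# same small kernel at every spatial position:
conv_params = 3 * 3 * 3 * 32 + 32             # kernel weights + biases = 896

# A dense layer producing the same 32x32x32 output from the 32x32x3 input
# would need a weight for every input-output pair:
dense_params = (32 * 32 * 3) * (32 * 32 * 32) + 32 * 32 * 32

print(conv_params)                 # 896
print(dense_params // conv_params) # weight sharing saves a factor of ~100,000
```

Local connectivity plus parameter reuse is also what gives CNNs translation invariance: the same edge detector works anywhere in the image.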


05. Transformer Era (2017) - Attention Revolution

After Modules 02-13

Tokens → Embeddings → Attention → FFN → ... → Attention → Output

The Modern Era: Transformers + attention launched the LLM revolution (GPT, BERT, ChatGPT).

What You'll Build:

  • Self-attention mechanisms
  • Autoregressive text generation
  • YOUR attention implementation generates language

Systems Insights:

  • Memory: O(n²) attention requires careful management
  • Compute: Highly parallelizable
  • Architecture: Long-range dependencies

```bash
cd milestones/05_2017_transformer
python 01_vaswani_generation.py  # Q&A generation with TinyTalks
python 02_vaswani_dialogue.py    # Multi-turn dialogue
```

Expected Results: Loss < 1.5, coherent responses to questions
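
The O(n²) memory cost comes straight from the attention score matrix. A minimal NumPy sketch of scaled dot-product attention (illustrative only, not the TinyTorch API):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention for a single head."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)     # (n, n) matrix -- the O(n^2) memory cost
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                # each output mixes all n value vectors

n, d = 16, 8                          # toy sequence length and head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)                      # (16, 8)
```

Because every token attends to every other token, doubling the sequence length quadruples the score matrix, which is why long-context inference demands the careful memory management mentioned above.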


06. Torch Olympics Era (2018) - The Optimization Revolution

After Modules 14-18

Profile → Compress → Accelerate

The Turning Point: As models grew larger, MLCommons' Torch Olympics (2018) established systematic optimization as a discipline - profiling, compression, and acceleration became essential for deployment.

What You'll Build:

  • Performance profiling and bottleneck analysis
  • Model compression (quantization + pruning)
  • Inference acceleration (KV-cache + batching)

Systems Insights:

  • Memory: 4-16× compression through quantization/pruning
  • Speed: 12-40× faster generation with KV-cache + batching
  • Workflow: Systematic "measure → optimize → validate" methodology

```bash
cd milestones/06_2018_mlperf
python 01_baseline_profile.py   # Find bottlenecks
python 02_compression.py        # Reduce size (quantize + prune)
python 03_generation_opts.py    # Speed up inference (cache + batch)
```

Expected Results: 8-16× smaller models, 12-40× faster inference
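
The compression numbers above start from a simple idea: store weights in fewer bits. An illustrative sketch of symmetric int8 quantization (not the TinyTorch API; the tensor and scale scheme are assumptions for the example):

```python
import numpy as np

# A fake float32 weight tensor standing in for a trained layer.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=1000).astype(np.float32)

scale = np.abs(w).max() / 127.0                        # per-tensor scale
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale                   # dequantize

ratio = w.nbytes / q.nbytes                            # float32 -> int8 = 4x
err = np.abs(w - w_hat).max()                          # bounded by scale / 2
print(f"compression {ratio:.0f}x, max abs error {err:.5f}")
```

Quantization alone gives 4×; combining it with pruning (zeroing and skipping small weights) is what pushes total compression into the 8-16× range, at the cost of a validation step to confirm accuracy survives.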


Learning Philosophy

Progressive Capability Building

| Stage | Era | Capability | Your Tools |
|---|---|---|---|
| 1957 | Foundation | Binary classification | Linear + Sigmoid |
| 1969 | Depth | Non-linear problems | Hidden layers + Autograd |
| 1986 | Scale | Multi-class vision | Optimizers + Training |
| 1998 | Structure | Spatial understanding | Conv2d + Pooling |
| 2017 | Attention | Sequence modeling | Transformers + Attention |
| 2018 | Optimization | Production deployment | Profiling + Compression + Acceleration |

Systems Engineering Progression

Each milestone teaches critical systems thinking:

  1. Memory Management: From O(n) perceptron parameters → O(n²) dense layers → O(n²) attention, kept in check with optimizations
  2. Computational Trade-offs: Accuracy vs efficiency
  3. Architectural Patterns: How structure enables capability
  4. Production Deployment: What it takes to scale

How to Use Milestones

1. Complete Prerequisites

```bash
# Check which modules you've completed
tito checkpoint status

# Complete required modules
tito module complete 02_tensor
tito module complete 03_activations
# ... and so on
```

2. Run the Milestone

```bash
cd milestones/01_1957_perceptron
python 02_rosenblatt_trained.py
```

3. Understand the Systems

Each milestone includes:

  • 📊 Memory profiling: See actual memory usage
  • Performance metrics: FLOPs, parameters, timing
  • 🧠 Architectural analysis: Why this design matters
  • 📈 Scaling insights: How performance changes with size

4. Reflect and Compare

Questions to ask:

  • How does this compare to modern architectures?
  • What were the computational constraints in that era?
  • How would you optimize this for production?
  • What patterns appear in PyTorch/TensorFlow?

Quick Reference

Milestone Prerequisites

| Milestone | After Module | Key Requirements |
|---|---|---|
| 01. Perceptron (1957) | 04 | Tensor, Activations, Layers |
| 02. XOR (1969) | 06 | + Losses, Autograd |
| 03. MLP (1986) | 08 | + Optimizers, Training |
| 04. CNN (1998) | 09 | + Spatial, DataLoader |
| 05. Transformer (2017) | 13 | + Tokenization, Embeddings, Attention |
| 06. Torch Olympics (2018) | 18 | + Profiling, Quantization, Compression, Memoization, Acceleration |

What Each Milestone Proves

  • Your implementations work - Not just toy code
  • Historical significance - These breakthroughs shaped modern AI
  • Systems understanding - You know memory, compute, scaling
  • Production relevance - Patterns used in real ML frameworks

Further Learning

After completing milestones, explore:

  • Torch Olympics Competition: Optimize your implementations
  • Leaderboard: Compare with other students
  • Capstone Projects: Build your own ML applications
  • Research Papers: Read the original papers for each milestone

Why This Matters

Most courses teach you to USE frameworks.
TinyTorch teaches you to UNDERSTAND them.

By rebuilding ML history, you gain:

  • 🧠 Deep intuition for how neural networks work
  • 🔧 Systems thinking for production ML
  • 🏆 Portfolio projects demonstrating mastery
  • 💼 Preparation for ML systems engineering roles

Ready to start your journey through ML history?

```bash
cd milestones/01_1957_perceptron
python 02_rosenblatt_trained.py
```

Build the future by understanding the past. 🚀