# Journey Through ML History
Experience the evolution of AI by rebuilding history's most important breakthroughs with YOUR TinyTorch implementations.
## What Are Milestones?
Milestones are proof-of-mastery demonstrations that showcase what you can build after completing specific modules. Each milestone recreates a historically significant ML achievement using YOUR implementations.
### Why This Approach?
- **Deep Understanding**: Experience the actual challenges researchers faced
- **Progressive Learning**: Each milestone builds on previous foundations
- **Real Achievements**: Not toy examples - these are historically significant breakthroughs
- **Systems Thinking**: Understand WHY each innovation mattered for ML systems
## Two Dimensions of Your Progress
As you build TinyTorch, you're progressing along TWO dimensions simultaneously:
### Pedagogical Dimension (Acts): What You're LEARNING
- **Act I (01-04)**: Building atomic components - mathematical foundations
- **Act II (05-07)**: The gradient revolution - systems that learn
- **Act III (08-09)**: Real-world complexity - data and scale
- **Act IV (10-13)**: Sequential intelligence - language understanding
- **Act V (14-18)**: Production systems - optimization and deployment
- **Act VI (19-20)**: Complete integration - unified AI systems
See The Learning Journey for the complete pedagogical narrative explaining WHY modules flow this way.
### Historical Dimension (Milestones): What You CAN Build
- **1957: Perceptron** - Binary classification
- **1969: XOR** - Non-linear learning
- **1986: MLP** - Multi-class vision
- **1998: CNN** - Spatial intelligence
- **2017: Transformers** - Language generation
- **2018: Torch Olympics** - Production optimization
### How They Connect
```mermaid
graph TB
    subgraph "Pedagogical Acts (What You're Learning)"
        A1["Act I: Foundation<br/>Modules 01-04<br/>Atomic Components"]
        A2["Act II: Learning<br/>Modules 05-07<br/>Gradient Revolution"]
        A3["Act III: Data & Scale<br/>Modules 08-09<br/>Real-World Complexity"]
        A4["Act IV: Language<br/>Modules 10-13<br/>Sequential Intelligence"]
        A5["Act V: Production<br/>Modules 14-18<br/>Optimization"]
        A6["Act VI: Integration<br/>Modules 19-20<br/>Complete Systems"]
    end

    subgraph "Historical Milestones (What You Can Build)"
        M1["1957: Perceptron<br/>Binary Classification"]
        M2["1969: XOR Crisis<br/>Non-linear Learning"]
        M3["1986: MLP<br/>Multi-class Vision<br/>95%+ MNIST"]
        M4["1998: CNN<br/>Spatial Intelligence<br/>75%+ CIFAR-10"]
        M5["2017: Transformers<br/>Language Generation"]
        M6["2018: Torch Olympics<br/>Production Speed"]
    end

    A1 --> M1
    A2 --> M2
    A2 --> M3
    A3 --> M4
    A4 --> M5
    A5 --> M6

    style A1 fill:#e3f2fd
    style A2 fill:#fff8e1
    style A3 fill:#e8f5e9
    style A4 fill:#f3e5f5
    style A5 fill:#fce4ec
    style A6 fill:#fff3e0
    style M1 fill:#ffcdd2
    style M2 fill:#f8bbd0
    style M3 fill:#e1bee7
    style M4 fill:#d1c4e9
    style M5 fill:#c5cae9
    style M6 fill:#bbdefb
```
| Learning Act | Unlocked Milestone | Proof of Mastery |
|---|---|---|
| Act I: Foundation (01-04) | 1957 Perceptron | Your Linear layer recreates history |
| Act II: Learning (05-07) | 1969 XOR + 1986 MLP | Your autograd enables training (95%+ MNIST) |
| Act III: Data & Scale (08-09) | 1998 CNN | Your Conv2d achieves 75%+ on CIFAR-10 |
| Act IV: Language (10-13) | 2017 Transformers | Your attention generates coherent text |
| Act V: Production (14-18) | 2018 Torch Olympics | Your optimizations achieve production speed |
| Act VI: Integration (19-20) | Benchmarking + Capstone | Your complete framework competes |
**Understanding Both Dimensions**: The Acts explain WHY you're building each component (pedagogical progression). The Milestones prove WHAT you've built works (historical validation). Together, they show you're not just completing exercises - you're building something real.
## The Timeline
```mermaid
timeline
    title Journey Through ML History
    1957 : Perceptron : Binary classification with gradient descent
    1969 : XOR Crisis : Hidden layers solve non-linear problems
    1986 : MLP Revival : Backpropagation enables deep learning
    1998 : CNN Era : Spatial intelligence for computer vision
    2017 : Transformers : Attention revolutionizes language AI
    2018 : Torch Olympics : Production benchmarking and optimization
```
## 01. Perceptron (1957) - Rosenblatt

**After Modules 02-04**

```
Input → Linear → Sigmoid → Output
```
**The Beginning**: The first trainable neural network. Frank Rosenblatt proved machines could learn from data.

**What You'll Build:**
- Binary classification with gradient descent
- Simple but revolutionary architecture
- YOUR Linear layer recreates history

**Systems Insights:**
- Memory: O(n) parameters
- Compute: O(n) operations
- Limitation: Only linearly separable problems
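To make the architecture concrete, here is a minimal NumPy sketch of the same idea - a single Linear → Sigmoid model trained by gradient descent on a toy dataset. It sidesteps the TinyTorch API entirely; the data, seed, and hyperparameters are illustrative.

```python
import numpy as np

# Toy 2D data, linearly separable by design: label = 1 when x0 + x1 > 1
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = (X.sum(axis=1) > 1.0).astype(float)

w, b = np.zeros(2), 0.0   # O(n) parameters: one weight per feature, one bias
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    p = sigmoid(X @ w + b)     # forward pass: Linear -> Sigmoid
    grad = (p - y) / len(X)    # dLoss/dz for binary cross-entropy
    w -= lr * (X.T @ grad)     # gradient descent step on the weights
    b -= lr * grad.sum()       # ... and on the bias

acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(f"train accuracy: {acc:.1%}")   # should climb well above 90%
```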
```bash
cd milestones/01_1957_perceptron
python 01_rosenblatt_forward.py   # See the problem (random weights)
python 02_rosenblatt_trained.py   # See the solution (trained)
```

**Expected Results**: ~50% (untrained) → 95%+ (trained) accuracy
## 02. XOR Crisis (1969) - Minsky & Papert

**After Modules 02-06**

```
Input → Linear → ReLU → Linear → Output
```
**The Challenge**: Minsky and Papert proved that single-layer perceptrons cannot solve XOR. The crisis that followed nearly ended neural network research.

**What You'll Build:**
- Hidden layers enable non-linear solutions
- Multi-layer networks break through limitations
- YOUR autograd makes it possible

**Systems Insights:**
- Memory: O(n²) with hidden layers
- Compute: O(n²) operations
- Breakthrough: Hidden representations
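A minimal NumPy sketch of why the hidden layer matters: a two-layer network with ReLU, trained by hand-written backprop, learns XOR where a single Linear layer cannot. The layer width, seed, and learning rate are illustrative, not the milestone's settings.

```python
import numpy as np

# XOR is not linearly separable: no single Linear layer can fit it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer (width 8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    h = np.maximum(0.0, X @ W1 + b1)   # Linear -> ReLU: hidden representation
    p = sigmoid(h @ W2 + b2)           # Linear -> Sigmoid
    dz2 = p - y                        # backprop: output gradient (BCE loss)
    dz1 = (dz2 @ W2.T) * (h > 0)       # chain rule through the ReLU
    W2 -= lr * (h.T @ dz2)
    b2 -= lr * dz2.sum(0)
    W1 -= lr * (X.T @ dz1)
    b1 -= lr * dz1.sum(0)

print(np.round(p.ravel(), 2))  # should approach [0, 1, 1, 0]
```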
```bash
cd milestones/02_1969_xor
python 01_xor_crisis.py   # Watch it fail (loss stuck at 0.69)
python 02_xor_solved.py   # Hidden layers solve it!
```

**Expected Results**: 50% (single layer) → 100% (multi-layer) on XOR
## 03. MLP Revival (1986) - Backpropagation Era

**After Modules 02-08**

```
Images → Flatten → Linear → ReLU → Linear → ReLU → Linear → Classes
```
**The Revolution**: Backpropagation enabled training deep networks on real datasets like MNIST.

**What You'll Build:**
- Multi-class digit recognition
- Complete training pipelines
- YOUR optimizers achieve 95%+ accuracy

**Systems Insights:**
- Memory: ~100K parameters for MNIST
- Compute: Dense matrix operations
- Architecture: Multi-layer feature learning
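As a quick sanity check on the ~100K figure, here is how the parameter count falls out for one plausible MNIST MLP. The 784 → 128 → 64 → 10 layer sizes are an assumption for illustration, not the milestone's exact architecture.

```python
# Hypothetical MNIST MLP: 784 -> 128 -> 64 -> 10 (layer sizes are illustrative)
layers = [(784, 128), (128, 64), (64, 10)]
params = sum(n_in * n_out + n_out for n_in, n_out in layers)  # weights + biases
print(f"{params:,}")  # 109,386 -- right in the ~100K range quoted above
```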
```bash
cd milestones/03_1986_mlp
python 01_rumelhart_tinydigits.py   # 8x8 digits (quick)
python 02_rumelhart_mnist.py        # Full MNIST
```

**Expected Results**: 95%+ accuracy on MNIST
## 04. CNN Revolution (1998) - LeCun's Breakthrough

**After Modules 02-09** • 🎯 North Star Achievement

```
Images → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Linear → Classes
```
**The Game-Changer**: CNNs exploit spatial structure for computer vision. This breakthrough enabled modern AI.

**What You'll Build:**
- Convolutional feature extraction
- Natural image classification (CIFAR-10)
- YOUR Conv2d + MaxPool2d unlock spatial intelligence

**Systems Insights:**
- Memory: ~1M parameters (weight sharing keeps this far below a dense equivalent)
- Compute: Convolution is intensive but parallelizable
- Architecture: Local connectivity + translation invariance
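Weight sharing is easy to quantify. A short sketch comparing a conv layer against a dense layer that would produce the same output volume; all shapes here are illustrative, not the milestone's exact model.

```python
# Conv layer: 3x3 kernels, 3 input channels, 32 output channels (illustrative)
k, c_in, c_out = 3, 3, 32
conv_params = k * k * c_in * c_out + c_out
print(f"conv: {conv_params:,}")    # 896 - the same kernel is reused everywhere

# Dense layer producing the same 32x32x32 output from a 32x32x3 input
n_in, n_out = 32 * 32 * 3, 32 * 32 * 32
dense_params = n_in * n_out + n_out
print(f"dense: {dense_params:,}")  # ~100.7M - why CNNs fit in ~1M parameters
```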
```bash
cd milestones/04_1998_cnn
python 01_lecun_tinydigits.py   # Spatial features on digits
python 02_lecun_cifar10.py      # CIFAR-10 @ 75%+ accuracy
```

**Expected Results**: 75%+ accuracy on CIFAR-10 ✨
## 05. Transformer Era (2017) - Attention Revolution

**After Modules 02-13**

```
Tokens → Embeddings → Attention → FFN → ... → Attention → Output
```
**The Modern Era**: Transformers and attention launched the LLM revolution (GPT, BERT, ChatGPT).

**What You'll Build:**
- Self-attention mechanisms
- Autoregressive text generation
- YOUR attention implementation generates language

**Systems Insights:**
- Memory: O(n²) attention requires careful management
- Compute: Highly parallelizable
- Architecture: Long-range dependencies
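The O(n²) cost is visible directly in the shapes of scaled dot-product attention. A minimal NumPy sketch (single head, no masking; the sequence length and head dimension are illustrative):

```python
import numpy as np

n, d = 128, 64                   # sequence length, head dimension (illustrative)
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

scores = Q @ K.T / np.sqrt(d)    # (n, n): the O(n^2) memory term
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
out = weights @ V                # (n, d): each token mixes all others

print(scores.shape, out.shape)   # (128, 128) (128, 64)
```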
```bash
cd milestones/05_2017_transformer
python 01_vaswani_generation.py   # Q&A generation with TinyTalks
python 02_vaswani_dialogue.py     # Multi-turn dialogue
```

**Expected Results**: Loss < 1.5, coherent responses to questions
## 06. Torch Olympics Era (2018) - The Optimization Revolution

**After Modules 14-18**

```
Profile → Compress → Accelerate
```
**The Turning Point**: As models grew larger, MLCommons' Torch Olympics (2018) established systematic optimization as a discipline - profiling, compression, and acceleration became essential for deployment.

**What You'll Build:**
- Performance profiling and bottleneck analysis
- Model compression (quantization + pruning)
- Inference acceleration (KV-cache + batching)

**Systems Insights:**
- Memory: 4-16× compression through quantization/pruning
- Speed: 12-40× faster generation with KV-cache + batching
- Workflow: Systematic "measure → optimize → validate" methodology
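The compression arithmetic is straightforward: float32 weights quantized to int8 shrink 4× before pruning even enters the picture. A minimal sketch of symmetric per-tensor quantization (an illustration, not the milestone's exact scheme):

```python
import numpy as np

w = np.random.default_rng(0).normal(size=(512, 512)).astype(np.float32)

scale = np.abs(w).max() / 127.0                   # symmetric per-tensor scale
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = w_q.astype(np.float32) * scale            # dequantize to check error

ratio = w.nbytes / w_q.nbytes                     # 4.0x from float32 -> int8
err = np.abs(w - w_hat).max()
print(f"compression: {ratio:.1f}x, max abs error: {err:.4f}")
```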
```bash
cd milestones/06_2018_mlperf
python 01_baseline_profile.py    # Find bottlenecks
python 02_compression.py         # Reduce size (quantize + prune)
python 03_generation_opts.py     # Speed up inference (cache + batch)
```

**Expected Results**: 8-16× smaller models, 12-40× faster inference
## Learning Philosophy

### Progressive Capability Building
| Stage | Era | Capability | Your Tools |
|---|---|---|---|
| 1957 | Foundation | Binary classification | Linear + Sigmoid |
| 1969 | Depth | Non-linear problems | Hidden layers + Autograd |
| 1986 | Scale | Multi-class vision | Optimizers + Training |
| 1998 | Structure | Spatial understanding | Conv2d + Pooling |
| 2017 | Attention | Sequence modeling | Transformers + Attention |
| 2018 | Optimization | Production deployment | Profiling + Compression + Acceleration |
### Systems Engineering Progression

Each milestone teaches critical systems thinking:
- **Memory Management**: From the perceptron's O(n) weights to the O(n²) of hidden layers and attention - and the optimizations that tame that growth
- **Computational Trade-offs**: Accuracy vs efficiency
- **Architectural Patterns**: How structure enables capability
- **Production Deployment**: What it takes to scale
## How to Use Milestones

### 1. Complete Prerequisites

```bash
# Check which modules you've completed
tito checkpoint status

# Complete required modules
tito module complete 02_tensor
tito module complete 03_activations
# ... and so on
```
### 2. Run the Milestone

```bash
cd milestones/01_1957_perceptron
python 02_rosenblatt_trained.py
```
### 3. Understand the Systems
Each milestone includes:
- 📊 Memory profiling: See actual memory usage
- ⚡ Performance metrics: FLOPs, parameters, timing
- 🧠 Architectural analysis: Why this design matters
- 📈 Scaling insights: How performance changes with size
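For intuition about where those parameter and FLOP numbers come from: a Linear layer performs roughly one multiply and one add per weight in the forward pass, so FLOPs ≈ 2 × inputs × outputs. A hypothetical helper (not part of the milestone code; the 784 → 128 example sizes are illustrative):

```python
def linear_cost(n_in: int, n_out: int) -> tuple[int, int]:
    """Rough parameter and forward-pass FLOP counts for one Linear layer."""
    params = n_in * n_out + n_out   # weight matrix plus bias vector
    flops = 2 * n_in * n_out        # one multiply and one add per weight
    return params, flops

print(linear_cost(784, 128))  # (100480, 200704) for a 784 -> 128 layer
```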
### 4. Reflect and Compare
Questions to ask:
- How does this compare to modern architectures?
- What were the computational constraints in that era?
- How would you optimize this for production?
- What patterns appear in PyTorch/TensorFlow?
## Quick Reference

### Milestone Prerequisites
| Milestone | After Module | Key Requirements |
|---|---|---|
| 01. Perceptron (1957) | 04 | Tensor, Activations, Layers |
| 02. XOR (1969) | 06 | + Losses, Autograd |
| 03. MLP (1986) | 08 | + Optimizers, Training |
| 04. CNN (1998) | 09 | + Spatial, DataLoader |
| 05. Transformer (2017) | 13 | + Tokenization, Embeddings, Attention |
| 06. Torch Olympics (2018) | 18 | + Profiling, Quantization, Compression, Memoization, Acceleration |
### What Each Milestone Proves

- **Your implementations work** - Not just toy code
- **Historical significance** - These breakthroughs shaped modern AI
- **Systems understanding** - You know memory, compute, scaling
- **Production relevance** - Patterns used in real ML frameworks
## Further Learning

After completing milestones, explore:
- **Torch Olympics Competition**: Optimize your implementations
- **Leaderboard**: Compare with other students
- **Capstone Projects**: Build your own ML applications
- **Research Papers**: Read the original papers for each milestone
## Why This Matters
Most courses teach you to USE frameworks.
TinyTorch teaches you to UNDERSTAND them.
By rebuilding ML history, you gain:
- 🧠 Deep intuition for how neural networks work
- 🔧 Systems thinking for production ML
- 🏆 Portfolio projects demonstrating mastery
- 💼 Preparation for ML systems engineering roles
Ready to start your journey through ML history?
```bash
cd milestones/01_1957_perceptron
python 02_rosenblatt_trained.py
```
Build the future by understanding the past. 🚀