mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-03-11 21:33:33 -05:00

Vijay Janapa Reddi d2cfb2d57e docs: Major cleanup - 46 → 12 essential docs

2025-09-27 17:04:19 -04:00


TinyTorch Module Audit: Essential vs Extra Components

Overview

This audit separates the components each milestone strictly needs (ESSENTIAL) from components that enhance the framework but are not required to reach the milestone (EXTRA).

Part I: MLPs (Target: XORNet)

Module 02: Tensor

ESSENTIAL for XORNet:

Basic Tensor class with data storage
Addition, subtraction, multiplication
Matrix multiply (for layers)
Shape, reshape operations

EXTRA (but good for framework):

Broadcasting ✓ (nice to have, but XOR doesn't need it)
Fancy indexing ✓
Statistical operations (mean, sum, std) ✓
Comparison operators ✓

Module 03: Activations

ESSENTIAL for XORNet:

ReLU ✓ (used in XORNet)
Sigmoid (could be used for the XOR output)

EXTRA (but good for framework):

Tanh ✓ (alternative to ReLU)
Softmax ✓ (not needed for XOR, but needed for CIFAR-10)
ActivationProfiler ✓ (pedagogical tool)
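
As a rough illustration of the activations listed above, here is a sketch on plain Python lists. The function names and the list-based interface are assumptions made for this example, not TinyTorch's actual API.

```python
import math

def relu(xs):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in xs]

def sigmoid(xs):
    """Element-wise logistic function; assumes moderate inputs
    (math.exp overflows for very large negative x)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]

def tanh(xs):
    """Element-wise hyperbolic tangent, with outputs in (-1, 1)."""
    return [math.tanh(x) for x in xs]

def softmax(xs):
    """Normalize scores into probabilities; subtracting the max first
    keeps math.exp from overflowing on large inputs."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

ReLU and Sigmoid act independently on each element, while Softmax normalizes across the whole vector, which is why it only becomes necessary once there are multiple classes (CIFAR-10).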

Module 04: Layers

ESSENTIAL for XORNet:

Dense layer ✓ (fully connected)
Weight initialization
Forward pass

EXTRA:

Different initialization strategies (Xavier, He, etc.)
Bias option

Module 05: Networks

ESSENTIAL for XORNet:

Sequential model ✓
Forward pass through layers

EXTRA:

Model summary/printing
Parameter counting
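
To make the Part I essentials concrete, the sketch below wires a Dense layer, ReLU, and a Sequential container into a network that computes XOR. The class names mirror the list above, but the signatures are assumptions, and the weights are picked by hand rather than trained. This particular construction relies on a bias term, which the Layers audit counts as an extra.

```python
def relu(xs):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in xs]

class Dense:
    """Fully connected layer: y = W x + b, with W stored as a list of rows."""
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]

class Sequential:
    """Applies each layer (any callable) in order."""
    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Hand-picked weights (hypothetical, not trained):
#   h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), y = h1 - 2 * h2
xornet = Sequential(
    Dense([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),
    relu,
    Dense([[1.0, -2.0]], [0.0]),
)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), xornet([float(a), float(b)])[0])
```

The forward pass needs only multiplication, addition, and a ReLU, which matches the audit's claim that the XOR milestone uses a small slice of the Tensor module.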

Part II: CNNs (Target: CIFAR-10)

Module 06: Spatial

ESSENTIAL for CNN CIFAR-10:

Conv2D ✓ (the key innovation!)
MaxPool2D ✓ (for downsampling)

EXTRA (but pedagogically valuable):

Different padding modes
Stride options
AvgPool2D (alternative pooling)
Multiple filter support
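
The two essential spatial operations can be sketched for a single channel and a single filter as follows. This is a minimal illustration with assumed function names and nested lists in place of tensors; padding, stride, and multiple filters (the extras above) are deliberately left out.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 cross-correlation of one channel
    with one filter, as used in most deep learning frameworks."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def maxpool2d(image, size=2):
    """Non-overlapping max pooling; trailing rows and columns that do
    not fill a complete window are dropped."""
    out_h = len(image) // size
    out_w = len(image[0]) // size
    return [[max(image[i * size + di][j * size + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]
```

A 4x4 input shrinks to 3x3 after a 2x2 convolution and to 2x2 after 2x2 pooling, which is the downsampling role the audit assigns to MaxPool2D.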

Module 07: DataLoader

ESSENTIAL for CIFAR-10:

CIFAR10Dataset ✓
DataLoader with batching ✓
Shuffling ✓

EXTRA:

Data augmentation (but helps accuracy!)
Other datasets (MNIST, etc.)
Prefetching/parallel loading
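
Batching and shuffling, the two essential DataLoader behaviors, fit in a few lines. The sketch below assumes the dataset is any object supporting `len()` and integer indexing; the constructor arguments (including `seed`) are hypothetical and not taken from TinyTorch.

```python
import random

class DataLoader:
    """Yields lists of samples, optionally in shuffled order."""
    def __init__(self, dataset, batch_size, shuffle=False, seed=None):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)  # private RNG for reproducible shuffles

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            self.rng.shuffle(indices)  # a new order on every pass
        for start in range(0, len(indices), self.batch_size):
            batch = indices[start:start + self.batch_size]
            yield [self.dataset[i] for i in batch]

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return (len(self.dataset) + self.batch_size - 1) // self.batch_size
```

Shuffling the indices rather than the data keeps the dataset untouched and makes each epoch a fresh permutation.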

Module 08: Autograd

ESSENTIAL for CIFAR-10:

Variable class ✓
Backward pass ✓
Gradient computation ✓

EXTRA:

Computation graph visualization
Gradient checking
Higher-order derivatives
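
The essential autograd pieces (a Variable, a backward pass, and gradient accumulation) can be illustrated with a scalar-only sketch. This is an assumption-laden toy supporting only `+` and `*` between Variables; TinyTorch's own Variable class may be structured quite differently.

```python
class Variable:
    """Scalar value that records how it was computed."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs of (parent Variable, local gradient)

    def __add__(self, other):
        return Variable(self.value + other.value,
                        ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Variable(self.value * other.value,
                        ((self, other.value), (other, self.value)))

    def backward(self):
        """Reverse-mode differentiation via the chain rule."""
        order, seen = [], set()

        def visit(node):
            # Depth-first walk that lists parents before children.
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs so each node's gradient
        # is complete before it is passed on to its parents.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

x, y = Variable(2.0), Variable(3.0)
z = x * y + x
z.backward()
print(z.value, x.grad, y.grad)
```

Here `x.grad` accumulates contributions from both uses of `x`, giving dz/dx = y + 1.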

Module 09: Optimizers

ESSENTIAL for CIFAR-10:

SGD (basic, could work)
Adam ✓ (used in CIFAR-10, converges faster)

EXTRA:

Learning rate scheduling
Momentum variants
RMSprop, AdaGrad
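
For comparison, the two update rules named above can be sketched on flat lists of parameters. The class interfaces and the toy objective f(x) = (x - 3)^2 are assumptions for illustration; the Adam hyperparameter defaults other than `lr` follow the commonly used values.

```python
import math

class SGD:
    """Plain gradient descent: p <- p - lr * g."""
    def __init__(self, lr=0.1):
        self.lr = lr

    def step(self, params, grads):
        return [p - self.lr * g for p, g in zip(params, grads)]

class Adam:
    """Adam: per-parameter steps scaled by running gradient statistics."""
    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = None, None, 0

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        # Exponential moving averages of the gradient and its square.
        self.m = [self.beta1 * m + (1 - self.beta1) * g
                  for m, g in zip(self.m, grads)]
        self.v = [self.beta2 * v + (1 - self.beta2) * g * g
                  for v, g in zip(self.v, grads)]
        # Bias correction compensates for the zero initialization.
        m_hat = [m / (1 - self.beta1 ** self.t) for m in self.m]
        v_hat = [v / (1 - self.beta2 ** self.t) for v in self.v]
        return [p - self.lr * mh / (math.sqrt(vh) + self.eps)
                for p, mh, vh in zip(params, m_hat, v_hat)]

def minimize(optimizer, steps=1000):
    """Minimize f(x) = (x - 3)^2 starting from x = 0."""
    x = [0.0]
    for _ in range(steps):
        grads = [2.0 * (x[0] - 3.0)]
        x = optimizer.step(x, grads)
    return x[0]
```

Both optimizers approach x = 3 on this toy problem; it says nothing about their relative speed on CIFAR-10, where the loss surface is far less well behaved.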

Module 10: Training

ESSENTIAL for CIFAR-10:

Training loop ✓
CrossEntropyLoss ✓
Basic evaluation ✓

EXTRA (but very useful):

Checkpointing ✓
Early stopping ✓
Metrics tracking ✓
Validation splits ✓
MeanSquaredError (used for XOR, not needed for CIFAR-10)
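
The loss at the heart of the CIFAR-10 training loop can be sketched as below. The function names are assumptions; the computation is the standard softmax cross-entropy on raw logits, written with the log-sum-exp trick for numerical stability.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)
    # log(sum(exp(z))) computed stably by factoring out the max.
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mean_cross_entropy(batch_logits, targets):
    """Average loss over a batch of (logits, target) pairs."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# With 10 classes and uniform logits the loss equals ln(10).
print(cross_entropy([0.0] * 10, 3))
```

An untrained 10-class classifier that outputs near-uniform scores should therefore start with a loss close to ln(10), which is a handy sanity check for a new training loop.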

Part III: Transformers (Target: TinyGPT)

Module 11: Embeddings

ESSENTIAL for TinyGPT:

Token embedding layer
Positional encoding (sinusoidal or learned)

EXTRA:

Multiple embedding types
Embedding dropout
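
Since the embedding module is not implemented yet, the following is only a sketch of what the two essentials might look like: a table lookup for tokens plus the sinusoidal variant of positional encoding. All names are hypothetical, and a learned positional table would be an equally valid choice per the list above.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Even columns hold sin(pos / 10000^(2i/d_model)),
    odd columns hold the cosine of the same angle."""
    pe = []
    for pos in range(seq_len):
        row = []
        for col in range(d_model):
            angle = pos / (10000 ** ((2 * (col // 2)) / d_model))
            row.append(math.sin(angle) if col % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def embed(token_ids, table, pe):
    """Look up each token's vector and add the encoding for its position."""
    return [[t + p for t, p in zip(table[tok], pe[pos])]
            for pos, tok in enumerate(token_ids)]
```

Adding the position vector to the token vector is what lets the otherwise order-blind attention layers distinguish "first token" from "fifth token".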

Module 12: Attention

ESSENTIAL for TinyGPT:

Multi-head attention ✓ (already implemented!)
Scaled dot-product attention ✓
Causal masking ✓

EXTRA:

Different attention variants
Attention visualization
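
Scaled dot-product attention with causal masking can be sketched for a single head as follows. This is an illustrative version on nested lists with assumed names; the multi-head split and the learned projections are omitted, and the existing TinyTorch implementation may differ.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, where position i sees only 0..i."""
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        # Causal mask: score only keys at positions up to and including i.
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k)
                  for j in range(i + 1)]
        weights = softmax(scores)
        out.append([sum(w * V[j][c] for j, w in enumerate(weights))
                    for c in range(len(V[0]))])
    return out
```

Skipping future positions entirely has the same effect as setting their scores to negative infinity before the softmax, which is how the mask is usually applied in a batched tensor implementation.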

Module 13: Normalization

ESSENTIAL for TinyGPT:

LayerNorm (critical for transformer stability)

EXTRA:

BatchNorm (not typically used in transformers)
GroupNorm, InstanceNorm
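
LayerNorm normalizes each feature vector on its own, independent of the batch. A minimal sketch for one vector, assuming the usual learnable scale `gamma` and shift `beta` (the parameter names are illustrative, not taken from TinyTorch):

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize x to zero mean and unit variance, then scale and shift."""
    n = len(x)
    if gamma is None:
        gamma = [1.0] * n  # identity scale
    if beta is None:
        beta = [0.0] * n   # zero shift
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]
```

Because the statistics come from a single vector, the result does not depend on batch size or sequence length, which is one reason it suits transformers better than BatchNorm.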

Module 14: Transformers

ESSENTIAL for TinyGPT:

TransformerBlock (attention + FFN + residual)
Positional encoding integration
Stack of blocks

EXTRA:

Encoder-decoder architecture
Cross-attention

Module 15: Generation

ESSENTIAL for TinyGPT:

Autoregressive generation
Temperature sampling
Greedy decoding

EXTRA:

Beam search
Top-k, Top-p sampling
Repetition penalty
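
The three essential generation pieces could look roughly like this. The module is not implemented yet, so every name here is hypothetical; `next_logits` stands in for a trained model that maps a token sequence to scores for the next token.

```python
import math
import random

def greedy(logits):
    """Pick the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample from softmax(logits / temperature); lower temperatures
    concentrate probability on the top-scoring tokens."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    weights = [math.exp(z - m) for z in scaled]  # unnormalized softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(next_logits, prompt, max_new_tokens, pick=greedy):
    """Autoregressive loop: each chosen token is fed back as input."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(pick(next_logits(tokens)))
    return tokens

def toy_model(tokens):
    """Stand-in model over 3 tokens that always favors (last + 1) mod 3."""
    return [1.0 if i == (tokens[-1] + 1) % 3 else 0.0 for i in range(3)]

print(generate(toy_model, [0], 4))
```

Passing `pick=sample_with_temperature` switches the same loop from greedy decoding to sampling, so the extras (beam search, top-k, top-p) can later slot in as alternative selection functions.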

Summary

Truly Minimal Path

If we wanted ONLY what's needed for milestones:

XORNet: Needs only Dense, ReLU, and basic Tensor ops
CIFAR-10 MLP: Add DataLoader, Autograd, Adam, CrossEntropyLoss, and a training loop
CIFAR-10 CNN: Add Conv2D, MaxPool2D
TinyGPT: Add Embeddings, Attention, LayerNorm, TransformerBlock, Generation

What We Have (Good Extras)

More activation choices: Good for experimentation
Better optimizers: Adam converges faster than SGD
Training utilities: Checkpointing, metrics (very practical!)
Profiling tools: Help understand performance

Missing Essentials

For Part III (TinyGPT) we still need to implement:

Module 11: Embedding layer, positional encoding
Module 13: LayerNorm
Module 14: TransformerBlock
Module 15: Generation strategies

Verdict

The current modules strike a good balance between essentials and useful extras. The extras are:

Pedagogically valuable (they show alternatives)
Practically useful (checkpointing, better optimizers)
Good for framework completeness (they make TinyTorch feel real)

The only "bloat" might be multiple activation functions, but even those are good for showing students the options and tradeoffs.
