TinyTorch Definitive Module Plan

🎯 Overview

19 modules build toward 5 milestones, teaching ML systems through implementation.

📚 Module Specifications

Module 01: Tensor

Learning Objective: Can I create and manipulate the building blocks of ML?

Implementation Requirements:

import numpy as np

class Tensor:
    """Educational tensor that grows with student knowledge."""

    def __init__(self, data, requires_grad=False):
        self.data = np.array(data)
        self.shape = self.data.shape

        # Gradient features (dormant until Module 05)
        self.requires_grad = requires_grad
        self.grad = None

    def __add__(self, other): return Tensor(self.data + other.data)
    def __mul__(self, other): return Tensor(self.data * other.data)
    def matmul(self, other): return Tensor(np.dot(self.data, other.data))
    def reshape(self, *shape): return Tensor(self.data.reshape(shape))
    def transpose(self, dim0, dim1): return Tensor(np.swapaxes(self.data, dim0, dim1))
    def sum(self, axis=None): return Tensor(self.data.sum(axis=axis))

    def backward(self):
        """Compute gradients (implemented in Module 05)."""
        pass  # Students: ignore until Module 05

Student Introduction:

We're building a Tensor class that will grow throughout the course.
For now, focus on: data, shape, and operations.
Ignore for now: requires_grad, grad, backward() (we'll use them in Module 05)

Dependencies: None
Export: #| default_exp core.tensor
Tests: Shape manipulation, broadcasting, matmul correctness
Systems Focus: Memory layout, broadcasting overhead, matmul complexity O(n³)
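A quick usage sketch of the Tensor above (variable names are illustrative, not part of the spec), showing the broadcasting and matmul behavior the Systems Focus refers to:

import numpy as np

a = Tensor(np.ones((2, 3)))
b = Tensor(np.array([1.0, 2.0, 3.0]))  # broadcasts across rows of a
c = a + b                              # shape (2, 3)
w = Tensor(np.random.randn(3, 4))
y = c.matmul(w)                        # (2, 3) @ (3, 4) -> (2, 4)
print(y.shape)                         # (2, 4)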


Module 02: Activations

Learning Objective: Can I add nonlinearity - the key to neural network intelligence?

Implementation Requirements:

class Sigmoid:
    def forward(self, x: Tensor) -> Tensor
    def backward(self, grad: Tensor) -> Tensor  # Stub until Module 05

class ReLU:
    def forward(self, x: Tensor) -> Tensor
    def backward(self, grad: Tensor) -> Tensor  # Stub until Module 05

class GELU:  # For GPT later
    def forward(self, x: Tensor) -> Tensor
    def backward(self, grad: Tensor) -> Tensor  # Stub until Module 05

Dependencies: Module 01 (Tensor)
Export: #| default_exp core.activations
Tests: Output ranges, gradient shapes (once implemented)
Systems Focus: ReLU sparsity benefits, sigmoid saturation, GELU approximations
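A minimal forward sketch for ReLU, assuming the NumPy-backed Tensor from Module 01 (backward stays a stub until Module 05):

import numpy as np

class ReLU:
    def forward(self, x: Tensor) -> Tensor:
        # Zero out negatives; the resulting sparsity is the
        # systems benefit noted above
        return Tensor(np.maximum(0.0, x.data))

    def backward(self, grad: Tensor) -> Tensor:
        pass  # stub until Module 05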


Module 03: Layers

Learning Objective: Can I build the fundamental building blocks of neural networks?

Implementation Requirements:

class Linear:
    def __init__(self, in_features, out_features, bias=True):
        self.weight = Tensor(np.random.randn(in_features, out_features))
        self.bias = Tensor(np.zeros(out_features)) if bias else None

    def forward(self, x: Tensor) -> Tensor
    def parameters(self) -> List[Tensor]

class Sequential:
    def __init__(self, *layers)
    def forward(self, x: Tensor) -> Tensor
    def parameters(self) -> List[Tensor]

class Dropout:
    def __init__(self, p=0.5)
    def forward(self, x: Tensor, training=True) -> Tensor

Dependencies: Modules 01-02
Export: #| default_exp core.layers
Tests: Shape preservation, parameter counting
Systems Focus: Weight initialization (Xavier/He), memory per layer
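A sketch of the forward computation, written here as a hypothetical standalone linear_forward rather than the Linear.forward method; the bias add relies on NumPy broadcasting over the batch dimension:

def linear_forward(layer: Linear, x: Tensor) -> Tensor:
    # y = x @ W + b; x is (batch, in_features), W is (in_features, out_features)
    out = x.matmul(layer.weight)
    return out + layer.bias if layer.bias is not None else out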


Module 04: Losses

Learning Objective: Can I measure how wrong my model is?

Implementation Requirements:

class CrossEntropyLoss:
    def forward(self, logits: Tensor, targets: Tensor) -> Tensor
    def backward(self) -> Tensor  # Stub until Module 05

class MSELoss:
    def forward(self, predictions: Tensor, targets: Tensor) -> Tensor
    def backward(self) -> Tensor  # Stub until Module 05

def log_softmax(x: Tensor, dim=-1) -> Tensor  # Numerical stability

Dependencies: Modules 01-03
Export: #| default_exp core.losses
Tests: Numerical stability, correct loss values
Systems Focus: Log-sum-exp trick, memory-efficient computation
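A sketch of log_softmax using the log-sum-exp trick named above: subtracting the per-row max before exponentiating prevents overflow without changing the result.

import numpy as np

def log_softmax(x: Tensor, dim=-1) -> Tensor:
    m = x.data.max(axis=dim, keepdims=True)   # per-row max
    shifted = x.data - m                      # exp(shifted) <= 1, no overflow
    log_sum = np.log(np.exp(shifted).sum(axis=dim, keepdims=True))
    return Tensor(shifted - log_sum)          # log(exp(x) / sum(exp(x)))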


🪜 Milestone 1: Perceptron (After Module 04)

Location: milestones/01_perceptron/
Deliverable: Train Linear + Sigmoid on 2D dataset, visualize decision boundary
Success Criteria: 95% accuracy on linearly separable data
Unlock: Complete modules 01-04 + integration test


Module 05: Autograd

Learning Objective: Can I automatically compute gradients for learning?

Implementation Requirements:

# Activate the dormant gradient features in Tensor
# No new Tensor class - enhance existing one!

def implement_backward_for_tensor():
    """Fill in the Tensor.backward() method"""
    # Track computation graph
    # Compute gradients via chain rule
    # Update tensor.grad attributes

class Function:
    """Base class for differentiable operations"""
    def forward(self, *inputs)
    def backward(self, grad_output)

# Wrap existing operations to track gradients
class AddBackward(Function): ...
class MulBackward(Function): ...
class MatmulBackward(Function): ...

Dependencies: Modules 01-04 (enhances Tensor from Module 01)
Export: #| default_exp core.autograd
Tests: Gradient correctness, chain rule, graph building
Systems Focus: Graph memory growth, gradient checkpointing
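A sketch of one Function subclass, assuming grad_output arrives as a NumPy array and that forward saves its inputs for reuse; the product rule gives each input's gradient as grad_output times the other input:

class MulBackward(Function):
    def forward(self, a: Tensor, b: Tensor) -> Tensor:
        self.a, self.b = a, b            # saved for the backward pass
        return Tensor(a.data * b.data)

    def backward(self, grad_output):
        # d(a*b)/da = b and d(a*b)/db = a, each scaled by the incoming gradient
        return grad_output * self.b.data, grad_output * self.a.data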


Module 06: Optimizers

Learning Objective: Can I optimize neural networks with sophisticated algorithms?

Implementation Requirements:

class Optimizer:
    def __init__(self, params)
    def zero_grad(self)
    def step(self)

class SGD(Optimizer):
    def __init__(self, params, lr=0.01, momentum=0.9)

class AdamW(Optimizer):
    def __init__(self, params, lr=0.001, betas=(0.9, 0.999), weight_decay=0.01)

Dependencies: Modules 01-05 (uses gradients from Module 05)
Export: #| default_exp core.optimizers
Tests: Parameter updates, momentum accumulation
Systems Focus: Adam's 3× memory usage, momentum vs adaptive
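A sketch of the SGD update with momentum, assuming each parameter's .grad holds a NumPy array after backward():

import numpy as np

class SGD(Optimizer):
    def __init__(self, params, lr=0.01, momentum=0.9):
        self.params = list(params)
        self.lr, self.momentum = lr, momentum
        self.velocity = [np.zeros_like(p.data) for p in self.params]

    def step(self):
        for p, v in zip(self.params, self.velocity):
            v *= self.momentum      # decay the running velocity in place
            v += p.grad             # accumulate the current gradient
            p.data -= self.lr * v   # descend along the smoothed direction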


Module 07: Training

Learning Objective: Can I build complete training loops for end-to-end learning?

Implementation Requirements:

class Trainer:
    def __init__(self, model, optimizer, loss_fn)
    def train_epoch(self, dataloader)
    def evaluate(self, dataloader)
    def save_checkpoint(self, path)
    def load_checkpoint(self, path)

class CosineSchedule:
    def get_lr(self, epoch)

def clip_grad_norm(parameters, max_norm)

Dependencies: Modules 01-06
Export: #| default_exp core.training
Tests: Training loop, checkpointing, scheduling
Systems Focus: Batch size vs memory, gradient accumulation
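A sketch of CosineSchedule; the constructor arguments (base_lr, total_epochs) are assumptions, since the spec above only fixes get_lr:

import math

class CosineSchedule:
    def __init__(self, base_lr, total_epochs):
        self.base_lr, self.total_epochs = base_lr, total_epochs

    def get_lr(self, epoch):
        # Half-cosine decay from base_lr at epoch 0 toward 0 at total_epochs
        return self.base_lr * 0.5 * (1 + math.cos(math.pi * epoch / self.total_epochs))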


🪜 Milestone 2: MLP (After Module 07)

Location: milestones/02_mlp/
Deliverable: 2-layer MLP on MNIST, compare to perceptron
Success Criteria: >95% accuracy on MNIST
Unlock: Complete modules 05-07 + integration test


Module 08: DataLoader

Learning Objective: Can I efficiently load and batch data for training?

Implementation Requirements:

class Dataset:
    def __len__(self)
    def __getitem__(self, idx)

class DataLoader:
    def __init__(self, dataset, batch_size, shuffle=False)
    def __iter__(self)
    def __len__(self)

class TensorDataset(Dataset):
    def __init__(self, *tensors)

def download_mnist() -> Tuple[Dataset, Dataset]
def download_cifar10() -> Tuple[Dataset, Dataset]

Dependencies: Modules 01-07
Export: #| default_exp data.loader
Tests: Batching, shuffling, iteration
Systems Focus: Memory mapping, prefetching, data pipeline
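A sketch of the batching loop; collating samples into stacked tensors is left out, and shuffling draws a fresh permutation per epoch:

import numpy as np

class DataLoader:
    def __init__(self, dataset, batch_size, shuffle=False):
        self.dataset, self.batch_size, self.shuffle = dataset, batch_size, shuffle

    def __iter__(self):
        n = len(self.dataset)
        order = np.random.permutation(n) if self.shuffle else np.arange(n)
        for start in range(0, n, self.batch_size):
            yield [self.dataset[i] for i in order[start:start + self.batch_size]]

    def __len__(self):
        # Number of batches, counting a final partial batch
        return (len(self.dataset) + self.batch_size - 1) // self.batch_size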


Module 09: Spatial

Learning Objective: Can I process spatial data like images with convolutions?

Implementation Requirements:

class Conv2d:
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0)
    def forward(self, x: Tensor) -> Tensor
    def parameters(self) -> List[Tensor]

class MaxPool2d:
    def __init__(self, kernel_size, stride=None)
    def forward(self, x: Tensor) -> Tensor

class BatchNorm2d:
    def __init__(self, num_features)
    def forward(self, x: Tensor, training=True) -> Tensor

Dependencies: Modules 01-08
Export: #| default_exp core.spatial
Tests: Output shapes, receptive fields
Systems Focus: Convolution complexity O(N²M²K²), im2col memory trade-off, depthwise separable
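The output spatial size follows standard convolution arithmetic; a hypothetical helper makes the formula concrete:

def conv2d_output_size(size_in, kernel_size, stride=1, padding=0):
    # floor((n + 2p - k) / s) + 1, applied per spatial dimension
    return (size_in + 2 * padding - kernel_size) // stride + 1

# e.g. a 32x32 CIFAR-10 image with a 3x3 kernel, stride 1, padding 1:
# conv2d_output_size(32, 3, 1, 1) -> 32 (shape preserved)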


🪜 Milestone 3: CNN (After Module 09)

Location: milestones/03_cnn/
Deliverable: 3-layer CNN on CIFAR-10, visualize filters
Success Criteria: >75% accuracy on CIFAR-10
Unlock: Complete modules 08-09 + integration test


Module 10: Tokenization

Learning Objective: Can I convert text into numerical representations?

Implementation Requirements:

class Tokenizer:
    def encode(self, text: str) -> List[int]
    def decode(self, tokens: List[int]) -> str

class CharTokenizer(Tokenizer):
    def __init__(self, vocab: List[str])
    def build_vocab(self, corpus: List[str])

class BPETokenizer(Tokenizer):  # Optional/advanced
    def train(self, corpus: List[str], vocab_size: int)

Dependencies: Module 01
Export: #| default_exp text.tokenization
Tests: Encode/decode round-trip, vocabulary building
Systems Focus: Vocab size vs sequence length trade-off
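A sketch of CharTokenizer; the stoi/itos attribute names are illustrative, and encode followed by decode must round-trip exactly (the first test above):

class CharTokenizer(Tokenizer):
    def __init__(self, vocab: List[str]):
        self.stoi = {ch: i for i, ch in enumerate(vocab)}  # char -> id
        self.itos = {i: ch for i, ch in enumerate(vocab)}  # id -> char

    def encode(self, text: str) -> List[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, tokens: List[int]) -> str:
        return "".join(self.itos[t] for t in tokens)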


Module 11: Embeddings

Learning Objective: Can I create learnable representations of discrete tokens?

Implementation Requirements:

class Embedding:
    def __init__(self, vocab_size, embed_dim)
    def forward(self, indices: Tensor) -> Tensor
    def parameters(self) -> List[Tensor]

class PositionalEncoding:
    def __init__(self, max_seq_len, embed_dim)
    def forward(self, x: Tensor) -> Tensor

def create_sinusoidal_embeddings(max_seq_len, embed_dim) -> Tensor

Dependencies: Modules 01-10
Export: #| default_exp text.embeddings
Tests: Embedding lookup, position encoding
Systems Focus: Embedding table memory, learned vs fixed
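A sketch of the sinusoidal table, assuming an even embed_dim; even columns get sines, odd columns cosines, with geometrically spaced frequencies:

import numpy as np

def create_sinusoidal_embeddings(max_seq_len, embed_dim) -> Tensor:
    pos = np.arange(max_seq_len)[:, None]   # (seq, 1)
    freq = np.exp(-np.log(10000.0) * np.arange(0, embed_dim, 2) / embed_dim)
    table = np.zeros((max_seq_len, embed_dim))
    table[:, 0::2] = np.sin(pos * freq)     # even dimensions
    table[:, 1::2] = np.cos(pos * freq)     # odd dimensions
    return Tensor(table)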


Module 12: Attention

Learning Objective: Can I build attention mechanisms for sequence understanding?

Implementation Requirements:

def scaled_dot_product_attention(Q, K, V, mask=None) -> Tensor

class MultiHeadAttention:
    def __init__(self, embed_dim, num_heads)
    def forward(self, x: Tensor, mask=None) -> Tensor
    def parameters(self) -> List[Tensor]

Dependencies: Modules 01-11
Export: #| default_exp core.attention
Tests: Attention weights sum to 1, masking
Systems Focus: O(n²) memory complexity with sequence length, FlashAttention concepts
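A NumPy-level sketch of scaled dot-product attention; the mask convention here (True = attend, False = block) is an assumption of this sketch:

import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None) -> Tensor:
    d_k = Q.data.shape[-1]
    scores = Q.data @ np.swapaxes(K.data, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked positions -> ~0 weight
    scores = scores - scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # rows sum to 1
    return Tensor(weights @ V.data)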


Module 13: Transformers

Learning Objective: Can I build complete transformer architectures?

Implementation Requirements:

class TransformerBlock:
    def __init__(self, embed_dim, num_heads, mlp_ratio=4):
        self.attention = MultiHeadAttention(embed_dim, num_heads)
        self.mlp = MLP(embed_dim, embed_dim * mlp_ratio)
        self.ln1 = LayerNorm(embed_dim)
        self.ln2 = LayerNorm(embed_dim)

    def forward(self, x: Tensor) -> Tensor

class GPT:
    def __init__(self, vocab_size, embed_dim, num_layers, num_heads)
    def forward(self, indices: Tensor) -> Tensor
    def generate(self, prompt: Tensor, max_length: int) -> Tensor

Dependencies: Modules 01-12
Export: #| default_exp models.transformer
Tests: Shape preservation, generation
Systems Focus: Parameter scaling, activation memory
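A sketch of greedy generation, written as a standalone function rather than the GPT.generate method; it appends the argmax token one position at a time, and Module 14 shows why recomputing the full forward pass each step is wasteful:

import numpy as np

def generate(model: GPT, prompt: Tensor, max_length: int) -> Tensor:
    tokens = prompt                                    # (batch, seq)
    while tokens.shape[-1] < max_length:
        logits = model.forward(tokens)                 # (batch, seq, vocab)
        next_tok = logits.data[:, -1, :].argmax(axis=-1)[:, None]
        tokens = Tensor(np.concatenate([tokens.data, next_tok], axis=-1))
    return tokens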


Module 14: KV Caching

Learning Objective: Can I optimize autoregressive generation?

Implementation Requirements:

class KVCache:
    def __init__(self, batch_size, max_seq_len, num_layers, num_heads, head_dim)
    def update(self, layer_idx, key, value, seq_pos)
    def get(self, layer_idx) -> Tuple[Tensor, Tensor]

# Modified attention to use cache
def attention_with_cache(Q, K, V, cache, layer_idx, seq_pos) -> Tensor

Dependencies: Modules 01-13
Export: #| default_exp generation.kv_cache
Tests: Cache correctness, memory usage
Systems Focus: Cache memory vs recomputation trade-off
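A back-of-the-envelope helper (hypothetical, for intuition) for the cache-memory side of that trade-off: K and V are each stored per layer at shape (batch, heads, seq, head_dim):

def kv_cache_bytes(batch_size, max_seq_len, num_layers, num_heads, head_dim,
                   bytes_per_element=4):
    # 2 tensors (K and V) per layer
    return (2 * num_layers * batch_size * num_heads
            * max_seq_len * head_dim * bytes_per_element)

# e.g. batch 1, seq 1024, 6 layers, 8 heads, head_dim 64, fp32:
# kv_cache_bytes(1, 1024, 6, 8, 64) -> 25,165,824 bytes (24 MiB)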


🪜 Milestone 4: TinyGPT (After Module 14)

Location: milestones/04_tinygpt/
Deliverable: Character-level GPT on Shakespeare, generate text
Success Criteria: Perplexity < 2.0, coherent generation
Unlock: Complete modules 10-14 + integration test


Module 15: Profiling

Learning Objective: Can I measure what matters in ML systems?

Implementation Requirements:

class Profiler:
    def count_parameters(self, model) -> int
    def count_flops(self, model, input_shape) -> int
    def measure_memory(self, model, input_shape) -> Dict[str, float]
    def measure_latency(self, model, input, warmup=10, iterations=100) -> float

def profile_forward_pass(model, input) -> Dict[str, Any]
def profile_backward_pass(model, input, loss_fn) -> Dict[str, Any]

Dependencies: All previous
Export: #| default_exp profiling.profiler
Tests: Accurate counting, timing consistency
Systems Focus: FLOPs vs runtime, roofline model
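Sketches of the two simplest measurements, written as standalone versions of the Profiler methods above and assuming Module 01 Tensors:

import time

def count_parameters(model) -> int:
    # Total scalar values across all trainable tensors
    return sum(p.data.size for p in model.parameters())

def measure_latency(model, x, warmup=10, iterations=100) -> float:
    for _ in range(warmup):   # let caches and the allocator settle first
        model.forward(x)
    start = time.perf_counter()
    for _ in range(iterations):
        model.forward(x)
    return (time.perf_counter() - start) / iterations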


Module 16: Acceleration

Learning Objective: Can I make models run faster?

Implementation Requirements:

# Vectorization examples
def vectorized_matmul(a: Tensor, b: Tensor) -> Tensor
def fused_gelu(x: Tensor) -> Tensor  # Fuse operations

class MixedPrecisionTrainer:
    def __init__(self, model, optimizer, loss_scale=1024)
    def train_step(self, batch)
    def scale_loss(self, loss)

Dependencies: All previous
Export: #| default_exp optimization.acceleration
Tests: Speedup measurement, numerical stability
Systems Focus: Compute intensity, bandwidth limits
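A sketch of fused_gelu using the standard tanh approximation; NumPy cannot truly fuse kernels, but the single expression shows what a fused implementation computes in one pass over the data:

import numpy as np

def fused_gelu(x: Tensor) -> Tensor:
    d = x.data
    # tanh approximation of GELU, evaluated as one expression
    return Tensor(0.5 * d * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                           * (d + 0.044715 * d ** 3))))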


Module 17: Quantization

Learning Objective: Can I reduce model precision without breaking it?

Implementation Requirements:

def quantize_int8(tensor: Tensor) -> Tuple[Tensor, float, int]:
    """Return quantized tensor, scale, zero_point"""

class QuantizedLinear:
    def __init__(self, linear_layer: Linear)
    def forward(self, x: Tensor) -> Tensor

def quantize_model(model) -> None:
    """In-place quantization of all Linear layers"""

Dependencies: All previous
Export: #| default_exp optimization.quantization
Tests: Accuracy preservation, actual memory reduction
Systems Focus: Quantization error, INT8 vs FP16
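A sketch of affine INT8 quantization (real ≈ scale * (q - zero_point)), mapping the observed [min, max] range onto [-128, 127]:

import numpy as np

def quantize_int8(tensor: Tensor) -> Tuple[Tensor, float, int]:
    lo, hi = tensor.data.min(), tensor.data.max()
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = -128 - int(round(lo / scale))   # so lo maps to -128
    q = np.clip(np.round(tensor.data / scale) + zero_point, -128, 127)
    return Tensor(q.astype(np.int8)), scale, zero_point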


Module 18: Compression

Learning Objective: Can I make models smaller?

Implementation Requirements:

def magnitude_prune(model, sparsity=0.9):
    """Remove weights below threshold"""

def structured_prune(model, prune_ratio=0.5):
    """Remove entire channels/neurons"""

def measure_sparsity(model) -> float:
    """Calculate percentage of zero weights"""

Dependencies: All previous
Export: #| default_exp optimization.compression
Tests: Sparsity achieved, model still works
Systems Focus: Structured vs unstructured, lottery ticket
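Sketches of per-tensor magnitude pruning and the sparsity check; the quantile threshold is one simple choice, not the only one:

import numpy as np

def magnitude_prune(model, sparsity=0.9):
    for p in model.parameters():
        # Zero every weight whose magnitude falls below the sparsity quantile
        threshold = np.quantile(np.abs(p.data), sparsity)
        p.data[np.abs(p.data) < threshold] = 0.0

def measure_sparsity(model) -> float:
    zeros = sum(int((p.data == 0).sum()) for p in model.parameters())
    total = sum(p.data.size for p in model.parameters())
    return zeros / total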


Module 19: Benchmarking

Learning Objective: Can I fairly compare different approaches?

Implementation Requirements:

class Benchmark:
    def __init__(self, models: List, datasets: List, metrics: List[str])
    def run(self) -> pd.DataFrame
    def plot_results(self)
    def generate_report(self) -> str

def compare_models(model1, model2, test_data) -> Dict[str, float]
def plot_pareto_frontier(results: pd.DataFrame)

Dependencies: All previous
Export: #| default_exp benchmarking.benchmark
Tests: Metric calculation, report generation
Systems Focus: Latency vs throughput, energy efficiency
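A sketch of compare_models built on the hypothetical Module 15 helpers sketched earlier (count_parameters, measure_latency); accuracy metrics would slot in alongside:

def compare_models(model1, model2, test_data) -> Dict[str, float]:
    results = {}
    for name, model in (("model1", model1), ("model2", model2)):
        results[f"{name}_params"] = count_parameters(model)
        results[f"{name}_latency_s"] = measure_latency(model, test_data)
    return results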


🪜 Milestone 5: Systems Capstone (After Module 19)

Location: milestones/05_systems_capstone/
Deliverable: Profile and optimize CNN vs TinyGPT

  • Apply quantization and pruning
  • Generate comparison report
  • Show accuracy vs speed trade-offs

Success Criteria: 2× speedup with <5% accuracy loss
Unlock: Complete modules 15-19 + integration test

📋 Implementation Checklist for Module Developer

For EACH Module:

Setup:

  • Create modules/XX_name/name_dev.py
  • Add jupytext headers
  • Add export directive (#| default_exp)

Implementation:

  • Follow API specs exactly
  • Use ONLY prior modules
  • Include dormant features in Module 01
  • NO monkey-patching ever

Testing:

  • Unit tests after each function
  • Integration test at module end
  • Test in isolation (only prior deps)

Systems Analysis:

  • Memory profiling (if appropriate)
  • Complexity analysis
  • Production comparison

Documentation:

  • Clear student introduction
  • Explain dormant features properly
  • NBGrader metadata

Validation:

  • Run test_module()
  • Export with tito module complete XX
  • Verify checkpoint passes

🚀 Implementation Order

  1. Phase 1: Modules 01-04 → Milestone 1 (Perceptron)
  2. Phase 2: Modules 05-07 → Milestone 2 (MLP)
  3. Phase 3: Modules 08-09 → Milestone 3 (CNN)
  4. Phase 4: Modules 10-14 → Milestone 4 (TinyGPT)
  5. Phase 5: Modules 15-19 → Milestone 5 (Systems)

🎯 Critical Design Decisions

1. Single Tensor Class

  • Module 01 creates Tensor with dormant gradient features
  • Module 05 activates these features (no new class!)
  • No Variable class, no monkey-patching

2. Progressive Dependencies

  • Each module uses ONLY previous modules
  • No forward references allowed
  • Tests work at each stage

3. Milestone Structure

  • Separate milestones/ directory
  • Unlocked after module groups complete
  • Colab-compatible notebooks

4. Systems Focus

  • Every module includes performance analysis
  • Memory profiling where appropriate
  • Production context comparisons

This is the complete, definitive plan for TinyTorch development.