👩‍🏫 TinyTorch Instructor Guide

Complete guide for teaching ML Systems Engineering with TinyTorch.

🎯 Course Overview

TinyTorch teaches ML systems engineering through building, not just using. Students construct a complete ML framework from tensors to transformers, understanding memory, performance, and scaling at each step.

🛠️ Instructor Setup

1. Initial Setup

# Clone and setup
git clone https://github.com/MLSysBook/TinyTorch.git
cd TinyTorch

# Virtual environment (MANDATORY)
python -m venv .venv
source .venv/bin/activate

# Install with instructor tools
pip install -r requirements.txt
pip install nbgrader

# Setup grading infrastructure
tito grade setup

2. Verify Installation

tito system health
# Should show all green checkmarks

tito grade
# Should show available grade commands

📝 Assignment Workflow

Simplified with Tito CLI

We've wrapped NBGrader behind simple tito grade commands so you don't need to learn NBGrader's complex interface.
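For reference, here is roughly how the wrappers correspond to the underlying NBGrader commands in a standard NBGrader setup (an approximate mapping, not a guarantee of flags or behavior — prefer the tito wrappers):

# tito grade generate  ~  nbgrader generate_assignment
# tito grade release   ~  nbgrader release_assignment
# tito grade collect   ~  nbgrader collect
# tito grade autograde ~  nbgrader autograde
# tito grade feedback  ~  nbgrader generate_feedback
# tito grade export    ~  nbgrader export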

1. Prepare Assignments

# Generate instructor version (with solutions)
tito grade generate 01_tensor

# Create student version (solutions removed)
tito grade release 01_tensor

# Student version will be in: release/tinytorch/01_tensor/

2. Distribute to Students

# Option A: GitHub Classroom (recommended)
# 1. Create assignment repository from TinyTorch
# 2. Remove solutions from modules
# 3. Students clone and work

# Option B: Direct distribution
# Share the release/ directory contents

3. Collect Submissions

# Collect all students
tito grade collect 01_tensor

# Or specific student
tito grade collect 01_tensor --student student_id

4. Auto-Grade

# Grade all submissions
tito grade autograde 01_tensor

# Grade specific student
tito grade autograde 01_tensor --student student_id

5. Manual Review

# Open grading interface (browser-based)
tito grade manual 01_tensor

# This launches a web interface for:
# - Reviewing ML Systems question responses
# - Adding feedback comments
# - Adjusting auto-grades

6. Generate Feedback

# Create feedback files for students
tito grade feedback 01_tensor

7. Export Grades

# Export all grades to CSV
tito grade export

# Or specific module
tito grade export --module 01_tensor --output grades_module01.csv

📊 Grading Components

Auto-Graded (70%)

  • Code implementation correctness
  • Test passing
  • Function signatures
  • Output validation

Manually Graded (30%)

  • ML Systems Thinking questions (3 per module)
  • Each question: 10 points
  • Focus on understanding, not perfection

Grading Rubric for ML Systems Questions

| Points | Criteria |
|--------|----------|
| 9-10 | Demonstrates deep understanding, references specific code, discusses systems implications |
| 7-8 | Good understanding, some code references, basic systems thinking |
| 5-6 | Surface understanding, generic response, limited systems perspective |
| 3-4 | Attempted but misses key concepts |
| 0-2 | No attempt or completely off-topic |

What to Look For:

  • References to actual implemented code
  • Memory/performance analysis
  • Scaling considerations
  • Production system comparisons
  • Understanding of trade-offs

📋 Sample Solutions for Grading Calibration

This section provides sample solutions to help calibrate grading standards. Use these as reference points when evaluating student submissions.

Module 01: Tensor - Memory Footprint

Excellent Solution (9-10 points):

def memory_footprint(self):
    """Calculate tensor memory in bytes."""
    return self.data.nbytes

Why Excellent:

  • Concise and correct
  • Uses NumPy's built-in nbytes property
  • Clear docstring
  • Handles all tensor shapes correctly

Good Solution (7-8 points):

def memory_footprint(self):
    """Calculate memory usage."""
    return np.prod(self.data.shape) * self.data.dtype.itemsize

Why Good:

  • Correct implementation
  • Manually calculates (shows understanding)
  • Works but less efficient than using nbytes
  • Minor: docstring could be more specific

Acceptable Solution (5-6 points):

def memory_footprint(self):
    size = 1
    for dim in self.data.shape:
        size *= dim
    return size * 4  # Assumes float32

Why Acceptable:

  • Correct logic but hardcoded dtype size
  • Works for float32 but fails for other dtypes
  • Shows understanding of memory calculation
  • Missing proper dtype handling
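
A quick calibration check you can run in plain NumPy (a sketch, independent of the TinyTorch Tensor API): all three approaches agree for float32, but the hardcoded `* 4` diverges for other dtypes.

import numpy as np

for dtype in (np.float32, np.float64):
    a = np.zeros((8, 16), dtype=dtype)
    print(dtype.__name__,
          a.nbytes,                             # excellent: built-in property
          np.prod(a.shape) * a.dtype.itemsize,  # good: manual calculation
          int(np.prod(a.shape)) * 4)            # acceptable: assumes float32
# float32: 512 512 512
# float64: 1024 1024 512  <- the hardcoded dtype size is wrong here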

Module 05: Autograd - Backward Pass

Excellent Solution (9-10 points):

def backward(self, gradient=None):
    """Backward pass through computational graph."""
    if gradient is None:
        gradient = np.ones_like(self.data)
    
    self.grad = gradient
    
    if self.grad_fn is not None:
        # Compute gradients for inputs
        input_grads = self.grad_fn.backward(gradient)
        
        # Propagate to input tensors
        if isinstance(input_grads, tuple):
            for input_tensor, input_grad in zip(self.grad_fn.inputs, input_grads):
                if input_tensor.requires_grad:
                    input_tensor.backward(input_grad)
        else:
            if self.grad_fn.inputs[0].requires_grad:
                self.grad_fn.inputs[0].backward(input_grads)

Why Excellent:

  • Handles both scalar and tensor gradients
  • Properly checks requires_grad before propagating
  • Handles tuple returns from grad_fn
  • Clear variable names and structure

Good Solution (7-8 points):

def backward(self, gradient=None):
    if gradient is None:
        gradient = np.ones_like(self.data)
    self.grad = gradient
    if self.grad_fn:
        grads = self.grad_fn.backward(gradient)
        for inp, grad in zip(self.grad_fn.inputs, grads):
            inp.backward(grad)

Why Good:

  • Correct logic
  • Missing requires_grad check (minor issue)
  • Assumes grads is always iterable (may fail for single input)
  • Works for most cases but less robust

Acceptable Solution (5-6 points):

def backward(self, grad):
    self.grad = grad
    if self.grad_fn:
        self.grad_fn.inputs[0].backward(self.grad_fn.backward(grad))

Why Acceptable:

  • Basic backward pass works
  • Only handles single input (fails for multi-input operations)
  • Missing None gradient handling
  • Shows understanding but incomplete
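
To test these behaviors quickly, the following is a self-contained sketch using illustrative names (T, MulFn) rather than the TinyTorch API. It exercises exactly the cases that separate the tiers above: a two-input operation returning a tuple of gradients, and the requires_grad check before propagation.

import numpy as np

class MulFn:
    """Toy grad_fn for elementwise multiply: saves inputs, returns a gradient per input."""
    def __init__(self, a, b):
        self.inputs = (a, b)
    def backward(self, grad):
        a, b = self.inputs
        return grad * b.data, grad * a.data  # dL/da, dL/db

class T:
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=float)
        self.requires_grad = requires_grad
        self.grad = None
        self.grad_fn = None
    def __mul__(self, other):
        out = T(self.data * other.data, requires_grad=True)
        out.grad_fn = MulFn(self, other)
        return out
    def backward(self, gradient=None):
        if gradient is None:
            gradient = np.ones_like(self.data)
        self.grad = gradient
        if self.grad_fn is not None:
            grads = self.grad_fn.backward(gradient)
            for inp, g in zip(self.grad_fn.inputs, grads):
                if inp.requires_grad:
                    inp.backward(g)

a, b = T(2.0, requires_grad=True), T(3.0, requires_grad=True)
(a * b).backward()
print(a.grad, b.grad)  # expect 3.0 and 2.0

The "acceptable" single-input version above fails this check because it only ever propagates to inputs[0].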

Module 09: Spatial - Convolution Implementation

Excellent Solution (9-10 points):

def forward(self, x):
    """Forward pass with explicit loops for clarity."""
    batch_size, in_channels, height, width = x.shape
    requires_grad = x.requires_grad  # capture before padding replaces x with a plain array
    out_height = (height - self.kernel_size + 2 * self.padding) // self.stride + 1
    out_width = (width - self.kernel_size + 2 * self.padding) // self.stride + 1
    
    output = np.zeros((batch_size, self.out_channels, out_height, out_width))
    
    # Apply padding
    if self.padding > 0:
        x = np.pad(x, ((0, 0), (0, 0), (self.padding, self.padding), 
                      (self.padding, self.padding)), mode='constant')
    
    # Explicit convolution loops
    for b in range(batch_size):
        for oc in range(self.out_channels):
            for oh in range(out_height):
                for ow in range(out_width):
                    h_start = oh * self.stride
                    w_start = ow * self.stride
                    h_end = h_start + self.kernel_size
                    w_end = w_start + self.kernel_size
                    
                    window = x[b, :, h_start:h_end, w_start:w_end]
                    output[b, oc, oh, ow] = np.sum(
                        window * self.weight[oc]
                    ) + self.bias[oc]
    
    return Tensor(output, requires_grad=requires_grad)

Why Excellent:

  • Clear output shape calculation
  • Proper padding handling
  • Explicit loops make O(kernel_size²) complexity visible
  • Correct gradient tracking setup
  • Well-structured and readable

Good Solution (7-8 points):

def forward(self, x):
    B, C, H, W = x.shape
    out_h = (H - self.kernel_size) // self.stride + 1
    out_w = (W - self.kernel_size) // self.stride + 1
    out = np.zeros((B, self.out_channels, out_h, out_w))
    
    for b in range(B):
        for oc in range(self.out_channels):
            for i in range(out_h):
                for j in range(out_w):
                    h = i * self.stride
                    w = j * self.stride
                    out[b, oc, i, j] = np.sum(
                        x[b, :, h:h+self.kernel_size, w:w+self.kernel_size] 
                        * self.weight[oc]
                    ) + self.bias[oc]
    return Tensor(out)

Why Good:

  • Correct implementation
  • Missing padding support (works only for padding=0)
  • Less clear variable names
  • Missing requires_grad propagation

Acceptable Solution (5-6 points):

def forward(self, x):
    out = np.zeros((x.shape[0], self.out_channels, x.shape[2]-2, x.shape[3]-2))
    for b in range(x.shape[0]):
        for c in range(self.out_channels):
            for i in range(out.shape[2]):
                for j in range(out.shape[3]):
                    out[b, c, i, j] = np.sum(x[b, :, i:i+3, j:j+3] * self.weight[c])
    return Tensor(out)

Why Acceptable:

  • Basic convolution works
  • Hardcoded kernel_size=3 (not general)
  • No stride or padding support
  • Shows understanding but incomplete
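
When grading, a quick check of the output-shape formula used above, out = (in - kernel + 2 * padding) // stride + 1, helps spot submissions that hardcode kernel_size or skip padding and stride:

for h, k, p, s in [(32, 3, 0, 1), (32, 3, 1, 1), (32, 5, 2, 2)]:
    out = (h - k + 2 * p) // s + 1
    print(f"in={h} kernel={k} pad={p} stride={s} -> out={out}")
# 30, 32, 16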

Module 12: Attention - Scaled Dot-Product Attention

Excellent Solution (9-10 points):

def forward(self, query, key, value, mask=None):
    """Scaled dot-product attention with numerical stability."""
    # Compute attention scores
    scores = np.dot(query, key.T) / np.sqrt(self.d_k)
    
    # Apply mask if provided
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    
    # Softmax with numerical stability
    exp_scores = np.exp(scores - np.max(scores, axis=-1, keepdims=True))
    attention_weights = exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)
    
    # Apply attention to values
    output = np.dot(attention_weights, value)
    
    return output, attention_weights

Why Excellent:

  • Proper scaling factor (1/√d_k)
  • Numerical stability with max subtraction
  • Mask handling
  • Returns both output and attention weights
  • Clear and well-documented

Good Solution (7-8 points):

def forward(self, q, k, v):
    scores = np.dot(q, k.T) / np.sqrt(q.shape[-1])
    weights = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)
    return np.dot(weights, v)

Why Good:

  • Correct implementation
  • Missing numerical stability (may overflow)
  • Missing mask support
  • Works but less robust

Acceptable Solution (5-6 points):

def forward(self, q, k, v):
    scores = np.dot(q, k.T)
    weights = np.exp(scores) / np.sum(np.exp(scores))
    return np.dot(weights, v)

Why Acceptable:

  • Basic attention mechanism
  • Missing scaling factor
  • Missing numerical stability
  • Incorrect softmax (should be per-row)
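
A short NumPy demo makes the numerical-stability difference between the excellent and weaker solutions concrete; this is a sketch you can run as-is:

import numpy as np

scores = np.array([[1000.0, 1001.0, 1002.0]])

# Naive softmax overflows for large scores (np.exp(1000) -> inf, inf/inf -> nan)
naive = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)

# Max-subtraction yields the same distribution without overflow
shifted = np.exp(scores - np.max(scores, axis=-1, keepdims=True))
stable = shifted / np.sum(shifted, axis=-1, keepdims=True)

print(naive)   # [[nan nan nan]]
print(stable)  # [[0.09003057 0.24472847 0.66524096]]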

Grading Guidelines Using Sample Solutions

When Evaluating Student Code:

  1. Correctness First: Does it pass all tests?

    • If no: Maximum 6 points (even if well-written)
    • If yes: Proceed to quality evaluation

  2. Code Quality:

    • Excellent (9-10): Production-ready, handles edge cases, well-documented
    • Good (7-8): Correct and functional, minor improvements possible
    • Acceptable (5-6): Works but incomplete or has issues

  3. Systems Thinking:

    • Excellent: Discusses memory, performance, scaling implications
    • Good: Some systems awareness
    • Acceptable: Focuses only on correctness

  4. Common Patterns:

    • Look for: Proper error handling, edge case consideration, documentation
    • Red flags: Hardcoded values, missing checks, unclear variable names

Remember: These are calibration examples. Adjust based on your course level and learning objectives. The goal is consistent evaluation, not perfection.

📚 Module Teaching Notes

Module 01: Tensor

  • Focus: Memory layout, data structures
  • Key Concept: Understanding memory is crucial for ML performance
  • Demo: Show memory profiling, copying behavior

Module 02: Activations

  • Focus: Vectorization, numerical stability
  • Key Concept: Small details matter at scale
  • Demo: Gradient vanishing/exploding

Module 04-05: Layers & Networks

  • Focus: Composition, parameter management
  • Key Concept: Building blocks combine into complex systems
  • Project: Build a small CNN

Module 06-07: Spatial & Attention

  • Focus: Algorithmic complexity, memory patterns
  • Key Concept: O(N²) operations become bottlenecks
  • Demo: Profile attention memory usage

Module 08-11: Training Pipeline

  • Focus: End-to-end system integration
  • Key Concept: Many components must work together
  • Project: Train a real model

Module 12-15: Production

  • Focus: Deployment, optimization, monitoring
  • Key Concept: Academic vs production requirements
  • Demo: Model compression, deployment

Module 16: TinyGPT

  • Focus: Framework generalization
  • Key Concept: 70% component reuse from vision to language
  • Capstone: Build a working language model

🎯 Learning Objectives

By course end, students should be able to:

  1. Build complete ML systems from scratch
  2. Analyze memory usage and computational complexity
  3. Debug performance bottlenecks
  4. Optimize for production deployment
  5. Understand framework design decisions
  6. Apply systems thinking to ML problems

📈 Tracking Progress

Individual Progress

# Check specific student progress
tito checkpoint status --student student_id

Class Overview

# Export all checkpoint achievements
tito checkpoint export --output class_progress.csv

Identify Struggling Students

Look for:

  • Missing checkpoint achievements
  • Low scores on ML Systems questions
  • Incomplete module submissions
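
If you use the CSV export above, a short filter can surface at-risk students. This is a sketch: the column names student_id and checkpoints_completed are assumptions about the export format, so adjust them to the actual header row.

import pandas as pd

df = pd.read_csv("class_progress.csv")

# Flag students below the class median checkpoint count
# (column names are assumed; check the exported header)
threshold = df["checkpoints_completed"].median()
at_risk = df[df["checkpoints_completed"] < threshold]
print(at_risk[["student_id", "checkpoints_completed"]]
      .sort_values("checkpoints_completed"))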

💡 Teaching Tips

1. Emphasize Building Over Theory

  • Have students type every line of code
  • Run tests immediately after implementation
  • Break and fix things intentionally

2. Connect to Production Systems

  • Show PyTorch/TensorFlow equivalents
  • Discuss real-world bottlenecks
  • Share production war stories

3. Make Performance Visible

# Use profilers liberally
with TimeProfiler("operation"):
    result = expensive_operation()
    
# Show memory usage
print(f"Memory: {get_memory_usage():.2f} MB")

4. Encourage Systems Questions

  • "What would break at 1B parameters?"
  • "How would you distributed this?"
  • "What's the bottleneck here?"

🔧 Troubleshooting

Common Student Issues

Environment Problems

# Student fix:
tito system health
tito system reset

Module Import Errors

# Rebuild package
tito export --all

Test Failures

# Detailed test output
tito module test MODULE --verbose

NBGrader Issues

Database Locked

# Clear NBGrader database
rm gradebook.db
tito grade setup

Missing Submissions

# Check submission directory
ls submitted/*/MODULE/

📊 Sample Schedule (16 Weeks)

| Week | Module | Focus |
|------|--------|-------|
| 1 | 01 Tensor | Data Structures, Memory |
| 2 | 02 Activations | Non-linearity Functions |
| 3 | 03 Layers | Neural Network Components |
| 4 | 04 Losses | Optimization Objectives |
| 5 | 05 Autograd | Automatic Differentiation |
| 6 | 06 Optimizers | Training Algorithms |
| 7 | 07 Training | Complete Training Loop |
| 8 | Midterm Project | Build and Train Network |
| 9 | 08 DataLoader | Data Pipeline |
| 10 | 09 Spatial | Convolutions, CNNs |
| 11 | 10 Tokenization | Text Processing |
| 12 | 11 Embeddings | Word Representations |
| 13 | 12 Attention | Attention Mechanisms |
| 14 | 13 Transformers | Transformer Architecture |
| 15 | 14-19 Optimization | Profiling, Quantization, etc. |
| 16 | 20 Capstone | Torch Olympics Competition |

🎓 Assessment Strategy

Continuous Assessment (70%)

  • Module completion: 4% each × 16 = 64%
  • Checkpoint achievements: 6%

Projects (30%)

  • Midterm: Build and train CNN (15%)
  • Final: Extend TinyGPT (15%)

📚 Additional Resources


Need help? Open an issue or contact the TinyTorch team!