Files
cs249r_book/tinytorch/INSTRUCTOR.md
2026-01-23 14:00:43 -05:00

581 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# 👩‍🏫 TinyTorch Instructor Guide
Complete guide for teaching ML Systems Engineering with TinyTorch.
## 🎯 Course Overview
TinyTorch teaches ML systems engineering through building, not just using. Students construct a complete ML framework from tensors to transformers, understanding memory, performance, and scaling at each step.
## 🛠️ Instructor Setup
### **1. Initial Setup**
```bash
# Clone and setup
git clone https://github.com/harvard-edge/cs249r_book.git
cd TinyTorch
# Virtual environment (MANDATORY)
python -m venv .venv
source .venv/bin/activate
# Install with instructor tools
pip install -r requirements.txt
pip install nbgrader
# Setup grading infrastructure
tito nbgrader init
```
### **2. Verify Installation**
```bash
tito system health
# Should show all green checkmarks
tito nbgrader
# Should show available NBGrader commands
```
## 📝 Assignment Workflow
### **⚠️ Experimental Feature**
The NBGrader integration is under active development. Use for testing only.
### **Using NBGrader via Tito**
We provide `tito nbgrader` commands for grading workflows.
### **1. Prepare Assignments**
```bash
# Generate instructor version (with solutions)
tito nbgrader generate 01_tensor
# Create student version (solutions removed)
tito nbgrader release 01_tensor
# Student version will be in: assignments/release/01_tensor/
```
### **2. Distribute to Students**
```bash
# Option A: GitHub Classroom (recommended)
# 1. Create assignment repository from TinyTorch
# 2. Remove solutions from modules
# 3. Students clone and work
# Option B: Direct distribution
# Share the release/ directory contents
```
### **3. Collect Submissions**
```bash
# Collect all students
tito nbgrader collect 01_tensor
# Or specific student
tito nbgrader collect 01_tensor --student student_id
```
### **4. Auto-Grade**
```bash
# Grade all submissions
tito nbgrader autograde 01_tensor
# Grade specific student
tito nbgrader autograde 01_tensor --student student_id
```
### **5. Manual Review**
```bash
# Use NBGrader's formgrader for manual review
# This launches a web interface for:
# - Reviewing ML Systems question responses
# - Adding feedback comments
# - Adjusting auto-grades
nbgrader formgrader
```
### **6. Generate Feedback**
```bash
# Create feedback files for students
tito nbgrader feedback 01_tensor
```
### **7. Export Grades**
```bash
# Export grades report
tito nbgrader report
# Or specific module
tito nbgrader report --module 01_tensor
```
## 📊 Grading Components
### **Auto-Graded (70%)**
- Code implementation correctness
- Test passing
- Function signatures
- Output validation
### **Manually Graded (30%)**
- ML Systems Thinking questions (3 per module)
- Each question: 10 points
- Focus on understanding, not perfection
### **Grading Rubric for ML Systems Questions**
| Points | Criteria |
|--------|----------|
| 9-10 | Demonstrates deep understanding, references specific code, discusses systems implications |
| 7-8 | Good understanding, some code references, basic systems thinking |
| 5-6 | Surface understanding, generic response, limited systems perspective |
| 3-4 | Attempted but misses key concepts |
| 0-2 | No attempt or completely off-topic |
**What to Look For:**
- References to actual implemented code
- Memory/performance analysis
- Scaling considerations
- Production system comparisons
- Understanding of trade-offs
## 📋 Sample Solutions for Grading Calibration
This section provides sample solutions to help calibrate grading standards. Use these as reference points when evaluating student submissions.
### Module 01: Tensor - Memory Footprint
**Excellent Solution (9-10 points)**:
```python
def memory_footprint(self):
"""Calculate tensor memory in bytes."""
return self.data.nbytes
```
**Why Excellent**:
- Concise and correct
- Uses NumPy's built-in `nbytes` property
- Clear docstring
- Handles all tensor shapes correctly
**Good Solution (7-8 points)**:
```python
def memory_footprint(self):
"""Calculate memory usage."""
return np.prod(self.data.shape) * self.data.dtype.itemsize
```
**Why Good**:
- Correct implementation
- Manually calculates (shows understanding)
- Works but less efficient than using `nbytes`
- Minor: docstring could be more specific
**Acceptable Solution (5-6 points)**:
```python
def memory_footprint(self):
size = 1
for dim in self.data.shape:
size *= dim
return size * 4 # Assumes float32
```
**Why Acceptable**:
- Correct logic but hardcoded dtype size
- Works for float32 but fails for other dtypes
- Shows understanding of memory calculation
- Missing proper dtype handling
### Module 06: Autograd - Backward Pass
**Excellent Solution (9-10 points)**:
```python
def backward(self, gradient=None):
"""Backward pass through computational graph."""
if gradient is None:
gradient = np.ones_like(self.data)
self.grad = gradient
if self.grad_fn is not None:
# Compute gradients for inputs
input_grads = self.grad_fn.backward(gradient)
# Propagate to input tensors
if isinstance(input_grads, tuple):
for input_tensor, input_grad in zip(self.grad_fn.inputs, input_grads):
if input_tensor.requires_grad:
input_tensor.backward(input_grad)
else:
if self.grad_fn.inputs[0].requires_grad:
self.grad_fn.inputs[0].backward(input_grads)
```
**Why Excellent**:
- Handles both scalar and tensor gradients
- Properly checks `requires_grad` before propagating
- Handles tuple returns from grad_fn
- Clear variable names and structure
**Good Solution (7-8 points)**:
```python
def backward(self, gradient=None):
if gradient is None:
gradient = np.ones_like(self.data)
self.grad = gradient
if self.grad_fn:
grads = self.grad_fn.backward(gradient)
for inp, grad in zip(self.grad_fn.inputs, grads):
inp.backward(grad)
```
**Why Good**:
- Correct logic
- Missing `requires_grad` check (minor issue)
- Assumes grads is always iterable (may fail for single input)
- Works for most cases but less robust
**Acceptable Solution (5-6 points)**:
```python
def backward(self, grad):
self.grad = grad
if self.grad_fn:
self.grad_fn.inputs[0].backward(self.grad_fn.backward(grad))
```
**Why Acceptable**:
- Basic backward pass works
- Only handles single input (fails for multi-input operations)
- Missing None gradient handling
- Shows understanding but incomplete
### Module 09: Spatial - Convolution Implementation
**Excellent Solution (9-10 points)**:
```python
def forward(self, x):
"""Forward pass with explicit loops for clarity."""
batch_size, in_channels, height, width = x.shape
out_height = (height - self.kernel_size + 2 * self.padding) // self.stride + 1
out_width = (width - self.kernel_size + 2 * self.padding) // self.stride + 1
output = np.zeros((batch_size, self.out_channels, out_height, out_width))
# Apply padding
if self.padding > 0:
x = np.pad(x, ((0, 0), (0, 0), (self.padding, self.padding),
(self.padding, self.padding)), mode='constant')
# Explicit convolution loops
for b in range(batch_size):
for oc in range(self.out_channels):
for oh in range(out_height):
for ow in range(out_width):
h_start = oh * self.stride
w_start = ow * self.stride
h_end = h_start + self.kernel_size
w_end = w_start + self.kernel_size
window = x[b, :, h_start:h_end, w_start:w_end]
output[b, oc, oh, ow] = np.sum(
window * self.weight[oc] + self.bias[oc]
)
return Tensor(output, requires_grad=x.requires_grad)
```
**Why Excellent**:
- Clear output shape calculation
- Proper padding handling
- Explicit loops make O(kernel_size²) complexity visible
- Correct gradient tracking setup
- Well-structured and readable
**Good Solution (7-8 points)**:
```python
def forward(self, x):
B, C, H, W = x.shape
out_h = (H - self.kernel_size) // self.stride + 1
out_w = (W - self.kernel_size) // self.stride + 1
out = np.zeros((B, self.out_channels, out_h, out_w))
for b in range(B):
for oc in range(self.out_channels):
for i in range(out_h):
for j in range(out_w):
h = i * self.stride
w = j * self.stride
out[b, oc, i, j] = np.sum(
x[b, :, h:h+self.kernel_size, w:w+self.kernel_size]
* self.weight[oc]
) + self.bias[oc]
return Tensor(out)
```
**Why Good**:
- Correct implementation
- Missing padding support (works only for padding=0)
- Less clear variable names
- Missing requires_grad propagation
**Acceptable Solution (5-6 points)**:
```python
def forward(self, x):
out = np.zeros((x.shape[0], self.out_channels, x.shape[2]-2, x.shape[3]-2))
for b in range(x.shape[0]):
for c in range(self.out_channels):
for i in range(out.shape[2]):
for j in range(out.shape[3]):
out[b, c, i, j] = np.sum(x[b, :, i:i+3, j:j+3] * self.weight[c])
return Tensor(out)
```
**Why Acceptable**:
- Basic convolution works
- Hardcoded kernel_size=3 (not general)
- No stride or padding support
- Shows understanding but incomplete
### Module 12: Attention - Scaled Dot-Product Attention
**Excellent Solution (9-10 points)**:
```python
def forward(self, query, key, value, mask=None):
"""Scaled dot-product attention with numerical stability."""
# Compute attention scores
scores = np.dot(query, key.T) / np.sqrt(self.d_k)
# Apply mask if provided
if mask is not None:
scores = np.where(mask, scores, -1e9)
# Softmax with numerical stability
exp_scores = np.exp(scores - np.max(scores, axis=-1, keepdims=True))
attention_weights = exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)
# Apply attention to values
output = np.dot(attention_weights, value)
return output, attention_weights
```
**Why Excellent**:
- Proper scaling factor (1/√d_k)
- Numerical stability with max subtraction
- Mask handling
- Returns both output and attention weights
- Clear and well-documented
**Good Solution (7-8 points)**:
```python
def forward(self, q, k, v):
scores = np.dot(q, k.T) / np.sqrt(q.shape[-1])
weights = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)
return np.dot(weights, v)
```
**Why Good**:
- Correct implementation
- Missing numerical stability (may overflow)
- Missing mask support
- Works but less robust
**Acceptable Solution (5-6 points)**:
```python
def forward(self, q, k, v):
scores = np.dot(q, k.T)
weights = np.exp(scores) / np.sum(np.exp(scores))
return np.dot(weights, v)
```
**Why Acceptable**:
- Basic attention mechanism
- Missing scaling factor
- Missing numerical stability
- Incorrect softmax (should be per-row)
### Grading Guidelines Using Sample Solutions
**When Evaluating Student Code**:
1. **Correctness First**: Does it pass all tests?
- If no: Maximum 6 points (even if well-written)
- If yes: Proceed to quality evaluation
2. **Code Quality**:
- **Excellent (9-10)**: Production-ready, handles edge cases, well-documented
- **Good (7-8)**: Correct and functional, minor improvements possible
- **Acceptable (5-6)**: Works but incomplete or has issues
3. **Systems Thinking**:
- **Excellent**: Discusses memory, performance, scaling implications
- **Good**: Some systems awareness
- **Acceptable**: Focuses only on correctness
4. **Common Patterns**:
- Look for: Proper error handling, edge case consideration, documentation
- Red flags: Hardcoded values, missing checks, unclear variable names
**Remember**: These are calibration examples. Adjust based on your course level and learning objectives. The goal is consistent evaluation, not perfection.
## 📚 Module Teaching Notes
### **Module 01: Tensor**
- **Focus**: Memory layout, data structures
- **Key Concept**: Understanding memory is crucial for ML performance
- **Demo**: Show memory profiling, copying behavior
### **Module 02: Activations**
- **Focus**: Vectorization, numerical stability
- **Key Concept**: Small details matter at scale
- **Demo**: Gradient vanishing/exploding
### **Module 04-05: Layers & Networks**
- **Focus**: Composition, parameter management
- **Key Concept**: Building blocks combine into complex systems
- **Project**: Build a small CNN
### **Module 06-07: Spatial & Attention**
- **Focus**: Algorithmic complexity, memory patterns
- **Key Concept**: O(N²) operations become bottlenecks
- **Demo**: Profile attention memory usage
### **Module 08-11: Training Pipeline**
- **Focus**: End-to-end system integration
- **Key Concept**: Many components must work together
- **Project**: Train a real model
### **Module 12-15: Production**
- **Focus**: Deployment, optimization, monitoring
- **Key Concept**: Academic vs production requirements
- **Demo**: Model compression, deployment
### **Module 16: TinyGPT**
- **Focus**: Framework generalization
- **Key Concept**: 70% component reuse from vision to language
- **Capstone**: Build a working language model
## 🎯 Learning Objectives
By course end, students should be able to:
1. **Build** complete ML systems from scratch
2. **Analyze** memory usage and computational complexity
3. **Debug** performance bottlenecks
4. **Optimize** for production deployment
5. **Understand** framework design decisions
6. **Apply** systems thinking to ML problems
## 📈 Tracking Progress
### **Individual Progress**
```bash
# Check specific student progress
tito module status --student student_id
```
### **Class Overview**
```bash
# Export all module progress
tito module status --export class_progress.csv
```
### **Identify Struggling Students**
Look for:
- Missing module completions
- Low scores on ML Systems questions
- Incomplete module submissions
## 💡 Teaching Tips
### **1. Emphasize Building Over Theory**
- Have students type every line of code
- Run tests immediately after implementation
- Break and fix things intentionally
### **2. Connect to Production Systems**
- Show PyTorch/TensorFlow equivalents
- Discuss real-world bottlenecks
- Share production war stories
### **3. Make Performance Visible**
```python
# Use profilers liberally
with TimeProfiler("operation"):
result = expensive_operation()
# Show memory usage
print(f"Memory: {get_memory_usage():.2f} MB")
```
### **4. Encourage Systems Questions**
- "What would break at 1B parameters?"
- "How would you distribute this?"
- "What's the bottleneck here?"
## 🔧 Troubleshooting
### **Common Student Issues**
**Environment Problems**
```bash
# Student fix:
tito system health
tito module reset XX # Reset specific module if needed
```
**Module Import Errors**
```bash
# Rebuild package
tito export --all
```
**Test Failures**
```bash
# Detailed test output
tito module test MODULE --verbose
```
### **NBGrader Issues**
**Database Locked**
```bash
# Clear NBGrader database
rm gradebook.db
tito nbgrader init
```
**Missing Submissions**
```bash
# Check submission directory
ls submitted/*/MODULE/
```
## 📊 Sample Schedule (16 Weeks)
| Week | Module | Focus |
|------|--------|-------|
| 1 | 01 Tensor | Data Structures, Memory |
| 2 | 02 Activations | Non-linearity Functions |
| 3 | 03 Layers | Neural Network Components |
| 4 | 04 Losses | Optimization Objectives |
| 5 | 05 DataLoader | Data Pipeline |
| 6 | 06 Autograd | Automatic Differentiation |
| 7 | 07 Optimizers | Training Algorithms |
| 8 | 08 Training | Complete Training Loop |
| 9 | Midterm Project | Build and Train Network |
| 10 | 09 Spatial | Convolutions, CNNs |
| 11 | 10 Tokenization | Text Processing |
| 12 | 11 Embeddings | Word Representations |
| 13 | 12 Attention | Attention Mechanisms |
| 14 | 13 Transformers | Transformer Architecture |
| 15 | 14-19 Optimization | Profiling, Quantization, etc. |
| 16 | 20 Capstone | Torch Olympics Competition |
## 🎓 Assessment Strategy
### **Continuous Assessment (70%)**
- Module completion: 4% each × 16 = 64%
- Checkpoint achievements: 6%
### **Projects (30%)**
- Midterm: Build and train CNN (15%)
- Final: Extend TinyGPT (15%)
## 📚 Additional Resources
- [MLSys Book](https://mlsysbook.ai) - Companion textbook
- [Course Discussions](https://github.com/harvard-edge/cs249r_book/discussions)
- [Issue Tracker](https://github.com/harvard-edge/cs249r_book/issues)
---
**Need help? Open an issue or contact the TinyTorch team!**