mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-04-30 10:13:57 -05:00
6.3 KiB
TinyTorch Module Development Standards
🎯 Core Principle
Modules teach ML systems engineering through building, not just ML algorithms through reading.
📁 File Structure
One Module = One .py File
modules/XX_modulename/
├── modulename_dev.py # The ONLY file you edit
├── modulename_dev.ipynb # Auto-generated from .py (DO NOT EDIT)
└── README.md # Module overview
Critical Rules:
- ✅ ALWAYS edit .py files only
- ❌ NEVER edit .ipynb notebooks directly
- ✅ Use jupytext to sync .py → .ipynb
📚 Module Structure Pattern
Every module MUST follow this exact structure:
# %% [markdown]
"""
# Module XX: [Name]
**Learning Objectives:**
- Build [component] from scratch
- Understand [systems concept]
- Analyze performance implications
"""
# %% [markdown]
"""
## Part 1: Mathematical Foundations
[Theory and complexity analysis]
"""
# %% [code]
# Implementation
# %% [markdown]
"""
### Testing [Component]
Let's verify our implementation works correctly.
"""
# %% [code]
# Immediate test
# %% [markdown]
"""
## Part 2: Systems Analysis
### Memory Profiling
Let's understand the memory implications.
"""
# %% [code]
# Memory profiling code
# %% [markdown]
"""
## Part 3: Production Context
In real ML systems like PyTorch...
"""
# ... continue pattern ...
# %% [code]
if __name__ == "__main__":
    run_all_tests()
# %% [markdown]
"""
## 🤔 ML Systems Thinking
[Interactive questions analyzing implementation]
"""
# %% [markdown]
"""
## 🎯 Module Summary
[What was learned - ALWAYS LAST]
"""
🧪 Implementation → Test Pattern
MANDATORY: Every implementation must be immediately followed by a test.
# ✅ CORRECT Pattern:
# %% [markdown]
"""
## Building the Dense Layer
"""
# %% [code]
import numpy as np

class Dense:
    def __init__(self, in_features, out_features):
        self.weights = np.random.randn(in_features, out_features) * 0.1
        self.bias = np.zeros(out_features)

    def forward(self, x):
        return x @ self.weights + self.bias
# %% [markdown]
"""
### Testing Dense Layer
Let's verify our dense layer handles shapes correctly.
"""
# %% [code]
def test_dense_layer():
    layer = Dense(10, 5)
    x = np.random.randn(32, 10)  # Batch of 32, 10 features
    output = layer.forward(x)
    assert output.shape == (32, 5), f"Expected (32, 5), got {output.shape}"
    print("✅ Dense layer forward pass works!")

test_dense_layer()
🔬 ML Systems Focus
MANDATORY Systems Analysis Sections
Every module MUST include:
- Complexity Analysis
# %% [markdown]
"""
### Computational Complexity
- Matrix multiply: O(batch × in_features × out_features)
- Memory usage: O(in_features × out_features) for weights
- This becomes the bottleneck when...
"""
- Memory Profiling
# %% [code]
def profile_memory():
    import tracemalloc
    tracemalloc.start()
    layer = Dense(1000, 1000)
    x = np.random.randn(128, 1000)
    output = layer.forward(x)
    current, peak = tracemalloc.get_traced_memory()
    print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
    print("This shows why large models need GPUs!")
- Production Context
# %% [markdown]
"""
### In Production Systems
PyTorch's nn.Linear does the same thing but with:
- GPU acceleration via CUDA kernels
- Automatic differentiation support
- Optimized BLAS operations
- Memory pooling for efficiency
"""
📝 NBGrader Integration
Cell Metadata Structure
# %% [code] {"nbgrader": {"grade": false, "locked": false, "solution": true, "grade_id": "dense_implementation"}}
### BEGIN SOLUTION
class Dense:
# Full implementation for instructors
...
### END SOLUTION
### BEGIN HIDDEN TESTS
# Instructor-only tests
...
### END HIDDEN TESTS
Critical NBGrader Rules
- Every cell needs unique grade_id
- Scaffolding stays OUTSIDE solution blocks
- Hidden tests validate student work
- Points should reflect complexity
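The unique-grade_id rule is easy to automate. A sketch, not an nbgrader tool: it just walks the notebook's JSON structure, where each graded cell stores its id under `metadata.nbgrader.grade_id`:

```python
def find_duplicate_grade_ids(notebook):
    # Return grade_ids that appear in more than one cell
    seen, duplicates = set(), []
    for cell in notebook.get("cells", []):
        grade_id = cell.get("metadata", {}).get("nbgrader", {}).get("grade_id")
        if grade_id is None:
            continue
        if grade_id in seen:
            duplicates.append(grade_id)
        seen.add(grade_id)
    return duplicates

nb = {"cells": [
    {"metadata": {"nbgrader": {"grade_id": "dense_implementation"}}},
    {"metadata": {"nbgrader": {"grade_id": "dense_test"}}},
    {"metadata": {"nbgrader": {"grade_id": "dense_implementation"}}},  # duplicate!
]}
print(find_duplicate_grade_ids(nb))  # ['dense_implementation']
```

Running a check like this before release catches the silent grading failures that duplicate ids cause.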
🎓 Educational Patterns
The "Build → Measure → Understand" Pattern
# 1. BUILD
class LayerNorm:
    def forward(self, x):
        mean = np.mean(x, axis=-1, keepdims=True)
        var = np.var(x, axis=-1, keepdims=True)
        return (x - mean) / np.sqrt(var + 1e-5)
# 2. MEASURE
import time

def measure_performance():
    layer = LayerNorm()
    x = np.random.randn(1000, 512)  # 1000 tokens, 512 dims each
    start = time.time()
    for _ in range(100):
        output = layer.forward(x)
    elapsed = time.time() - start
    print(f"Time per forward pass: {elapsed/100*1000:.2f}ms")
    print(f"Throughput: {100*1000/elapsed:.0f} tokens/sec")  # 100 passes x 1000 tokens
# 3. UNDERSTAND
"""
With 512 dimensions, normalization adds ~2ms overhead.
This is why large models use fused kernels!
"""
Progressive Complexity
Start simple, build up:
# Step 1: Simplest possible version
def relu_v1(x):
    return np.maximum(0, x)

# Step 2: Add complexity
def relu_v2(x):
    # Handle gradients: return a grad_fn alongside the output
    # (a raw numpy array can't carry a grad_fn attribute itself)
    output = np.maximum(0, x)
    grad_fn = lambda grad: grad * (x > 0)
    return output, grad_fn

# Step 3: Production version
class ReLU:
    def forward(self, x):
        self.input = x  # Save for backward
        return np.maximum(0, x)

    def backward(self, grad):
        return grad * (self.input > 0)
⚠️ Common Pitfalls
1. Too Much Theory
   - Students want to BUILD, not read
   - Show through code, not exposition
2. Missing Systems Analysis
   - Not just algorithms, but engineering
   - Always discuss memory and performance
3. Tests at the End
   - Loses educational flow
   - Test immediately after implementation
4. No Production Context
   - Students need to see real-world relevance
   - Compare with PyTorch/TensorFlow
📌 Module Checklist
Before considering a module complete:
- All code in .py file (not notebook)
- Follows exact structure pattern
- Every implementation has immediate test
- Includes memory profiling
- Includes complexity analysis
- Shows production context
- NBGrader metadata correct
- ML systems thinking questions
- Summary is LAST section
- Tests run when module executed
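Parts of this checklist can be scripted. A hypothetical `check_module` sketch; the pattern list is illustrative and would need extending to cover the full checklist:

```python
import re

# Illustrative required markers, one regex per checklist item covered
REQUIRED_PATTERNS = [
    r"# %% \[markdown\]",            # percent-format markdown cells exist
    r"Module Summary",               # summary section present
    r'if __name__ == "__main__":',   # tests run when module is executed
]

def check_module(source):
    # Return the required patterns missing from the module source text
    return [p for p in REQUIRED_PATTERNS if not re.search(p, source)]

good = ('# %% [markdown]\n"""\n## 🎯 Module Summary\n"""\n'
        'if __name__ == "__main__":\n    run_all_tests()\n')
print(check_module(good))  # []
```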
🎯 Remember
We're teaching ML systems engineering, not just ML algorithms.
Every module should help students understand:
- How to BUILD ML systems
- Why performance matters
- Where bottlenecks occur
- How production systems work