TinyTorch Optimization Module Architecture
PyTorch Expert Review and Design Recommendations
Current Architecture Analysis
Strengths:
- Clean module progression (tensor → layers → networks → training)
- Solid pedagogical foundation with NBGrader integration
- Export system preserves student learning journey
- Real systems focus with memory profiling
Challenge: Need to add competition-ready optimizations without breaking existing learning progression or export system.
Recommended Architecture: Backend Dispatch System
1. Backend Interface Design
```python
# New: tinytorch/backends/__init__.py
from abc import ABC, abstractmethod

import numpy as np


class ComputeBackend(ABC):
    """Abstract base class for computational backends"""

    @abstractmethod
    def matmul(self, a: np.ndarray, b: np.ndarray) -> np.ndarray:
        """Matrix multiplication implementation"""
        pass

    @abstractmethod
    def conv2d(self, input: np.ndarray, kernel: np.ndarray,
               stride: int = 1, padding: int = 0) -> np.ndarray:
        """2D convolution implementation"""
        pass


class NaiveBackend(ComputeBackend):
    """Pedagogical reference implementation"""

    def matmul(self, a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Triple-loop O(n³) implementation for learning
        m, k = a.shape
        k2, n = b.shape
        assert k == k2
        result = np.zeros((m, n))
        for i in range(m):
            for j in range(n):
                for l in range(k):
                    result[i, j] += a[i, l] * b[l, j]
        return result

    def conv2d(self, input, kernel, stride=1, padding=0):
        # Naive sliding-window implementation
        return naive_conv2d(input, kernel, stride, padding)


class OptimizedBackend(ComputeBackend):
    """Competition-ready optimized implementation"""

    def matmul(self, a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Cache-friendly blocked matrix multiplication
        return optimized_blocked_matmul(a, b)

    def conv2d(self, input, kernel, stride=1, padding=0):
        # im2col + GEMM optimization
        return optimized_conv2d(input, kernel, stride, padding)
```
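The helper `optimized_blocked_matmul` is referenced above but not defined. A minimal sketch of what it could look like, with an arbitrarily chosen tile size and NumPy's BLAS-backed `@` doing the per-tile work, might be:

```python
import numpy as np


def optimized_blocked_matmul(a: np.ndarray, b: np.ndarray,
                             block: int = 64) -> np.ndarray:
    """Cache-blocked matmul sketch: process block-sized tiles so each
    tile of a and b stays cache-resident while it is reused."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    result = np.zeros((m, n), dtype=np.result_type(a, b))
    for i0 in range(0, m, block):
        for l0 in range(0, k, block):
            for j0 in range(0, n, block):
                # NumPy slicing clips at the array edge, so ragged
                # trailing tiles are handled automatically.
                result[i0:i0 + block, j0:j0 + block] += (
                    a[i0:i0 + block, l0:l0 + block]
                    @ b[l0:l0 + block, j0:j0 + block]
                )
    return result
```

In a real student submission the inner `@` would itself be replaced by a hand-written micro-kernel; the sketch only shows the blocking structure.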
2. Configuration System
```python
# New: tinytorch/config.py
from tinytorch.backends import ComputeBackend, NaiveBackend, OptimizedBackend

_backend = None


def set_backend(backend_name: str):
    """Switch computational backend globally"""
    global _backend
    if backend_name == 'naive':
        _backend = NaiveBackend()
    elif backend_name == 'optimized':
        _backend = OptimizedBackend()
    else:
        raise ValueError(f"Unknown backend: {backend_name}")


def get_backend() -> ComputeBackend:
    """Get current backend, defaulting to naive"""
    global _backend
    if _backend is None:
        _backend = NaiveBackend()  # Default to learning mode
    return _backend
```
3. Existing API Modifications (Minimal Changes)
```python
# Modified: tinytorch/core/layers.py (line ~112)
def matmul(a: Tensor, b: Tensor) -> Tensor:
    """Matrix multiplication with backend dispatch"""
    from tinytorch.config import get_backend

    backend = get_backend()
    result_data = backend.matmul(a.data, b.data)
    return Tensor(result_data)

# The Dense layer automatically gets the optimization!
# No changes needed to Dense.forward()
```
Module Progression Strategy
Modules 1-10: Pure Learning Mode
- Always use `NaiveBackend` (hardcoded)
- Focus on understanding algorithms
- No mention of optimization
Modules 11-12: Introduce Backend Concept
- Explain why optimizations matter
- Show backend switching API
- Compare naive vs optimized performance
Module 13: Performance Kernels (NEW)
- Implement optimized backends
- Cache-friendly algorithms
- Memory access pattern optimization
- SIMD/vectorization techniques
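The "im2col + GEMM" optimization named above for `conv2d` is not spelled out anywhere in this document. A hedged, single-channel sketch (the multi-channel batched version students would actually write is more involved) of the idea, which trades memory for one large matrix multiply:

```python
import numpy as np


def im2col_conv2d(input: np.ndarray, kernel: np.ndarray,
                  stride: int = 1, padding: int = 0) -> np.ndarray:
    """im2col + GEMM convolution sketch for a single-channel 2D input:
    unroll each receptive field into a row, then do one matmul."""
    if padding:
        input = np.pad(input, padding)
    kh, kw = kernel.shape
    h, w = input.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    # Build the im2col matrix: one row per output position.
    cols = np.empty((out_h * out_w, kh * kw), dtype=input.dtype)
    row = 0
    for i in range(0, out_h * stride, stride):
        for j in range(0, out_w * stride, stride):
            cols[row] = input[i:i + kh, j:j + kw].ravel()
            row += 1
    # One GEMM replaces the whole sliding-window loop nest.
    return (cols @ kernel.ravel()).reshape(out_h, out_w)
```

Note this computes cross-correlation (no kernel flip), matching the usual ML convention for "convolution" layers.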
Module 14: Benchmarking & Competition (MODIFIED)
- Comprehensive performance measurement
- Memory profiling tools
- Competition leaderboard system
- Head-to-head performance comparisons
Competition Framework Design
Benchmark Context Manager
```python
# New: tinytorch/benchmark.py
import time
import tracemalloc
from contextlib import contextmanager


class BenchmarkResult:
    def __init__(self):
        self.time_ms = 0
        self.peak_memory_mb = 0
        self.current_memory_mb = 0


@contextmanager
def benchmark():
    """Context manager for performance measurement"""
    result = BenchmarkResult()  # Yielded up front; results are filled in on exit
    tracemalloc.start()
    start_time = time.perf_counter()
    try:
        yield result
    finally:
        end_time = time.perf_counter()
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        # Store results on the yielded object
        result.time_ms = (end_time - start_time) * 1000
        result.peak_memory_mb = peak / 1024 / 1024
        result.current_memory_mb = current / 1024 / 1024
```
Competition API
```python
# Student competition usage
import tinytorch

# Learning phase
tinytorch.set_backend('naive')
with tinytorch.benchmark() as bench:
    output = model(input)
naive_time = bench.time_ms
print(f"Naive: {bench.time_ms:.1f}ms, {bench.peak_memory_mb:.1f}MB")

# Competition phase
tinytorch.set_backend('optimized')
with tinytorch.benchmark() as bench:
    output = model(input)
optimized_time = bench.time_ms
print(f"Optimized: {bench.time_ms:.1f}ms, {bench.peak_memory_mb:.1f}MB")

# Speedup calculation
speedup = naive_time / optimized_time
print(f"Speedup: {speedup:.1f}x faster!")
```
Implementation Benefits
1. Zero Breaking Changes
- Existing student code works unchanged
- Export system remains intact
- Learning progression preserved
2. Easy Competition Setup
```python
# Same model, same data, dramatic performance difference
model = build_resnet()
data = load_cifar10()

# Students compete on who can optimize best
tinytorch.set_backend('student_submission_1')
tinytorch.set_backend('student_submission_2')
```
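Backend names like `'student_submission_1'` go beyond the hardcoded `if/elif` in `set_backend` above, which implies a registry. One hypothetical extension (the registry API here is an assumption, not existing TinyTorch code) that lets students plug in their own backends under a string name:

```python
# Hypothetical registry extension of tinytorch/config.py.
_REGISTRY = {}
_backend = None


def register_backend(name: str, backend) -> None:
    """Map a backend name to an instance for later set_backend() calls."""
    _REGISTRY[name] = backend


def set_backend(name: str) -> None:
    """Switch to a previously registered backend."""
    global _backend
    if name not in _REGISTRY:
        raise ValueError(
            f"Unknown backend: {name!r} (registered: {sorted(_REGISTRY)})")
    _backend = _REGISTRY[name]


def get_backend():
    if _backend is None:
        raise RuntimeError("No backend selected; call set_backend() first")
    return _backend


# A student submission only needs to implement the ComputeBackend methods.
class StudentBackend:
    def matmul(self, a, b):
        return a @ b  # stand-in for the student's optimized kernel


register_backend('student_submission_1', StudentBackend())
set_backend('student_submission_1')
```

The built-in `'naive'` and `'optimized'` backends would simply be pre-registered at import time, keeping the student-facing API identical.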
3. Realistic Performance Differences
- Naive matmul: O(n³) with poor cache behavior
- Optimized matmul: Blocked + SIMD → 10-100x speedup
- Students see why optimization matters!
4. Clean Separation of Concerns
- Modules 1-10: Pure learning (algorithms)
- Modules 11-14: Systems engineering (optimization)
- Competition: Best of both worlds
PyTorch Design Lessons Applied
This architecture mirrors how PyTorch actually works:
- Dispatcher Pattern: PyTorch uses dispatching to different backends (CPU/CUDA/XLA)
- Operator Fusion: High-level operations dispatch to optimized kernels
- Backward Compatibility: Old code works unchanged when optimizations are added
- Performance Isolation: Learning code doesn't need to know about optimizations
Next Steps Recommendation
- Start small: Implement the backend system for just `matmul` first
- Prove the pattern: Show a 10x+ speedup is possible with the same API
- Expand gradually: Add conv2d, attention, etc.
- Build competition tools: Leaderboards, automated benchmarking
- Create optimization modules: Let students implement their own backends
This architecture gives you the best of both worlds: clean learning progression AND competition-ready performance, using the same patterns that make PyTorch successful in production.