mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-03-12 01:45:57 -05:00
TinyTorch Optimization Modules Tutorial Plan
Modules 15-20: From Manual Optimization to Automatic Systems
Overview: The Complete Optimization Journey
Students progress from manual optimization techniques to building intelligent systems that optimize automatically, culminating in a competition where their AutoML systems compete.
Manual Optimization (15-18) → Automatic Optimization (19) → Competition (20)
Module 15: Acceleration - Speed Optimization
Connection from Module 14
"Your transformer works but generates text slowly. Let's make it 10-100x faster!"
What Students Build
- Transform educational loops into optimized operations
- Cache-friendly blocked algorithms
- NumPy vectorization integration
- Transparent backend dispatch system
Key Learning Outcomes
- Understand why educational loops are slow (cache misses, no vectorization)
- Build blocked matrix multiplication for cache efficiency
- Learn when to use optimized libraries vs custom code
- Create backend systems for transparent optimization
Module Structure Change
- NEW: Show the OptimizedBackend class upfront as the goal - students see where they're heading before learning the steps
- "Here's the elegant solution, now let's understand how to build it"
Performance Impact: 10-100x speedup on matrix operations
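The cache-blocking idea above can be sketched in a few lines. This is a minimal teaching sketch assuming plain NumPy; the function name `blocked_matmul` and the block size are illustrative, and the module's actual OptimizedBackend may look different:

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """Cache-friendly matrix multiply: operate on small tiles that fit
    in cache, so each loaded tile is reused many times before eviction."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for k0 in range(0, k, block):
                # Slicing past the end is safe in NumPy, so ragged
                # edge tiles need no special casing.
                C[i0:i0+block, j0:j0+block] += (
                    A[i0:i0+block, k0:k0+block] @ B[k0:k0+block, j0:j0+block]
                )
    return C
```

The inner tile multiply still delegates to NumPy's vectorized `@`, which is exactly the "optimized libraries vs custom code" tradeoff the module teaches.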
Module 16: Memory - Memory Optimization
Connection from Module 15
"Operations are faster, but transformers still recompute everything. Let's be smarter with memory!"
What Students Build
- KVCache class for transformer attention states
- Incremental attention computation (process only new tokens)
- Memory profiling and analysis tools
- Cache management strategies
Key Learning Outcomes
- Memory vs computation tradeoffs
- Understanding O(N²) → O(N) optimization for sequences
- Production caching patterns (GPT, LLaMA)
- When caching helps vs hurts performance
Performance Impact: 50x speedup in autoregressive generation
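The O(N²) → O(N) idea can be sketched with a minimal single-head cache. This is a hypothetical sketch in plain NumPy (names like `KVCache.append` and `attend` are illustrative, not the module's final API):

```python
import numpy as np

class KVCache:
    """Minimal key/value cache for one attention head."""
    def __init__(self):
        self.keys = None    # (seq_len, d_head)
        self.values = None

    def append(self, k_new, v_new):
        # Store projections for the new token instead of recomputing
        # them for the whole prefix on every generation step.
        if self.keys is None:
            self.keys, self.values = k_new, v_new
        else:
            self.keys = np.concatenate([self.keys, k_new], axis=0)
            self.values = np.concatenate([self.values, v_new], axis=0)
        return self.keys, self.values

def attend(q_new, cache, k_new, v_new):
    """Attention for only the newest token: O(N) work per step."""
    K, V = cache.append(k_new, v_new)
    scores = q_new @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Each generation step attends the new query against all cached keys, matching the corresponding row of full causal attention without redoing the earlier projections.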
Module 17: Quantization - Precision Optimization
Connection from Module 16
"Memory usage is optimized, but models are still huge. Let's use fewer bits!"
What Students Build
- Quantizer class for FP32→INT8 conversion
- Calibration techniques for maintaining accuracy
- Quantized operations (matmul, conv2d)
- Model size analysis tools
Key Learning Outcomes
- Numerical precision vs accuracy tradeoffs
- Post-training quantization techniques
- Hardware acceleration through reduced precision
- When to use INT8 vs FP16 vs FP32
Performance Impact: 4x model size reduction, 2-4x inference speedup
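The 4x size reduction follows directly from storing one byte per weight instead of four. A minimal symmetric post-training scheme can be sketched as below; this is an illustrative sketch, not the module's Quantizer API:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization: FP32 -> INT8 plus one scale."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# 1 byte per weight instead of 4; rounding error is bounded by scale/2.
w = np.random.default_rng(2).standard_normal(1000).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
```

Calibration (the module's second bullet) is then the problem of choosing `scale` from representative activations rather than a single max, to keep that error from hurting accuracy.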
Module 18: Compression - Structural Optimization
Connection from Module 17
"We're using fewer bits, but can we remove weights entirely?"
What Students Build
- MagnitudePruner for weight removal
- StructuredPruner for channel/filter removal
- Basic knowledge distillation
- Sparsity visualization tools
Key Learning Outcomes
- Structured vs unstructured pruning
- Magnitude-based pruning strategies
- Knowledge distillation basics
- Sparsity patterns and hardware efficiency
Performance Impact: 90% sparsity with <5% accuracy loss
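Magnitude-based unstructured pruning, the first technique in the list, fits in a few lines. A hedged sketch (the function name is illustrative; the module's MagnitudePruner may track masks statefully):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask
```

At 90% sparsity the surviving 10% of weights carry most of the layer's signal, which is why accuracy loss can stay under 5% - though realizing a speedup from unstructured sparsity requires hardware or kernels that exploit it, which is the motivation for the structured variant.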
Module 19: AutoTuning - Automatic Optimization
Connection from Module 18
"We have all these optimization techniques. Let's build systems that apply them automatically!"
What Students Build
```python
class AutoTuner:
    def auto_optimize(self, model, constraints):
        """
        Automatically decide:
        - Which optimizations to apply
        - In what order
        - With what parameters
        - For what deployment target
        """
        pass

    def hyperparameter_search(self, model, data, budget):
        """Smart hyperparameter tuning (not random)"""
        pass

    def optimization_pipeline(self, model, target_hardware):
        """Build optimal pipeline for specific hardware"""
        pass

    def adaptive_training(self, model, data):
        """Training that adapts based on progress"""
        pass
```
Key Learning Outcomes
- Automated optimization strategy selection
- Constraint-based optimization (memory, latency, accuracy)
- Hardware-aware optimization pipelines
- Smart search strategies (Bayesian optimization basics)
- Data-efficient training (curriculum learning, active learning)
Student Experience
"I built a system that takes any model and automatically optimizes it for any deployment target!"
Scope Balance (Not Too Complex)
- Focus on rule-based automation (if mobile → aggressive quantization)
- Simple grid search with smart pruning (not full Bayesian optimization)
- Basic hardware detection (CPU vs GPU vs Mobile)
- Pre-built optimization recipes that students can combine
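The rule-based automation above can be sketched as a simple decision function. All step names, targets, and thresholds here are hypothetical placeholders for whatever recipes the modules actually define:

```python
def choose_pipeline(target, memory_budget_mb):
    """Rule-based optimization selection: map a deployment target and
    a memory budget to an ordered list of optimization steps."""
    steps = ["vectorized_backend"]  # Module 15's backend: always worth it
    if target == "mobile":
        # "if mobile -> aggressive quantization" recipe
        steps += ["int8_quantization", "magnitude_pruning_90"]
    elif target == "gpu":
        steps += ["fp16_quantization"]
    if memory_budget_mb < 100:
        steps += ["kv_cache_eviction"]
    return steps
```

Keeping the logic this transparent is the point of the scope balance: students can read, test, and extend every rule, which a full Bayesian-optimization search would obscure.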
Module 20: Competition - AutoML Olympics
Connection from Module 19
"You've built AutoTuning systems. Time to compete!"
What Students Build
- Complete end-to-end optimized ML systems
- Submission package for competition platform
- Performance analysis reports
- Innovation documentation
Competition Categories
- Speed Challenge: Fastest to reach target accuracy
- Size Challenge: Best accuracy under size constraints
- Efficiency Challenge: Best accuracy/resource tradeoff
- Innovation Challenge: Most creative optimization approach
Platform Concept
```python
class CompetitionSubmission:
    def __init__(self, team_name):
        self.team_name = team_name
        self.model = self.build_model()
        self.auto_tuner = self.build_autotuner()
        self.optimized = self.auto_tuner.optimize(self.model)

    def evaluate(self, test_data):
        """Automated evaluation on hidden test set"""
        return {
            'accuracy': self.measure_accuracy(test_data),
            'latency': self.measure_latency(),
            'memory': self.measure_memory(),
            'model_size': self.measure_size(),
        }
```
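For fair rankings, the platform's latency metric needs to be robust to timing noise. A plausible stand-in for a `measure_latency` helper (hypothetical; the real harness may differ) is warmup runs followed by a median over repeated timings:

```python
import time
import statistics

def measure_latency(fn, x, warmup=3, runs=20):
    """Median wall-clock latency per call of fn(x), in seconds."""
    for _ in range(warmup):
        fn(x)  # warm caches and any lazy initialization before timing
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - t0)
    # Median rather than mean: robust to scheduler hiccups on shared hardware.
    return statistics.median(times)
```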
Leaderboard System
- Real-time rankings across multiple metrics
- Automated testing on standardized hardware
- Public showcase of techniques used
- Innovation bonus for novel approaches
Implementation Timeline
Week 1: Foundation
- Create placeholder directories for modules 16-20
- Restructure Module 15 with OptimizedBackend upfront
- Begin drafting Module 16 (Memory)
Week 2: Parallel Development
- Modules 16-18 developed in parallel by different agents
- PyTorch expert reviews all three simultaneously
- Integration testing between modules
Week 3: AutoTuning Development
- Module 19 development with appropriate scope
- Integration with all previous optimization modules
- Testing of automatic optimization pipelines
Week 4: Competition Platform
- Module 20 competition framework
- Leaderboard system design
- Submission and evaluation pipeline
Directory Structure
modules/
├── 15_acceleration/ [EXISTS - needs restructuring]
├── 16_memory/ [TO CREATE]
│ ├── memory_dev.py
│ ├── module.yaml
│ └── README.md
├── 17_quantization/ [TO CREATE]
│ ├── quantization_dev.py
│ ├── module.yaml
│ └── README.md
├── 18_compression/ [EXISTS - needs development]
│ ├── compression_dev.py
│ ├── module.yaml
│ └── README.md
├── 19_autotuning/ [TO CREATE]
│ ├── autotuning_dev.py
│ ├── module.yaml
│ └── README.md
└── 20_competition/ [TO CREATE]
├── competition_dev.py
├── module.yaml
└── README.md
Success Metrics
Educational Success
- Students understand when/why to apply each optimization
- Can build automated optimization systems
- Understand tradeoffs and constraints
- Ready for production ML engineering roles
Technical Success
- All optimizations integrate seamlessly
- AutoTuner successfully combines techniques
- Competition platform handles submissions
- Measurable performance improvements achieved
Engagement Success
- Students excited about optimization
- Active competition participation
- Innovative approaches developed
- Community sharing of techniques
Next Steps
- Get PyTorch expert validation on AutoTuning scope
- Create placeholder directories for new modules
- Begin parallel development of modules 16-18
- Design competition platform architecture
- Update master roadmap with final structure