# TinyTorch Optimization Modules Tutorial Plan

## Modules 15-20: From Manual Optimization to Automatic Systems

## Overview: The Complete Optimization Journey

Students progress from manual optimization techniques to building intelligent systems that optimize automatically, culminating in a competition that pits their AutoML systems against each other.

```
Manual Optimization (15-18) → Automatic Optimization (19) → Competition (20)
```

---
## Module 15: Acceleration - Speed Optimization

### **Connection from Module 14**

"Your transformer works but generates text slowly. Let's make it 10-100x faster!"

### **What Students Build**

- Transform educational loops into optimized operations
- Cache-friendly blocked algorithms
- NumPy vectorization integration
- Transparent backend dispatch system

### **Key Learning Outcomes**

- Understand why educational loops are slow (cache misses, no vectorization)
- Build blocked matrix multiplication for cache efficiency (sketched below)
- Learn when to use optimized libraries vs custom code
- Create backend systems for transparent optimization

### **Module Structure Change**

- **NEW**: Show `OptimizedBackend` class upfront as the goal
- Students see where they're heading before learning the steps
- "Here's the elegant solution, now let's understand how to build it"

### **Performance Impact**: 10-100x speedup on matrix operations
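A minimal sketch of the cache-blocked matrix multiply students build here (the `matmul_blocked` name and the block size are illustrative, not the module's actual API):

```python
import numpy as np

def matmul_blocked(A: np.ndarray, B: np.ndarray, block: int = 64) -> np.ndarray:
    """Cache-friendly matmul: work on block x block tiles so each tile
    stays resident in cache while it is reused across the inner loop."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                # Each tile product dispatches to NumPy's vectorized kernel;
                # slicing handles ragged edges at the matrix boundary.
                C[i:i+block, j:j+block] += A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
    return C
```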
---
## Module 16: Memory - Memory Optimization

### **Connection from Module 15**

"Operations are faster, but transformers still recompute everything. Let's be smarter with memory!"

### **What Students Build**

- `KVCache` class for transformer attention states (sketched below)
- Incremental attention computation (process only new tokens)
- Memory profiling and analysis tools
- Cache management strategies

### **Key Learning Outcomes**

- Memory vs computation tradeoffs
- Understanding O(N²) → O(N) optimization for sequences
- Production caching patterns (GPT, LLaMA)
- When caching helps vs hurts performance

### **Performance Impact**: 50x speedup in autoregressive generation
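A minimal sketch of the `KVCache` idea (the shapes and method names here are assumptions for illustration, not the module's final interface):

```python
import numpy as np

class KVCache:
    """Cache past attention keys/values so each generation step only
    computes attention for the newest token: O(N) work per step instead
    of recomputing the full O(N^2) prefix."""

    def __init__(self, max_len: int, num_heads: int, head_dim: int):
        self.k = np.zeros((max_len, num_heads, head_dim), dtype=np.float32)
        self.v = np.zeros((max_len, num_heads, head_dim), dtype=np.float32)
        self.length = 0  # number of positions cached so far

    def append(self, k_new: np.ndarray, v_new: np.ndarray) -> None:
        """Store the (num_heads, head_dim) key/value for one new token."""
        self.k[self.length] = k_new
        self.v[self.length] = v_new
        self.length += 1

    def keys_values(self):
        """Views over the filled prefix, consumed by the attention step."""
        return self.k[:self.length], self.v[:self.length]
```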
---
## Module 17: Quantization - Precision Optimization

### **Connection from Module 16**

"Memory usage is optimized, but models are still huge. Let's use fewer bits!"

### **What Students Build**

- `Quantizer` class for FP32→INT8 conversion (sketched below)
- Calibration techniques for maintaining accuracy
- Quantized operations (matmul, conv2d)
- Model size analysis tools

### **Key Learning Outcomes**

- Numerical precision vs accuracy tradeoffs
- Post-training quantization techniques
- Hardware acceleration through reduced precision
- When to use INT8 vs FP16 vs FP32

### **Performance Impact**: 4x model size reduction, 2-4x inference speedup
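A sketch of symmetric post-training quantization in the spirit of the `Quantizer` class (function names are illustrative; the module's real API may differ):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map FP32 values to INT8 with a
    single scale derived from the largest magnitude in the tensor."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
# Round-trip error is bounded by scale / 2 per element.
max_err = np.abs(dequantize_int8(q, scale) - w).max()
```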
---
## Module 18: Compression - Structural Optimization

### **Connection from Module 17**

"We're using fewer bits, but can we remove weights entirely?"

### **What Students Build**

- `MagnitudePruner` for weight removal (sketched below)
- `StructuredPruner` for channel/filter removal
- Basic knowledge distillation
- Sparsity visualization tools

### **Key Learning Outcomes**

- Structured vs unstructured pruning
- Magnitude-based pruning strategies
- Knowledge distillation basics
- Sparsity patterns and hardware efficiency

### **Performance Impact**: 90% sparsity with <5% accuracy loss
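A minimal sketch of the magnitude-pruning idea behind `MagnitudePruner` (the standalone function form is an assumption for illustration):

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning: zero the smallest-magnitude
    fraction of weights and keep the rest unchanged."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # The k-th smallest |w| becomes the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) > threshold, w, 0.0)

w = np.random.randn(512, 512)
pruned = magnitude_prune(w, sparsity=0.9)
achieved = 1.0 - np.count_nonzero(pruned) / pruned.size  # ~0.90
```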
---
## Module 19: AutoTuning - Automatic Optimization

### **Connection from Module 18**

"We have all these optimization techniques. Let's build systems that apply them automatically!"

### **What Students Build**

```python
class AutoTuner:
    def auto_optimize(self, model, constraints):
        """
        Automatically decide:
        - Which optimizations to apply
        - In what order
        - With what parameters
        - For what deployment target
        """
        pass

    def hyperparameter_search(self, model, data, budget):
        """Smart hyperparameter tuning (not random)"""
        pass

    def optimization_pipeline(self, model, target_hardware):
        """Build optimal pipeline for specific hardware"""
        pass

    def adaptive_training(self, model, data):
        """Training that adapts based on progress"""
        pass
```

### **Key Learning Outcomes**

- Automated optimization strategy selection
- Constraint-based optimization (memory, latency, accuracy)
- Hardware-aware optimization pipelines
- Smart search strategies (Bayesian optimization basics)
- Data-efficient training (curriculum learning, active learning)

### **Student Experience**

"I built a system that takes any model and automatically optimizes it for any deployment target!"

### **Scope Balance** (Not Too Complex)

- Focus on **rule-based automation** (if mobile → aggressive quantization; see the sketch after this list)
- Simple **grid search** with smart pruning (not full Bayesian optimization)
- Basic **hardware detection** (CPU vs GPU vs Mobile)
- **Pre-built optimization recipes** that students can combine

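A sketch of that rule-based decision layer (the recipe names and `target` strings are hypothetical placeholders, not the module's actual interface):

```python
def plan_optimizations(target: str, model_size_mb: float) -> list:
    """Rule-based recipe selection: simple if/then rules pick which of
    the Module 15-18 techniques to apply, and in what order."""
    plan = ['vectorized_backend']   # Module 15: always beneficial
    plan.append('kv_cache')         # Module 16: helps autoregressive models
    if target == 'mobile':
        plan += ['int8_quantization', 'magnitude_pruning']  # Modules 17-18
    elif target == 'gpu':
        plan.append('fp16_quantization')
    if model_size_mb > 50:
        plan.append('structured_pruning')  # shrink further if the model is large
    return plan

print(plan_optimizations('mobile', model_size_mb=120))
# ['vectorized_backend', 'kv_cache', 'int8_quantization',
#  'magnitude_pruning', 'structured_pruning']
```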
---
## Module 20: Competition - AutoML Olympics

### **Connection from Module 19**

"You've built AutoTuning systems. Time to compete!"

### **What Students Build**

- Complete end-to-end optimized ML systems
- Submission package for competition platform
- Performance analysis reports
- Innovation documentation

### **Competition Categories**

1. **Speed Challenge**: Fastest to reach target accuracy
2. **Size Challenge**: Best accuracy under size constraints
3. **Efficiency Challenge**: Best accuracy/resource tradeoff
4. **Innovation Challenge**: Most creative optimization approach

### **Platform Concept**

```python
class CompetitionSubmission:
    def __init__(self, team_name):
        self.team_name = team_name
        self.model = self.build_model()
        self.auto_tuner = self.build_autotuner()
        self.optimized = self.auto_tuner.optimize(self.model)

    def evaluate(self, test_data):
        """Automated evaluation on hidden test set"""
        return {
            'accuracy': self.measure_accuracy(test_data),
            'latency': self.measure_latency(),
            'memory': self.measure_memory(),
            'model_size': self.measure_size()
        }
```

### **Leaderboard System**

- Real-time rankings across multiple metrics (see the ranking sketch after this list)
- Automated testing on standardized hardware
- Public showcase of techniques used
- Innovation bonus for novel approaches

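One illustrative way a multi-metric ranking could consume the `evaluate` output above (the composite scoring formula is hypothetical, not the competition's actual metric):

```python
def rank_submissions(results: dict) -> list:
    """Rank teams by a composite accuracy-per-resource score:
    higher accuracy is rewarded, latency and size are penalized."""
    def score(r):
        return r['accuracy'] / (1.0 + r['latency'] + r['model_size'])
    return sorted(results, key=lambda team: score(results[team]), reverse=True)

leaderboard = rank_submissions({
    'team_a': {'accuracy': 0.91, 'latency': 0.8, 'model_size': 12.0},
    'team_b': {'accuracy': 0.89, 'latency': 0.3, 'model_size': 4.0},
})
# team_b ranks first: nearly equal accuracy at a fraction of the resource cost
```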
---
## Implementation Timeline

### **Week 1: Foundation**

- Create placeholder directories for modules 16-20
- Restructure Module 15 with OptimizedBackend upfront
- Begin drafting Module 16 (Memory)

### **Week 2: Parallel Development**

- Modules 16-18 developed in parallel by different agents
- PyTorch expert reviews all three simultaneously
- Integration testing between modules

### **Week 3: AutoTuning Development**

- Module 19 development with appropriate scope
- Integration with all previous optimization modules
- Testing of automatic optimization pipelines

### **Week 4: Competition Platform**

- Module 20 competition framework
- Leaderboard system design
- Submission and evaluation pipeline
---
## Directory Structure

```
modules/
├── 15_acceleration/   [EXISTS - needs restructuring]
├── 16_memory/         [TO CREATE]
│   ├── memory_dev.py
│   ├── module.yaml
│   └── README.md
├── 17_quantization/   [TO CREATE]
│   ├── quantization_dev.py
│   ├── module.yaml
│   └── README.md
├── 18_compression/    [EXISTS - needs development]
│   ├── compression_dev.py
│   ├── module.yaml
│   └── README.md
├── 19_autotuning/     [TO CREATE]
│   ├── autotuning_dev.py
│   ├── module.yaml
│   └── README.md
└── 20_competition/    [TO CREATE]
    ├── competition_dev.py
    ├── module.yaml
    └── README.md
```
---
## Success Metrics

### **Educational Success**

- Students understand when/why to apply each optimization
- Can build automated optimization systems
- Understand tradeoffs and constraints
- Ready for production ML engineering roles

### **Technical Success**

- All optimizations integrate seamlessly
- AutoTuner successfully combines techniques
- Competition platform handles submissions
- Measurable performance improvements achieved

### **Engagement Success**

- Students excited about optimization
- Active competition participation
- Innovative approaches developed
- Community sharing of techniques
---
## Next Steps

1. **Get PyTorch expert validation** on AutoTuning scope
2. **Create placeholder directories** for new modules
3. **Begin parallel development** of modules 16-18
4. **Design competition platform** architecture
5. **Update master roadmap** with final structure