mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-02 07:58:13 -05:00
docs: Major cleanup - 46 → 12 essential docs
MASSIVE DOCUMENTATION CLEANUP:
- Reduced from 46 docs to 12 essential files
- Archived 34 outdated planning and analysis documents

✅ KEPT (Essential for current operations):
- STUDENT_QUICKSTART.md - Student onboarding
- INSTRUCTOR_GUIDE.md - Instructor setup
- cifar10-training-guide.md - North star achievement
- tinytorch-assumptions.md - Complexity framework (NEW)
- tinytorch-textbook-alignment.md - Academic alignment
- NBGrader integration docs (3 files)
- Development standards (3 files)
- docs/README.md - Navigation guide (NEW)

🗑️ ARCHIVED (Completed/outdated planning):
- All optimization-modules-* planning docs
- All milestone-* system docs
- All tutorial-master-plan and analysis docs
- Module reordering and structure analysis
- Agent setup and workflow case studies

RESULT: Clean, focused documentation structure. Only active, current docs remain - easy to find what you need!
docs/archive/2024-cleanup/optimization-modules-tutorial-plan.md (new file, 276 lines)
@@ -0,0 +1,276 @@

# TinyTorch Optimization Modules Tutorial Plan

## Modules 15-20: From Manual Optimization to Automatic Systems

## Overview: The Complete Optimization Journey

Students progress from manual optimization techniques to building intelligent systems that optimize automatically, culminating in a competition where their AutoML systems compete.

```
Manual Optimization (15-18) → Automatic Optimization (19) → Competition (20)
```

---

## Module 15: Acceleration - Speed Optimization

### **Connection from Module 14**
"Your transformer works but generates text slowly. Let's make it 10-100x faster!"

### **What Students Build**
- Transform educational loops into optimized operations
- Cache-friendly blocked algorithms
- NumPy vectorization integration
- Transparent backend dispatch system

### **Key Learning Outcomes**
- Understand why educational loops are slow (cache misses, no vectorization)
- Build blocked matrix multiplication for cache efficiency (see the sketch after this list)
- Learn when to use optimized libraries vs custom code
- Create backend systems for transparent optimization
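
As a concrete illustration of the blocking idea, here is a minimal sketch; the function name and default block size are assumptions for illustration, not the module's actual API:

```python
import numpy as np

def blocked_matmul(A, B, block_size=64):
    """Cache-friendly matmul: work on small tiles that fit in cache.

    Illustrative sketch; block_size would be tuned to the CPU's cache.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, block_size):
        for j in range(0, m, block_size):
            for p in range(0, k, block_size):
                # Each tile product touches a small working set that
                # stays resident in cache across the inner loops.
                C[i:i+block_size, j:j+block_size] += (
                    A[i:i+block_size, p:p+block_size]
                    @ B[p:p+block_size, j:j+block_size]
                )
    return C
```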

### **Module Structure Change**
- **NEW**: Show `OptimizedBackend` class upfront as the goal (a minimal sketch follows this list)
- Students see where they're heading before learning the steps
- "Here's the elegant solution, now let's understand how to build it"

### **Performance Impact**: 10-100x speedup on matrix operations

---

## Module 16: Memory - Memory Optimization

### **Connection from Module 15**
"Operations are faster, but transformers still recompute everything. Let's be smarter with memory!"

### **What Students Build**
- `KVCache` class for transformer attention states (see the sketch after this list)
- Incremental attention computation (process only new tokens)
- Memory profiling and analysis tools
- Cache management strategies
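
As a hedged illustration of the caching idea, a `KVCache` along these lines stores key/value projections and appends one token at a time; the constructor arguments and method names are assumptions, not the module's confirmed interface:

```python
import numpy as np

class KVCache:
    """Cache attention keys/values so each generation step computes
    projections only for the newest token (O(N) work per step, rather
    than recomputing all N tokens, i.e. O(N^2) over a sequence).

    Illustrative sketch; the real module's API may differ.
    """
    def __init__(self, max_seq_len, num_heads, head_dim):
        self.keys = np.zeros((max_seq_len, num_heads, head_dim), dtype=np.float32)
        self.values = np.zeros((max_seq_len, num_heads, head_dim), dtype=np.float32)
        self.length = 0  # number of tokens cached so far

    def append(self, k, v):
        # k, v: (num_heads, head_dim) projections for the new token only
        self.keys[self.length] = k
        self.values[self.length] = v
        self.length += 1

    def get(self):
        # All cached keys/values up to the current position
        return self.keys[:self.length], self.values[:self.length]
```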

### **Key Learning Outcomes**
- Memory vs computation tradeoffs
- Understanding the O(N²) → O(N) optimization for sequences
- Production caching patterns (GPT, LLaMA)
- When caching helps vs hurts performance

### **Performance Impact**: 50x speedup in autoregressive generation

---

## Module 17: Quantization - Precision Optimization

### **Connection from Module 16**
"Memory usage is optimized, but models are still huge. Let's use fewer bits!"

### **What Students Build**
- `Quantizer` class for FP32→INT8 conversion (see the sketch after this list)
- Calibration techniques for maintaining accuracy
- Quantized operations (matmul, conv2d)
- Model size analysis tools
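
A minimal sketch of symmetric post-training quantization, assuming a simple per-tensor scale; the method names and calibration strategy are illustrative assumptions:

```python
import numpy as np

class Quantizer:
    """Symmetric per-tensor FP32 -> INT8 quantization.

    Sketch of the general technique; the module's actual
    calibration may be more sophisticated.
    """
    def calibrate(self, weights):
        # Pick a scale so the largest magnitude maps to 127
        self.scale = np.abs(weights).max() / 127.0
        return self.scale

    def quantize(self, weights):
        q = np.round(weights / self.scale)
        return np.clip(q, -127, 127).astype(np.int8)

    def dequantize(self, q_weights):
        # Approximate recovery of the FP32 values (lossy)
        return q_weights.astype(np.float32) * self.scale

# Usage: quantize a weight matrix and check the round-trip error
w = np.random.randn(256, 256).astype(np.float32)
quant = Quantizer()
quant.calibrate(w)
w_q = quant.quantize(w)
print("max round-trip error:", np.abs(quant.dequantize(w_q) - w).max())
```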

### **Key Learning Outcomes**
- Numerical precision vs accuracy tradeoffs
- Post-training quantization techniques
- Hardware acceleration through reduced precision
- When to use INT8 vs FP16 vs FP32

### **Performance Impact**: 4x model size reduction, 2-4x inference speedup

---

## Module 18: Compression - Structural Optimization

### **Connection from Module 17**
"We're using fewer bits, but can we remove weights entirely?"

### **What Students Build**
- `MagnitudePruner` for weight removal (see the sketch after this list)
- `StructuredPruner` for channel/filter removal
- Basic knowledge distillation
- Sparsity visualization tools
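
A hedged sketch of magnitude-based pruning: zero out the smallest-magnitude weights until a target sparsity is reached. The class name matches the plan above, but the constructor and method signatures are assumptions:

```python
import numpy as np

class MagnitudePruner:
    """Unstructured pruning: zero the smallest |weight| values.

    Sketch of the core idea; the real module likely adds masks,
    fine-tuning hooks, and layer-wise schedules.
    """
    def __init__(self, sparsity=0.9):
        self.sparsity = sparsity  # fraction of weights to remove

    def prune(self, weights):
        # Threshold: magnitude below which weights are dropped
        threshold = np.quantile(np.abs(weights), self.sparsity)
        mask = np.abs(weights) > threshold
        return weights * mask, mask

# Usage: 90% sparsity on a random weight matrix
w = np.random.randn(128, 128).astype(np.float32)
pruned, mask = MagnitudePruner(sparsity=0.9).prune(w)
print(f"actual sparsity: {1 - mask.mean():.2%}")
```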

### **Key Learning Outcomes**
- Structured vs unstructured pruning
- Magnitude-based pruning strategies
- Knowledge distillation basics
- Sparsity patterns and hardware efficiency

### **Performance Impact**: 90% sparsity with <5% accuracy loss

---

## Module 19: AutoTuning - Automatic Optimization

### **Connection from Module 18**
"We have all these optimization techniques. Let's build systems that apply them automatically!"

### **What Students Build**
```python
class AutoTuner:
    def auto_optimize(self, model, constraints):
        """
        Automatically decide:
        - Which optimizations to apply
        - In what order
        - With what parameters
        - For what deployment target
        """
        pass

    def hyperparameter_search(self, model, data, budget):
        """Smart hyperparameter tuning (not random)"""
        pass

    def optimization_pipeline(self, model, target_hardware):
        """Build optimal pipeline for specific hardware"""
        pass

    def adaptive_training(self, model, data):
        """Training that adapts based on progress"""
        pass
```

### **Key Learning Outcomes**
- Automated optimization strategy selection
- Constraint-based optimization (memory, latency, accuracy)
- Hardware-aware optimization pipelines
- Smart search strategies (Bayesian optimization basics)
- Data-efficient training (curriculum learning, active learning)

### **Student Experience**
"I built a system that takes any model and automatically optimizes it for any deployment target!"

### **Scope Balance** (Not Too Complex)
- Focus on **rule-based automation** (if mobile → aggressive quantization); a sketch follows this list
- Simple **grid search** with smart pruning (not full Bayesian optimization)
- Basic **hardware detection** (CPU vs GPU vs Mobile)
- **Pre-built optimization recipes** that students can combine
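
To make the rule-based scope concrete, here is a minimal sketch of how deployment-target rules might pick an optimization recipe; the target names, recipe contents, and thresholds are assumptions for illustration:

```python
# Pre-built recipes keyed by deployment target (illustrative values)
RECIPES = {
    "mobile": {"quantize": "int8", "prune_sparsity": 0.9, "kv_cache": True},
    "server_cpu": {"quantize": "fp16", "prune_sparsity": 0.5, "kv_cache": True},
    "server_gpu": {"quantize": None, "prune_sparsity": 0.0, "kv_cache": True},
}

def pick_recipe(target_hardware, memory_budget_mb):
    """Rule-based selection: no search, just if/else over constraints."""
    recipe = dict(RECIPES.get(target_hardware, RECIPES["server_cpu"]))
    if memory_budget_mb < 100:
        # Tight memory -> be aggressive regardless of target
        recipe["quantize"] = "int8"
        recipe["prune_sparsity"] = max(recipe["prune_sparsity"], 0.9)
    return recipe

print(pick_recipe("mobile", memory_budget_mb=50))
```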

---

## Module 20: Competition - AutoML Olympics

### **Connection from Module 19**
"You've built AutoTuning systems. Time to compete!"

### **What Students Build**
- Complete end-to-end optimized ML systems
- Submission package for competition platform
- Performance analysis reports
- Innovation documentation

### **Competition Categories**
1. **Speed Challenge**: Fastest to reach target accuracy
2. **Size Challenge**: Best accuracy under size constraints
3. **Efficiency Challenge**: Best accuracy/resource tradeoff
4. **Innovation Challenge**: Most creative optimization approach

### **Platform Concept**
```python
class CompetitionSubmission:
    def __init__(self, team_name):
        self.team_name = team_name  # identifies the entry on the leaderboard
        self.model = self.build_model()
        self.auto_tuner = self.build_autotuner()
        self.optimized = self.auto_tuner.optimize(self.model)

    def evaluate(self, test_data):
        """Automated evaluation on hidden test set"""
        return {
            'accuracy': self.measure_accuracy(test_data),
            'latency': self.measure_latency(),
            'memory': self.measure_memory(),
            'model_size': self.measure_size()
        }
```

### **Leaderboard System**
- Real-time rankings across multiple metrics
- Automated testing on standardized hardware
- Public showcase of techniques used
- Innovation bonus for novel approaches

---

## Implementation Timeline

### **Week 1: Foundation**
- Create placeholder directories for modules 16-20
- Restructure Module 15 with OptimizedBackend upfront
- Begin drafting Module 16 (Memory)

### **Week 2: Parallel Development**
- Modules 16-18 developed in parallel by different agents
- PyTorch expert reviews all three simultaneously
- Integration testing between modules

### **Week 3: AutoTuning Development**
- Module 19 development with appropriate scope
- Integration with all previous optimization modules
- Testing of automatic optimization pipelines

### **Week 4: Competition Platform**
- Module 20 competition framework
- Leaderboard system design
- Submission and evaluation pipeline

---

## Directory Structure

```
modules/
├── 15_acceleration/        [EXISTS - needs restructuring]
├── 16_memory/              [TO CREATE]
│   ├── memory_dev.py
│   ├── module.yaml
│   └── README.md
├── 17_quantization/        [TO CREATE]
│   ├── quantization_dev.py
│   ├── module.yaml
│   └── README.md
├── 18_compression/         [EXISTS - needs development]
│   ├── compression_dev.py
│   ├── module.yaml
│   └── README.md
├── 19_autotuning/          [TO CREATE]
│   ├── autotuning_dev.py
│   ├── module.yaml
│   └── README.md
└── 20_competition/         [TO CREATE]
    ├── competition_dev.py
    ├── module.yaml
    └── README.md
```

---

## Success Metrics

### **Educational Success**
- Students understand when/why to apply each optimization
- Can build automated optimization systems
- Understand tradeoffs and constraints
- Ready for production ML engineering roles

### **Technical Success**
- All optimizations integrate seamlessly
- AutoTuner successfully combines techniques
- Competition platform handles submissions
- Measurable performance improvements achieved

### **Engagement Success**
- Students excited about optimization
- Active competition participation
- Innovative approaches developed
- Community sharing of techniques

---

## Next Steps

1. **Get PyTorch expert validation** on AutoTuning scope
2. **Create placeholder directories** for new modules
3. **Begin parallel development** of modules 16-18
4. **Design competition platform** architecture
5. **Update master roadmap** with final structure