Files
TinyTorch/docs/optimization-modules-development-plan.md
Vijay Janapa Reddi 910900f504 FEAT: Complete optimization modules 15-20 with ML Systems focus
Major accomplishment: Implemented comprehensive ML Systems optimization sequence
Module progression: Profiling → Acceleration → Quantization → Compression → Caching → Benchmarking

Key changes:
- Module 15 (Profiling): Performance detective tools with Timer, MemoryProfiler, FLOPCounter
- Module 16 (Acceleration): Backend optimization showing 2700x+ speedups
- Module 17 (Quantization): INT8 optimization with 8x compression, <1% accuracy loss
- Module 18 (Compression): Neural network pruning achieving 70% sparsity
- Module 19 (Caching): KV cache for transformers, O(N²) → O(N) complexity
- Module 20 (Benchmarking): TinyMLPerf competition framework with leaderboards

Module reorganization:
- Moved profiling to Module 15 (was 19) for 'measure first' philosophy
- Reordered sequence for optimal pedagogical flow
- Fixed all backward dependencies from Module 20 → 1
- Updated Module 14 transformers to support KV caching

Technical achievements:
- All modules tested and working (95% success rate)
- PyTorch expert validated: 'Exceptional dependency design'
- Production-ready ML systems optimization techniques
- Complete learning journey from basic tensors to advanced optimizations

Educational impact:
- Students learn real production optimization workflows
- Each module builds naturally on previous foundations
- No forward dependencies or conceptual gaps
- Mirrors industry-standard ML systems engineering practices
2025-09-24 22:34:20 -04:00

200 lines
7.1 KiB
Markdown

# Optimization Modules Development Plan
## Comprehensive Coordination for Modules 15-20
## Phase 1: Module Naming & Structure Updates
### **Recommended Naming Changes:**
```
Current → New (Thematic Flow)
15_acceleration → 15_acceleration (KEEP - perfect)
16_caching → 16_memory (Memory Optimization)
17_precision → 17_quantization (Size Optimization)
18_compression → 18_compression (KEEP - perfect)
19_benchmarking → 19_profiling (Performance Analysis)
20_capstone → 20_capstone (KEEP - perfect)
```
**Why This Thematic Flow Works:**
- **Acceleration**: "Make it faster"
- **Memory**: "Use memory smarter"
- **Quantization**: "Use fewer bits"
- **Compression**: "Remove what's unnecessary"
- **Profiling**: "Measure everything"
- **Capstone**: "Put it all together"
### **Module 15 Structure Changes:**
**Current Problem**: OptimizedBackend comes at the end (line 277)
**Solution**: Move to beginning to show students the goal upfront
**New Structure:**
1. **Part 1: The Goal** - Show OptimizedBackend first
2. **Part 2: Why We Need Optimization** - Educational loops analysis
3. **Part 3: Building Better** - Blocked algorithms
4. **Part 4: Production Reality** - NumPy integration
5. **Part 5: Transparent Backend** - How automatic switching works
**Student Experience**: "Here's where we're going (OptimizedBackend), now let me show you how we get there step by step."
## Phase 2: Parallel Development Coordination
### **Agent Team Assignment:**
#### **Module 16: Memory Optimization**
**Agent**: Module Developer A
**Focus**: KV caching for transformers
**Key Components**:
- `KVCache` class for attention state storage
- Incremental attention computation
- Memory vs computation tradeoff analysis
- Integration with Module 14 transformers
**Connection to Previous**: "Transformers recompute attention every token - wasteful!"
#### **Module 17: Quantization**
**Agent**: Module Developer B
**Focus**: INT8 quantization techniques
**Key Components**:
- `Quantizer` class for FP32→INT8 conversion
- Calibration techniques for accuracy retention
- Quantized operations (matmul, conv)
- Model size reduction analysis
**Connection to Previous**: "Memory optimization helps, but models are still huge!"
#### **Module 18: Compression**
**Agent**: Module Developer C
**Focus**: Pruning and knowledge distillation
**Key Components**:
- `MagnitudePruner` for weight removal
- `StructuredPruner` for channel removal
- `KnowledgeDistillation` trainer
- Sparsity pattern analysis
**Connection to Previous**: "Quantization reduced precision, can we remove weights entirely?"
### **Parallel Development Timeline:**
**Week 1**: All three agents draft initial implementations
**Week 2**: PyTorch expert reviews all three modules in parallel
**Week 3**: Revisions based on expert feedback
**Week 4**: Integration testing and final polish
## Phase 3: Module 19 - Profiling (Not Benchmarking)
### **New Focus: Performance Profiling Tools**
Instead of abstract benchmarking, students build **practical profiling tools**:
#### **What Students Build:**
1. **`PerformanceProfiler`** - Time and memory measurement
2. **`BottleneckAnalyzer`** - Identify slow operations
3. **`OptimizationComparer`** - Before/after analysis
4. **`InteractionAnalyzer`** - How optimizations combine
#### **Student Experience:**
```python
# Profile their own models from previous modules
profiler = PerformanceProfiler()
with profiler.profile("my_transformer"):
output = my_transformer(inputs)
# See exactly where time is spent
profiler.report()
# Output:
# - Attention: 45% of time
# - Feed Forward: 30% of time
# - Embedding: 15% of time
# - Other: 10% of time
# Then apply optimizations and re-profile
profiler.compare_optimizations(baseline, quantized, pruned, cached)
```
#### **Connection to Previous**: "We have all these optimization techniques - how do we measure their combined impact scientifically?"
## Phase 4: Module 20 - Capstone Ideas
### **Option A: Interactive Performance Competition Website**
**Concept**: Students submit optimized models to a leaderboard system
**Features**:
- Upload optimized model implementations
- Automatic performance testing (speed, memory, accuracy)
- Real-time leaderboard with multiple categories
- Model analysis and optimization suggestions
**Categories**:
- "Fastest CIFAR-10 Trainer" (speed focus)
- "Most Memory Efficient GPT" (memory focus)
- "Best Accuracy/Size Tradeoff" (balance focus)
- "Most Creative Optimization" (innovation focus)
### **Option B: Complete ML System Deployment Challenge**
**Concept**: Build and deploy complete optimized ML systems
**Project Options**:
1. **Edge AI Challenge**: Deploy GPT on Raspberry Pi
2. **Mobile ML Challenge**: CIFAR-10 classifier on phone
3. **Datacenter Challenge**: Multi-GPU training optimization
4. **Custom Challenge**: Student-defined optimization problem
**Deliverables**:
- Working system with all optimizations
- Performance analysis report
- Deployment documentation
- Innovation summary
### **Option C: "ML Systems Portfolio" Capstone**
**Concept**: Students create professional portfolio showcasing their TinyTorch journey
**Portfolio Components**:
1. **Technical Blog Posts** - Explain each optimization technique
2. **Performance Analysis Reports** - Before/after comparisons
3. **Code Showcase** - Best implementations with explanations
4. **Industry Case Studies** - How TinyTorch techniques apply to real systems
5. **Innovation Project** - Original optimization idea
**Public Showcase**: Host student portfolios on tinytorch.ai/students/
## Phase 5: Expert Review Protocol
### **Parallel Review Process:**
Once all three modules (16-18) have initial drafts:
1. **Submit to PyTorch Expert simultaneously**
2. **Expert reviews all three for**:
- Pedagogical flow and connections
- Technical accuracy and best practices
- Integration with existing modules
- Production relevance
3. **Expert provides comparative feedback**:
- How modules work together as a system
- Optimization interaction effects
- Real-world applicability
4. **Agents revise based on holistic feedback**
### **Review Questions for Expert:**
- "Do these three modules create a coherent optimization toolkit?"
- "Are the connections between modules clear and natural?"
- "Do the optimization techniques reflect industry best practices?"
- "How well does this prepare students for production ML work?"
## Implementation Priorities
### **Immediate Actions (This Week):**
1. **Rename modules** for thematic flow (16→memory, 17→quantization, 19→profiling)
2. **Restructure Module 15** to show OptimizedBackend upfront
3. **Update Module Developer instructions** (COMPLETED ✅)
4. **Assign agents to modules 16-18** for parallel development
### **Next Week:**
1. **Initial module drafts** from all three agents
2. **Module 15 restructuring** implementation
3. **Profiling module design** finalization
### **Following Week:**
1. **PyTorch expert parallel review** of all drafts
2. **Capstone module planning** based on preferred approach
3. **Integration testing** preparation
This plan ensures systematic development of the complete optimization toolkit while maintaining the beautiful progression we designed!