TinyTorch Optimization Modules Tutorial Plan

Modules 15-20: From Manual Optimization to Automatic Systems

Overview: The Complete Optimization Journey

Students progress from manual optimization techniques to building intelligent systems that optimize automatically, culminating in a competition that pits their AutoML systems against each other.

Manual Optimization (15-18) → Automatic Optimization (19) → Competition (20)

Module 15: Acceleration - Speed Optimization

Connection from Module 14

"Your transformer works but generates text slowly. Let's make it 10-100x faster!"

What Students Build

  • Transform educational loops into optimized operations
  • Cache-friendly blocked algorithms (see the sketch after this list)
  • NumPy vectorization integration
  • Transparent backend dispatch system
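
A minimal sketch of the blocked-algorithms idea, assuming an illustrative function name and block size (not the module's actual API):

import numpy as np

def blocked_matmul(A, B, block_size=64):
    """Multiply matrices tile by tile so each tile stays hot in cache."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, block_size):
        for j in range(0, m, block_size):
            for p in range(0, k, block_size):
                # NumPy slicing trims ragged edge tiles automatically
                C[i:i+block_size, j:j+block_size] += (
                    A[i:i+block_size, p:p+block_size]
                    @ B[p:p+block_size, j:j+block_size]
                )
    return C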

Key Learning Outcomes

  • Understand why educational loops are slow (cache misses, no vectorization)
  • Build blocked matrix multiplication for cache efficiency
  • Learn when to use optimized libraries vs custom code
  • Create backend systems for transparent optimization

Module Structure Change

  • NEW: Show OptimizedBackend class upfront as the goal (roughly sketched after this list)
  • Students see where they're heading before learning the steps
  • "Here's the elegant solution, now let's understand how to build it"

Performance Impact: 10-100x speedup on matrix operations


Module 16: Memory - Memory Optimization

Connection from Module 15

"Operations are faster, but transformers still recompute everything. Let's be smarter with memory!"

What Students Build

  • KVCache class for transformer attention states (sketched after this list)
  • Incremental attention computation (process only new tokens)
  • Memory profiling and analysis tools
  • Cache management strategies
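
A minimal single-head sketch of the cache and its use in incremental attention; the class and function names are illustrative, not the module's final API:

import numpy as np

class KVCache:
    """Store past keys/values so each step only processes the new token."""

    def __init__(self):
        self.keys = None    # shape (seq_len, d_head)
        self.values = None  # shape (seq_len, d_head)

    def append(self, new_k, new_v):
        # Extend the cached history with the new token's key/value
        if self.keys is None:
            self.keys, self.values = new_k, new_v
        else:
            self.keys = np.concatenate([self.keys, new_k], axis=0)
            self.values = np.concatenate([self.values, new_v], axis=0)
        return self.keys, self.values

def cached_attention(q_new, cache, new_k, new_v):
    """Attend from the newest query over all cached positions: O(N) per step."""
    K, V = cache.append(new_k, new_v)
    scores = q_new @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V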

Key Learning Outcomes

  • Memory vs computation tradeoffs
  • Understanding the O(N²) → O(N) per-step cost reduction for autoregressive sequences
  • Production caching patterns (GPT, LLaMA)
  • When caching helps vs hurts performance

Performance Impact: 50x speedup in autoregressive generation


Module 17: Quantization - Precision Optimization

Connection from Module 16

"Memory usage is optimized, but models are still huge. Let's use fewer bits!"

What Students Build

  • Quantizer class for FP32→INT8 conversion (sketched after this list)
  • Calibration techniques for maintaining accuracy
  • Quantized operations (matmul, conv2d)
  • Model size analysis tools
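
A minimal sketch of symmetric post-training quantization, assuming an illustrative Quantizer interface (real calibration would run over a dataset rather than a single tensor):

import numpy as np

class Quantizer:
    """Symmetric per-tensor FP32 -> INT8 quantization."""

    def calibrate(self, weights):
        # Pick a scale so the largest magnitude maps to the INT8 limit
        self.scale = np.abs(weights).max() / 127.0
        return self

    def quantize(self, weights):
        q = np.round(weights / self.scale)
        return np.clip(q, -127, 127).astype(np.int8)

    def dequantize(self, q_weights):
        return q_weights.astype(np.float32) * self.scale

# Round-trip a weight tensor and measure the error quantization introduces
w = np.random.randn(256, 256).astype(np.float32)
quant = Quantizer().calibrate(w)
w_hat = quant.dequantize(quant.quantize(w))
print("max abs error:", np.abs(w - w_hat).max())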

Key Learning Outcomes

  • Numerical precision vs accuracy tradeoffs
  • Post-training quantization techniques
  • Hardware acceleration through reduced precision
  • When to use INT8 vs FP16 vs FP32

Performance Impact: 4x model size reduction, 2-4x inference speedup


Module 18: Compression - Structural Optimization

Connection from Module 17

"We're using fewer bits, but can we remove weights entirely?"

What Students Build

  • MagnitudePruner for weight removal (sketched after this list)
  • StructuredPruner for channel/filter removal
  • Basic knowledge distillation
  • Sparsity visualization tools
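
A rough sketch of magnitude-based unstructured pruning (the interface is illustrative):

import numpy as np

class MagnitudePruner:
    """Zero out the smallest-magnitude weights."""

    def __init__(self, sparsity=0.9):
        self.sparsity = sparsity  # fraction of weights to remove

    def prune(self, weights):
        # Threshold at the sparsity-th quantile of absolute values
        threshold = np.quantile(np.abs(weights), self.sparsity)
        mask = np.abs(weights) >= threshold
        return weights * mask, mask

pruner = MagnitudePruner(sparsity=0.9)
w = np.random.randn(512, 512)
w_sparse, mask = pruner.prune(w)
print(f"actual sparsity: {1 - mask.mean():.2%}")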

Key Learning Outcomes

  • Structured vs unstructured pruning
  • Magnitude-based pruning strategies
  • Knowledge distillation basics
  • Sparsity patterns and hardware efficiency

Performance Impact: 90% sparsity with <5% accuracy loss


Module 19: AutoTuning - Automatic Optimization

Connection from Module 18

"We have all these optimization techniques. Let's build systems that apply them automatically!"

What Students Build

class AutoTuner:
    def auto_optimize(self, model, constraints):
        """
        Automatically decide:
        - Which optimizations to apply
        - In what order
        - With what parameters
        - For what deployment target
        """
        pass
    
    def hyperparameter_search(self, model, data, budget):
        """Smart hyperparameter tuning (not random)"""
        pass
    
    def optimization_pipeline(self, model, target_hardware):
        """Build optimal pipeline for specific hardware"""
        pass
    
    def adaptive_training(self, model, data):
        """Training that adapts based on progress"""
        pass

Key Learning Outcomes

  • Automated optimization strategy selection
  • Constraint-based optimization (memory, latency, accuracy)
  • Hardware-aware optimization pipelines
  • Smart search strategies (Bayesian optimization basics)
  • Data-efficient training (curriculum learning, active learning)

Student Experience

"I built a system that takes any model and automatically optimizes it for any deployment target!"

Scope Balance (Not Too Complex)

  • Focus on rule-based automation (if mobile → aggressive quantization; see the sketch after this list)
  • Simple grid search with smart pruning of the search space (not full Bayesian optimization)
  • Basic hardware detection (CPU vs GPU vs Mobile)
  • Pre-built optimization recipes that students can combine
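
A hedged sketch of rule-based recipe selection; the targets, recipe names, and thresholds are all illustrative:

# Hypothetical recipes; real entries would invoke the Quantizer/Pruner
# classes built in Modules 17-18
OPTIMIZATION_RECIPES = {
    "mobile":  ["prune_90", "quantize_int8"],   # aggressive: size first
    "server":  ["quantize_int8"],               # latency with memory headroom
    "desktop": ["blocked_matmul"],              # speed at full precision
}

def select_recipe(target_hardware, memory_budget_mb=None):
    """Pick an optimization pipeline from simple if/then rules."""
    recipe = list(OPTIMIZATION_RECIPES.get(target_hardware, []))
    if memory_budget_mb is not None and memory_budget_mb < 100:
        recipe.append("distill_small")  # tight budgets also trigger distillation
    return recipe

print(select_recipe("mobile", memory_budget_mb=50))
# -> ['prune_90', 'quantize_int8', 'distill_small']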

Module 20: Competition - AutoML Olympics

Connection from Module 19

"You've built AutoTuning systems. Time to compete!"

What Students Build

  • Complete end-to-end optimized ML systems
  • Submission package for competition platform
  • Performance analysis reports
  • Innovation documentation

Competition Categories

  1. Speed Challenge: Fastest to reach target accuracy
  2. Size Challenge: Best accuracy under size constraints
  3. Efficiency Challenge: Best accuracy/resource tradeoff
  4. Innovation Challenge: Most creative optimization approach

Platform Concept

class CompetitionSubmission:
    def __init__(self, team_name):
        # build_model(), build_autotuner(), and the measure_* helpers below
        # are implemented by each team from their Modules 15-19 work
        self.team_name = team_name
        self.model = self.build_model()
        self.auto_tuner = self.build_autotuner()
        self.optimized = self.auto_tuner.optimize(self.model)
    
    def evaluate(self, test_data):
        """Automated evaluation on hidden test set"""
        return {
            'accuracy': self.measure_accuracy(test_data),
            'latency': self.measure_latency(),
            'memory': self.measure_memory(),
            'model_size': self.measure_size()
        }

Leaderboard System

  • Real-time rankings across multiple metrics
  • Automated testing on standardized hardware
  • Public showcase of techniques used
  • Innovation bonus for novel approaches

Implementation Timeline

Week 1: Foundation

  • Create placeholder directories for modules 16-20
  • Restructure Module 15 with OptimizedBackend upfront
  • Begin drafting Module 16 (Memory)

Week 2: Parallel Development

  • Modules 16-18 developed in parallel by different agents
  • PyTorch expert reviews all three simultaneously
  • Integration testing between modules

Week 3: AutoTuning Development

  • Module 19 development with appropriate scope
  • Integration with all previous optimization modules
  • Testing of automatic optimization pipelines

Week 4: Competition Platform

  • Module 20 competition framework
  • Leaderboard system design
  • Submission and evaluation pipeline

Directory Structure

modules/
├── 15_acceleration/     [EXISTS - needs restructuring]
├── 16_memory/           [TO CREATE]
│   ├── memory_dev.py
│   ├── module.yaml
│   └── README.md
├── 17_quantization/     [TO CREATE] 
│   ├── quantization_dev.py
│   ├── module.yaml
│   └── README.md
├── 18_compression/      [EXISTS - needs development]
│   ├── compression_dev.py
│   ├── module.yaml
│   └── README.md
├── 19_autotuning/       [TO CREATE]
│   ├── autotuning_dev.py
│   ├── module.yaml
│   └── README.md
└── 20_competition/      [TO CREATE]
    ├── competition_dev.py
    ├── module.yaml
    └── README.md

Success Metrics

Educational Success

  • Students understand when/why to apply each optimization
  • Can build automated optimization systems
  • Understand tradeoffs and constraints
  • Ready for production ML engineering roles

Technical Success

  • All optimizations integrate seamlessly
  • AutoTuner successfully combines techniques
  • Competition platform handles submissions
  • Measurable performance improvements achieved

Engagement Success

  • Students excited about optimization
  • Active competition participation
  • Innovative approaches developed
  • Community sharing of techniques

Next Steps

  1. Get PyTorch expert validation on AutoTuning scope
  2. Create placeholder directories for new modules
  3. Begin parallel development of modules 16-18
  4. Design competition platform architecture
  5. Update master roadmap with final structure