TinyTorch Optimization Modules Tutorial Plan

Modules 15-20: From Manual Optimization to Automatic Systems

Overview: The Complete Optimization Journey

Students progress from manual optimization techniques to building intelligent systems that optimize automatically, culminating in a competition that pits their AutoML systems against each other.

Manual Optimization (15-18) → Automatic Optimization (19) → Competition (20)

Module 15: Acceleration - Speed Optimization

Connection from Module 14

"Your transformer works but generates text slowly. Let's make it 10-100x faster!"

What Students Build

  • Transform educational loops into optimized operations
  • Cache-friendly blocked algorithms (see the sketch after this list)
  • NumPy vectorization integration
  • Transparent backend dispatch system
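
A minimal sketch of the blocked-algorithms idea, assuming an illustrative function name and block size (not the module's actual API):

import numpy as np

def blocked_matmul(A, B, block_size=64):
    """Multiply matrices tile by tile so each tile stays hot in cache."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, block_size):
        for j in range(0, m, block_size):
            for p in range(0, k, block_size):
                # NumPy slicing trims ragged edge tiles automatically
                C[i:i+block_size, j:j+block_size] += (
                    A[i:i+block_size, p:p+block_size]
                    @ B[p:p+block_size, j:j+block_size]
                )
    return C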

Key Learning Outcomes

  • Understand why educational loops are slow (cache misses, no vectorization)
  • Build blocked matrix multiplication for cache efficiency
  • Learn when to use optimized libraries vs custom code
  • Create backend systems for transparent optimization

Module Structure Change

  • NEW: Show OptimizedBackend class upfront as the goal (roughly sketched after this list)
  • Students see where they're heading before learning the steps
  • "Here's the elegant solution, now let's understand how to build it"

Performance Impact: 10-100x speedup on matrix operations


Module 16: Memory - Memory Optimization

Connection from Module 15

"Operations are faster, but transformers still recompute everything. Let's be smarter with memory!"

What Students Build

  • KVCache class for transformer attention states (sketched after this list)
  • Incremental attention computation (process only new tokens)
  • Memory profiling and analysis tools
  • Cache management strategies
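
A minimal single-head sketch of the cache and its use in incremental attention; the class and function names are illustrative, not the module's final API:

import numpy as np

class KVCache:
    """Store past keys/values so each step only processes the new token."""

    def __init__(self):
        self.keys = None    # shape (seq_len, d_head)
        self.values = None  # shape (seq_len, d_head)

    def append(self, new_k, new_v):
        # Extend the cached history with the new token's key/value
        if self.keys is None:
            self.keys, self.values = new_k, new_v
        else:
            self.keys = np.concatenate([self.keys, new_k], axis=0)
            self.values = np.concatenate([self.values, new_v], axis=0)
        return self.keys, self.values

def cached_attention(q_new, cache, new_k, new_v):
    """Attend from the newest query over all cached positions: O(N) per step."""
    K, V = cache.append(new_k, new_v)
    scores = q_new @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V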

Key Learning Outcomes

  • Memory vs computation tradeoffs
  • Understanding the O(N²) → O(N) per-step cost reduction for autoregressive sequences
  • Production caching patterns (GPT, LLaMA)
  • When caching helps vs hurts performance

Performance Impact: 50x speedup in autoregressive generation


Module 17: Quantization - Precision Optimization

Connection from Module 16

"Memory usage is optimized, but models are still huge. Let's use fewer bits!"

What Students Build

  • Quantizer class for FP32→INT8 conversion (sketched after this list)
  • Calibration techniques for maintaining accuracy
  • Quantized operations (matmul, conv2d)
  • Model size analysis tools
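
A minimal sketch of symmetric post-training quantization, assuming an illustrative Quantizer interface (real calibration would run over a dataset rather than a single tensor):

import numpy as np

class Quantizer:
    """Symmetric per-tensor FP32 -> INT8 quantization."""

    def calibrate(self, weights):
        # Pick a scale so the largest magnitude maps to the INT8 limit
        self.scale = np.abs(weights).max() / 127.0
        return self

    def quantize(self, weights):
        q = np.round(weights / self.scale)
        return np.clip(q, -127, 127).astype(np.int8)

    def dequantize(self, q_weights):
        return q_weights.astype(np.float32) * self.scale

# Round-trip a weight tensor and measure the error quantization introduces
w = np.random.randn(256, 256).astype(np.float32)
quant = Quantizer().calibrate(w)
w_hat = quant.dequantize(quant.quantize(w))
print("max abs error:", np.abs(w - w_hat).max())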

Key Learning Outcomes

  • Numerical precision vs accuracy tradeoffs
  • Post-training quantization techniques
  • Hardware acceleration through reduced precision
  • When to use INT8 vs FP16 vs FP32

Performance Impact: 4x model size reduction, 2-4x inference speedup


Module 18: Compression - Structural Optimization

Connection from Module 17

"We're using fewer bits, but can we remove weights entirely?"

What Students Build

  • MagnitudePruner for weight removal (sketched after this list)
  • StructuredPruner for channel/filter removal
  • Basic knowledge distillation
  • Sparsity visualization tools
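
A rough sketch of magnitude-based unstructured pruning (the interface is illustrative):

import numpy as np

class MagnitudePruner:
    """Zero out the smallest-magnitude weights."""

    def __init__(self, sparsity=0.9):
        self.sparsity = sparsity  # fraction of weights to remove

    def prune(self, weights):
        # Threshold at the sparsity-th quantile of absolute values
        threshold = np.quantile(np.abs(weights), self.sparsity)
        mask = np.abs(weights) >= threshold
        return weights * mask, mask

pruner = MagnitudePruner(sparsity=0.9)
w = np.random.randn(512, 512)
w_sparse, mask = pruner.prune(w)
print(f"actual sparsity: {1 - mask.mean():.2%}")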

Key Learning Outcomes

  • Structured vs unstructured pruning
  • Magnitude-based pruning strategies
  • Knowledge distillation basics
  • Sparsity patterns and hardware efficiency

Performance Impact: 90% sparsity with <5% accuracy loss


Module 19: AutoTuning - Automatic Optimization

Connection from Module 18

"We have all these optimization techniques. Let's build systems that apply them automatically!"

What Students Build

class AutoTuner:
    def auto_optimize(self, model, constraints):
        """
        Automatically decide:
        - Which optimizations to apply
        - In what order
        - With what parameters
        - For what deployment target
        """
        pass
    
    def hyperparameter_search(self, model, data, budget):
        """Smart hyperparameter tuning (not random)"""
        pass
    
    def optimization_pipeline(self, model, target_hardware):
        """Build optimal pipeline for specific hardware"""
        pass
    
    def adaptive_training(self, model, data):
        """Training that adapts based on progress"""
        pass

Key Learning Outcomes

  • Automated optimization strategy selection
  • Constraint-based optimization (memory, latency, accuracy)
  • Hardware-aware optimization pipelines
  • Smart search strategies (Bayesian optimization basics)
  • Data-efficient training (curriculum learning, active learning)

Student Experience

"I built a system that takes any model and automatically optimizes it for any deployment target!"

Scope Balance (Not Too Complex)

  • Focus on rule-based automation (if mobile → aggressive quantization; see the sketch after this list)
  • Simple grid search with smart pruning of the search space (not full Bayesian optimization)
  • Basic hardware detection (CPU vs GPU vs Mobile)
  • Pre-built optimization recipes that students can combine
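
A hedged sketch of rule-based recipe selection; the targets, recipe names, and thresholds are all illustrative:

# Hypothetical recipes; real entries would invoke the Quantizer/Pruner
# classes built in Modules 17-18
OPTIMIZATION_RECIPES = {
    "mobile":  ["prune_90", "quantize_int8"],   # aggressive: size first
    "server":  ["quantize_int8"],               # latency with memory headroom
    "desktop": ["blocked_matmul"],              # speed at full precision
}

def select_recipe(target_hardware, memory_budget_mb=None):
    """Pick an optimization pipeline from simple if/then rules."""
    recipe = list(OPTIMIZATION_RECIPES.get(target_hardware, []))
    if memory_budget_mb is not None and memory_budget_mb < 100:
        recipe.append("distill_small")  # tight budgets also trigger distillation
    return recipe

print(select_recipe("mobile", memory_budget_mb=50))
# -> ['prune_90', 'quantize_int8', 'distill_small']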

Module 20: Competition - AutoML Olympics

Connection from Module 19

"You've built AutoTuning systems. Time to compete!"

What Students Build

  • Complete end-to-end optimized ML systems
  • Submission package for competition platform
  • Performance analysis reports
  • Innovation documentation

Competition Categories

  1. Speed Challenge: Fastest to reach target accuracy
  2. Size Challenge: Best accuracy under size constraints
  3. Efficiency Challenge: Best accuracy/resource tradeoff
  4. Innovation Challenge: Most creative optimization approach

Platform Concept

class CompetitionSubmission:
    def __init__(self, team_name):
        # build_model(), build_autotuner(), and the measure_* helpers below
        # are implemented by each team from their Modules 15-19 work
        self.team_name = team_name
        self.model = self.build_model()
        self.auto_tuner = self.build_autotuner()
        self.optimized = self.auto_tuner.optimize(self.model)
    
    def evaluate(self, test_data):
        """Automated evaluation on hidden test set"""
        return {
            'accuracy': self.measure_accuracy(test_data),
            'latency': self.measure_latency(),
            'memory': self.measure_memory(),
            'model_size': self.measure_size()
        }

Leaderboard System

  • Real-time rankings across multiple metrics
  • Automated testing on standardized hardware
  • Public showcase of techniques used
  • Innovation bonus for novel approaches

Implementation Timeline

Week 1: Foundation

  • Create placeholder directories for modules 16-20
  • Restructure Module 15 with OptimizedBackend upfront
  • Begin drafting Module 16 (Memory)

Week 2: Parallel Development

  • Modules 16-18 developed in parallel by different agents
  • PyTorch expert reviews all three simultaneously
  • Integration testing between modules

Week 3: AutoTuning Development

  • Module 19 development with appropriate scope
  • Integration with all previous optimization modules
  • Testing of automatic optimization pipelines

Week 4: Competition Platform

  • Module 20 competition framework
  • Leaderboard system design
  • Submission and evaluation pipeline

Directory Structure

modules/
├── 15_acceleration/     [EXISTS - needs restructuring]
├── 16_memory/           [TO CREATE]
│   ├── memory_dev.py
│   ├── module.yaml
│   └── README.md
├── 17_quantization/     [TO CREATE] 
│   ├── quantization_dev.py
│   ├── module.yaml
│   └── README.md
├── 18_compression/      [EXISTS - needs development]
│   ├── compression_dev.py
│   ├── module.yaml
│   └── README.md
├── 19_autotuning/       [TO CREATE]
│   ├── autotuning_dev.py
│   ├── module.yaml
│   └── README.md
└── 20_competition/      [TO CREATE]
    ├── competition_dev.py
    ├── module.yaml
    └── README.md

Success Metrics

Educational Success

  • Students understand when/why to apply each optimization
  • Can build automated optimization systems
  • Understand tradeoffs and constraints
  • Ready for production ML engineering roles

Technical Success

  • All optimizations integrate seamlessly
  • AutoTuner successfully combines techniques
  • Competition platform handles submissions
  • Measurable performance improvements achieved

Engagement Success

  • Students excited about optimization
  • Active competition participation
  • Innovative approaches developed
  • Community sharing of techniques

Next Steps

  1. Get PyTorch expert validation on AutoTuning scope
  2. Create placeholder directories for new modules
  3. Begin parallel development of modules 16-18
  4. Design competition platform architecture
  5. Update master roadmap with final structure