TinyTorch/docs/archive/2024-cleanup/optimization-modules-development-plan.md
Vijay Janapa Reddi d2cfb2d57e docs: Major cleanup - 46 → 12 essential docs
MASSIVE DOCUMENTATION CLEANUP:
- Reduced from 46 docs to 12 essential files
- Archived 34 outdated planning and analysis documents

KEPT (Essential for current operations):
- STUDENT_QUICKSTART.md - Student onboarding
- INSTRUCTOR_GUIDE.md - Instructor setup
- cifar10-training-guide.md - North star achievement
- tinytorch-assumptions.md - Complexity framework (NEW)
- tinytorch-textbook-alignment.md - Academic alignment

- NBGrader integration docs (3 files)
- Development standards (3 files)
- docs/README.md - Navigation guide (NEW)

🗑️ ARCHIVED (Completed/outdated planning):
- All optimization-modules-* planning docs
- All milestone-* system docs
- All tutorial-master-plan and analysis docs
- Module reordering and structure analysis
- Agent setup and workflow case studies

RESULT: Clean, focused documentation structure
Only active, current docs remain - easy to find what you need!
2025-09-27 17:04:19 -04:00


Optimization Modules Development Plan

Comprehensive Coordination for Modules 15-20

Phase 1: Module Naming & Structure Updates

Current → New (Thematic Flow)
15_acceleration → 15_acceleration (KEEP - perfect)
16_caching → 16_memory (Memory Optimization)
17_precision → 17_quantization (Size Optimization)  
18_compression → 18_compression (KEEP - perfect)
19_benchmarking → 19_profiling (Performance Analysis)
20_capstone → 20_capstone (KEEP - perfect)

Why This Thematic Flow Works:

  • Acceleration: "Make it faster"
  • Memory: "Use memory smarter"
  • Quantization: "Use fewer bits"
  • Compression: "Remove what's unnecessary"
  • Profiling: "Measure everything"
  • Capstone: "Put it all together"

Module 15 Structure Changes:

Current Problem: OptimizedBackend comes at the end (line 277)

Solution: Move it to the beginning to show students the goal upfront

New Structure:

  1. Part 1: The Goal - Show OptimizedBackend first
  2. Part 2: Why We Need Optimization - Educational loops analysis
  3. Part 3: Building Better - Blocked algorithms
  4. Part 4: Production Reality - NumPy integration
  5. Part 5: Transparent Backend - How automatic switching works

Student Experience: "Here's where we're going (OptimizedBackend), now let me show you how we get there step by step."
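The step from Part 2 (educational loops) to Part 3 (blocked algorithms) can be sketched roughly as follows. This is a minimal illustration using NumPy, not the module's actual code: the function names matmul_naive and matmul_blocked and the block size are assumptions made for this example, and OptimizedBackend itself is not shown.

```python
import numpy as np

def matmul_naive(a, b):
    """Triple-loop matrix multiply: the 'educational loops' baseline."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m), dtype=np.float64)
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i, j] += a[i, p] * b[p, j]
    return out

def matmul_blocked(a, b, block=32):
    """Blocked (tiled) multiply: accumulate one block x block tile at a time."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m), dtype=np.float64)
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # NumPy slices clip at the array edge, so ragged tiles are handled.
                out[i0:i0 + block, j0:j0 + block] += (
                    a[i0:i0 + block, p0:p0 + block] @ b[p0:p0 + block, j0:j0 + block]
                )
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))
B = rng.standard_normal((40, 30))
C_naive = matmul_naive(A, B)
C_blocked = matmul_blocked(A, B, block=16)
C_numpy = A @ B  # Part 4: the production path
```

All three results should agree up to floating-point error, which gives students a correctness check before they start timing anything.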

Phase 2: Parallel Development Coordination

Agent Team Assignment:

Module 16: Memory Optimization

Agent: Module Developer A

Focus: KV caching for transformers

Key Components:

  • KVCache class for attention state storage
  • Incremental attention computation
  • Memory vs computation tradeoff analysis
  • Integration with Module 14 transformers

Connection to Previous: "Transformers recompute keys and values for every previous token at each generation step - wasteful!"
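A minimal single-head sketch of the idea behind the KVCache class and incremental attention, assuming NumPy arrays stand in for TinyTorch tensors. The method names (append) and helper functions (attend_incremental, attend_full) are hypothetical and chosen only for this illustration; the real Module 16 interface may differ.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Stores keys and values of tokens already processed (single head)."""
    def __init__(self, d_model):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def append(self, k, v):
        # Memory grows by one row per token: the memory side of the tradeoff.
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])

def attend_incremental(q, k, v, cache):
    """Attention output for the newest token only, reusing cached K/V."""
    cache.append(k, v)
    scores = cache.keys @ q / np.sqrt(q.shape[0])
    return softmax(scores) @ cache.values

def attend_full(Q, K, V):
    """Reference: recompute causal attention for every position."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
T, d = 6, 8
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

cache = KVCache(d)
incremental = np.stack(
    [attend_incremental(Q[t], K[t], V[t], cache) for t in range(T)]
)
full = attend_full(Q, K, V)
```

The incremental path does O(t) work for token t instead of recomputing the whole T x T score matrix, at the cost of keeping all keys and values in memory.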

Module 17: Quantization

Agent: Module Developer B

Focus: INT8 quantization techniques

Key Components:

  • Quantizer class for FP32→INT8 conversion
  • Calibration techniques for accuracy retention
  • Quantized operations (matmul, conv)
  • Model size reduction analysis

Connection to Previous: "Memory optimization helps, but models are still huge!"
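One possible shape for the Quantizer, sketched under the assumption of symmetric per-tensor quantization with max-abs calibration. This is one common scheme, not necessarily the one Module 17 will use, and the method names calibrate, quantize, and dequantize are placeholders for this example.

```python
import numpy as np

class Quantizer:
    """Symmetric per-tensor FP32 -> INT8 quantizer (illustrative only)."""

    def calibrate(self, x):
        # Max-abs calibration: map the largest magnitude to 127.
        max_abs = float(np.max(np.abs(x)))
        self.scale = max_abs / 127.0 if max_abs > 0 else 1.0
        return self

    def quantize(self, x):
        q = np.round(x / self.scale)
        return np.clip(q, -127, 127).astype(np.int8)

    def dequantize(self, q):
        return q.astype(np.float32) * self.scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)

quantizer = Quantizer().calibrate(weights)
q_weights = quantizer.quantize(weights)
restored = quantizer.dequantize(q_weights)

max_error = float(np.max(np.abs(weights - restored)))  # bounded by about scale / 2
size_ratio = weights.nbytes / q_weights.nbytes          # FP32 bytes vs INT8 bytes
```

The two numbers at the end correspond to the accuracy-retention and model-size analyses listed above: rounding error per weight stays within half a quantization step, and storage shrinks by the ratio of the element sizes.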

Module 18: Compression

Agent: Module Developer C

Focus: Pruning and knowledge distillation

Key Components:

  • MagnitudePruner for weight removal
  • StructuredPruner for channel removal
  • KnowledgeDistillation trainer
  • Sparsity pattern analysis

Connection to Previous: "Quantization reduced precision, can we remove weights entirely?"
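As a rough sketch of what the MagnitudePruner could do internally, the function below zeroes the smallest-magnitude fraction of a weight array. The function name magnitude_prune and its return convention are assumptions for this example; structured pruning and distillation are not covered here.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Returns the pruned weights and a boolean mask (True = weight kept).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(sparsity * weights.size)  # number of weights to remove
    mask = np.ones(weights.size, dtype=bool)
    if k > 0:
        magnitudes = np.abs(weights).ravel()
        # argpartition places the k smallest magnitudes in the first k slots.
        prune_idx = np.argpartition(magnitudes, k - 1)[:k]
        mask[prune_idx] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask

rng = np.random.default_rng(0)
layer = rng.standard_normal((64, 64))
pruned, mask = magnitude_prune(layer, sparsity=0.5)

achieved_sparsity = 1.0 - mask.mean()  # basic sparsity pattern analysis
```

Keeping the mask around makes it easy to inspect sparsity per row or per channel, which is where the comparison with StructuredPruner would start.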

Parallel Development Timeline:

Week 1: All three agents draft initial implementations
Week 2: PyTorch expert reviews all three modules in parallel
Week 3: Revisions based on expert feedback
Week 4: Integration testing and final polish

Phase 3: Module 19 - Profiling (Not Benchmarking)

New Focus: Performance Profiling Tools

Instead of abstract benchmarking, students build practical profiling tools:

What Students Build:

  1. PerformanceProfiler - Time and memory measurement
  2. BottleneckAnalyzer - Identify slow operations
  3. OptimizationComparer - Before/after analysis
  4. InteractionAnalyzer - How optimizations combine

Student Experience:

# Profile their own models from previous modules
profiler = PerformanceProfiler()
with profiler.profile("my_transformer"):
    output = my_transformer(inputs)

# See exactly where time is spent
profiler.report()
# Output:
# - Attention: 45% of time
# - Feed Forward: 30% of time  
# - Embedding: 15% of time
# - Other: 10% of time

# Then apply optimizations and re-profile
profiler.compare_optimizations(baseline, quantized, pruned, cached)

Connection to Previous: "We have all these optimization techniques - how do we measure their combined impact scientifically?"
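The profile(...) context manager used in the student example could be built along these lines. This is a sketch using only the standard library (time.perf_counter and tracemalloc); the records layout and the report format are assumptions, compare_optimizations is not implemented, and nested profile regions are not supported because tracemalloc is started and stopped per region.

```python
import time
import tracemalloc
from contextlib import contextmanager

class PerformanceProfiler:
    """Records wall-clock time and peak traced memory per named region."""

    def __init__(self):
        self.records = {}

    @contextmanager
    def profile(self, name):
        tracemalloc.start()
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            self.records[name] = {"seconds": elapsed, "peak_bytes": peak}

    def report(self):
        """Return one line per region, slowest first, with its share of total time."""
        total = sum(r["seconds"] for r in self.records.values()) or 1.0
        ordered = sorted(self.records.items(), key=lambda kv: -kv[1]["seconds"])
        return [
            f"{name}: {100.0 * r['seconds'] / total:.0f}% of time, "
            f"peak {r['peak_bytes']} bytes"
            for name, r in ordered
        ]

profiler = PerformanceProfiler()
with profiler.profile("build_list"):
    data = [i * i for i in range(50_000)]
with profiler.profile("sum_list"):
    checksum = sum(data)

report_lines = profiler.report()
```

Measured times vary from run to run, so a real profiler would likely repeat each region several times and report a median rather than a single sample.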

Phase 4: Module 20 - Capstone Ideas

Option A: Interactive Performance Competition Website

Concept: Students submit optimized models to a leaderboard system

Features:

  • Upload optimized model implementations
  • Automatic performance testing (speed, memory, accuracy)
  • Real-time leaderboard with multiple categories
  • Model analysis and optimization suggestions

Categories:

  • "Fastest CIFAR-10 Trainer" (speed focus)
  • "Most Memory Efficient GPT" (memory focus)
  • "Best Accuracy/Size Tradeoff" (balance focus)
  • "Most Creative Optimization" (innovation focus)

Option B: Complete ML System Deployment Challenge

Concept: Build and deploy complete optimized ML systems

Project Options:

  1. Edge AI Challenge: Deploy GPT on Raspberry Pi
  2. Mobile ML Challenge: CIFAR-10 classifier on phone
  3. Datacenter Challenge: Multi-GPU training optimization
  4. Custom Challenge: Student-defined optimization problem

Deliverables:

  • Working system with all optimizations
  • Performance analysis report
  • Deployment documentation
  • Innovation summary

Option C: "ML Systems Portfolio" Capstone

Concept: Students create professional portfolio showcasing their TinyTorch journey

Portfolio Components:

  1. Technical Blog Posts - Explain each optimization technique
  2. Performance Analysis Reports - Before/after comparisons
  3. Code Showcase - Best implementations with explanations
  4. Industry Case Studies - How TinyTorch techniques apply to real systems
  5. Innovation Project - Original optimization idea

Public Showcase: Host student portfolios on tinytorch.ai/students/

Phase 5: Expert Review Protocol

Parallel Review Process:

Once all three modules (16-18) have initial drafts:

  1. Submit to PyTorch Expert simultaneously
  2. Expert reviews all three for:
    • Pedagogical flow and connections
    • Technical accuracy and best practices
    • Integration with existing modules
    • Production relevance
  3. Expert provides comparative feedback:
    • How modules work together as a system
    • Optimization interaction effects
    • Real-world applicability
  4. Agents revise based on holistic feedback

Review Questions for Expert:

  • "Do these three modules create a coherent optimization toolkit?"
  • "Are the connections between modules clear and natural?"
  • "Do the optimization techniques reflect industry best practices?"
  • "How well does this prepare students for production ML work?"

Implementation Priorities

Immediate Actions (This Week):

  1. Rename modules for thematic flow (16→memory, 17→quantization, 19→profiling)
  2. Restructure Module 15 to show OptimizedBackend upfront
  3. Update Module Developer instructions (COMPLETED)
  4. Assign agents to modules 16-18 for parallel development

Next Week:

  1. Initial module drafts from all three agents
  2. Module 15 restructuring implementation
  3. Profiling module design finalization

Following Week:

  1. PyTorch expert parallel review of all drafts
  2. Capstone module planning based on preferred approach
  3. Integration testing preparation

This plan ensures systematic development of the complete optimization toolkit while maintaining the beautiful progression we designed!