Optimization Modules Development Plan
Comprehensive Coordination for Modules 15-20
Phase 1: Module Naming & Structure Updates
Recommended Naming Changes:
Current → New (Thematic Flow)
15_acceleration → 15_acceleration (KEEP - perfect)
16_caching → 16_memory (Memory Optimization)
17_precision → 17_quantization (Size Optimization)
18_compression → 18_compression (KEEP - perfect)
19_benchmarking → 19_profiling (Performance Analysis)
20_capstone → 20_capstone (KEEP - perfect)
Why This Thematic Flow Works:
- Acceleration: "Make it faster"
- Memory: "Use memory smarter"
- Quantization: "Use fewer bits"
- Compression: "Remove what's unnecessary"
- Profiling: "Measure everything"
- Capstone: "Put it all together"
Module 15 Structure Changes:
Current Problem: `OptimizedBackend` comes at the end of the module (line 277).
Solution: Move it to the beginning to show students the goal upfront.
New Structure:
- Part 1: The Goal - Show OptimizedBackend first
- Part 2: Why We Need Optimization - Educational loops analysis
- Part 3: Building Better - Blocked algorithms
- Part 4: Production Reality - NumPy integration
- Part 5: Transparent Backend - How automatic switching works
Student Experience: "Here's where we're going (OptimizedBackend), now let me show you how we get there step by step."
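To make that reveal concrete, here is a minimal sketch of the transparent-switching idea, assuming a 2D matmul API and a size-based dispatch rule (both are illustrative assumptions, not the module's final design):

```python
import numpy as np

class OptimizedBackend:
    """Sketch of transparent switching: small problems take the
    educational naive path, large ones dispatch to NumPy's BLAS-backed
    matmul. Threshold and dispatch rule are assumptions."""

    def __init__(self, size_threshold=64):
        self.size_threshold = size_threshold

    def matmul(self, a, b):
        # Automatic switching: students see the goal before the mechanics.
        if max(a.shape[0], a.shape[1], b.shape[1]) < self.size_threshold:
            return self._naive_matmul(a, b)
        return np.matmul(a, b)  # the production path from Part 4

    def _naive_matmul(self, a, b):
        # The educational triple loop students analyze in Part 2.
        m, k = a.shape
        k2, n = b.shape
        assert k == k2, "inner dimensions must match"
        out = np.zeros((m, n), dtype=a.dtype)
        for i in range(m):
            for j in range(n):
                for p in range(k):
                    out[i, j] += a[i, p] * b[p, j]
        return out
```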
Phase 2: Parallel Development Coordination
Agent Team Assignment:
Module 16: Memory Optimization
Agent: Module Developer A
Focus: KV caching for transformers
Key Components:
- `KVCache` class for attention state storage
- Incremental attention computation
- Memory vs computation tradeoff analysis
- Integration with Module 14 transformers
Connection to Previous: "Transformers recompute attention every token - wasteful!"
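A minimal sketch of what `KVCache` and incremental attention could look like, assuming a single-head layout and a simple append API (both are illustrative assumptions, not the module's final interface):

```python
import numpy as np

class KVCache:
    """Stores past attention keys/values so each new token attends
    against cached state instead of recomputing the full sequence."""

    def __init__(self):
        self.keys = None    # (seq_len, d_head), grows by one row per token
        self.values = None  # (seq_len, d_head)

    def append(self, k, v):
        # k, v: (1, d_head) for the newly generated token
        if self.keys is None:
            self.keys, self.values = k, v
        else:
            self.keys = np.concatenate([self.keys, k], axis=0)
            self.values = np.concatenate([self.values, v], axis=0)
        return self.keys, self.values

def cached_attention(q, cache, k_new, v_new):
    # Incremental attention: one query row against all cached keys.
    keys, values = cache.append(k_new, v_new)
    scores = q @ keys.T / np.sqrt(q.shape[-1])  # (1, seq_len)
    weights = np.exp(scores - scores.max())     # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                     # (1, d_head)
```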
Module 17: Quantization
Agent: Module Developer B
Focus: INT8 quantization techniques
Key Components:
- `Quantizer` class for FP32→INT8 conversion
- Calibration techniques for accuracy retention
- Quantized operations (matmul, conv)
- Model size reduction analysis
Connection to Previous: "Memory optimization helps, but models are still huge!"
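A minimal sketch of symmetric per-tensor quantization (the scheme and API are assumptions; the module may use per-channel or asymmetric variants):

```python
import numpy as np

class Quantizer:
    """Symmetric per-tensor FP32 -> INT8 quantization sketch."""

    def calibrate(self, tensor):
        # Calibration: choose a scale so the observed max maps to INT8 range.
        self.scale = np.abs(tensor).max() / 127.0
        return self

    def quantize(self, tensor):
        q = np.round(tensor / self.scale)
        return np.clip(q, -127, 127).astype(np.int8)

    def dequantize(self, q_tensor):
        return q_tensor.astype(np.float32) * self.scale

# 4x size reduction: FP32 (4 bytes/weight) -> INT8 (1 byte/weight)
w = np.random.randn(256, 256).astype(np.float32)
quant = Quantizer().calibrate(w)
w_q = quant.quantize(w)
max_error = np.abs(w - quant.dequantize(w_q)).max()
```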
Module 18: Compression
Agent: Module Developer C
Focus: Pruning and knowledge distillation
Key Components:
- `MagnitudePruner` for weight removal
- `StructuredPruner` for channel removal
- `KnowledgeDistillation` trainer
- Sparsity pattern analysis
Connection to Previous: "Quantization reduced precision, can we remove weights entirely?"
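A minimal sketch of magnitude pruning with a global threshold (unstructured variant; the pruner's actual API and per-layer policy are assumptions):

```python
import numpy as np

class MagnitudePruner:
    """Zero out the smallest-magnitude weights; keep a mask so pruned
    weights stay frozen at zero during any fine-tuning."""

    def __init__(self, sparsity=0.9):
        self.sparsity = sparsity  # fraction of weights to remove

    def prune(self, weights):
        # Threshold below which weights are considered unnecessary.
        threshold = np.quantile(np.abs(weights).ravel(), self.sparsity)
        mask = np.abs(weights) > threshold
        return weights * mask, mask
```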
Parallel Development Timeline:
Week 1: All three agents draft initial implementations
Week 2: PyTorch expert reviews all three modules in parallel
Week 3: Revisions based on expert feedback
Week 4: Integration testing and final polish
Phase 3: Module 19 - Profiling (Not Benchmarking)
New Focus: Performance Profiling Tools
Instead of abstract benchmarking, students build practical profiling tools:
What Students Build:
- `PerformanceProfiler` - Time and memory measurement
- `BottleneckAnalyzer` - Identify slow operations
- `OptimizationComparer` - Before/after analysis
- `InteractionAnalyzer` - How optimizations combine
Student Experience:
```python
# Profile their own models from previous modules
profiler = PerformanceProfiler()
with profiler.profile("my_transformer"):
    output = my_transformer(inputs)

# See exactly where time is spent
profiler.report()
# Output:
# - Attention: 45% of time
# - Feed Forward: 30% of time
# - Embedding: 15% of time
# - Other: 10% of time

# Then apply optimizations and re-profile
profiler.compare_optimizations(baseline, quantized, pruned, cached)
```
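A timing-only sketch of how `PerformanceProfiler.profile` could work as a context manager (memory tracking and `compare_optimizations` are omitted; this is an assumption about the eventual design, not the module's final code):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class PerformanceProfiler:
    """Accumulates wall-clock time per named region and reports
    each region's share of total measured time."""

    def __init__(self):
        self.timings = defaultdict(float)

    @contextmanager
    def profile(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name] += time.perf_counter() - start

    def report(self):
        total = sum(self.timings.values()) or 1.0
        for name, secs in sorted(self.timings.items(), key=lambda kv: -kv[1]):
            print(f"- {name}: {secs:.4f}s ({100 * secs / total:.0f}% of time)")
```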
Connection to Previous: "We have all these optimization techniques - how do we measure their combined impact scientifically?"
Phase 4: Module 20 - Capstone Ideas
Option A: Interactive Performance Competition Website
Concept: Students submit optimized models to a leaderboard system
Features:
- Upload optimized model implementations
- Automatic performance testing (speed, memory, accuracy)
- Real-time leaderboard with multiple categories
- Model analysis and optimization suggestions
Categories:
- "Fastest CIFAR-10 Trainer" (speed focus)
- "Most Memory Efficient GPT" (memory focus)
- "Best Accuracy/Size Tradeoff" (balance focus)
- "Most Creative Optimization" (innovation focus)
Option B: Complete ML System Deployment Challenge
Concept: Build and deploy complete optimized ML systems
Project Options:
- Edge AI Challenge: Deploy GPT on Raspberry Pi
- Mobile ML Challenge: CIFAR-10 classifier on phone
- Datacenter Challenge: Multi-GPU training optimization
- Custom Challenge: Student-defined optimization problem
Deliverables:
- Working system with all optimizations
- Performance analysis report
- Deployment documentation
- Innovation summary
Option C: "ML Systems Portfolio" Capstone
Concept: Students create professional portfolio showcasing their TinyTorch journey
Portfolio Components:
- Technical Blog Posts - Explain each optimization technique
- Performance Analysis Reports - Before/after comparisons
- Code Showcase - Best implementations with explanations
- Industry Case Studies - How TinyTorch techniques apply to real systems
- Innovation Project - Original optimization idea
Public Showcase: Host student portfolios on tinytorch.ai/students/
Phase 5: Expert Review Protocol
Parallel Review Process:
Once all three modules (16-18) have initial drafts:
1. Submit to PyTorch Expert simultaneously
2. Expert reviews all three for:
   - Pedagogical flow and connections
   - Technical accuracy and best practices
   - Integration with existing modules
   - Production relevance
3. Expert provides comparative feedback:
   - How modules work together as a system
   - Optimization interaction effects
   - Real-world applicability
4. Agents revise based on holistic feedback
Review Questions for Expert:
- "Do these three modules create a coherent optimization toolkit?"
- "Are the connections between modules clear and natural?"
- "Do the optimization techniques reflect industry best practices?"
- "How well does this prepare students for production ML work?"
Implementation Priorities
Immediate Actions (This Week):
- Rename modules for thematic flow (16→memory, 17→quantization, 19→profiling)
- Restructure Module 15 to show OptimizedBackend upfront
- Update Module Developer instructions (COMPLETED ✅)
- Assign agents to modules 16-18 for parallel development
Next Week:
- Initial module drafts from all three agents
- Module 15 restructuring implementation
- Profiling module design finalization
Following Week:
- PyTorch expert parallel review of all drafts
- Capstone module planning based on preferred approach
- Integration testing preparation
This plan ensures systematic development of the complete optimization toolkit while maintaining the beautiful progression we designed!