🎓 Capstone Project
📊 Module Info
- Difficulty: ⭐⭐⭐⭐⭐ Expert Systems Engineering 🥷
- Time Estimate: Capstone Project (flexible scope and pacing)
- Prerequisites: All 14 TinyTorch modules - Your complete ML framework
- Outcome: Advanced framework engineering skills - Prove deep systems mastery
Welcome to your TinyTorch capstone! You've built a complete ML framework from scratch. Now make it faster, better, and more professional through systematic optimization. This isn't about building apps—it's about becoming the engineer others ask: "How do I make this framework better?"
🎯 Learning Objectives
By the end of this capstone, you will be able to:
- Profile and optimize ML frameworks: Use systematic analysis to identify and eliminate performance bottlenecks
- Extend framework capabilities: Add new algorithms, layers, and optimizers using consistent architectural patterns
- Engineer production-ready systems: Implement memory optimization, parallel computing, and developer tools for real-world use
- Make informed trade-offs: Understand engineering decisions around memory vs speed, accuracy vs efficiency, and simplicity vs performance
- Demonstrate framework mastery: Prove deep understanding through architectural improvements that showcase true systems expertise
🔄 Build → Optimize → Reflect
This capstone follows TinyTorch's Build → Optimize → Reflect framework:
- Build: You already built a complete ML framework (Modules 1-14)
- Optimize: Systematically improve your framework through performance engineering and capability extensions
- Reflect: Prove deep understanding by explaining the architectural improvements and trade-offs that demonstrate true framework mastery
🚀 The Capstone Challenge
After completing the 14 core modules, you have a complete ML framework. Now optimize it, extend it, and make it faster through systems engineering:
⚡ Track 1: Performance Engineering
Goal: Make your TinyTorch framework faster and more memory-efficient
Example Project: GPU-Accelerated Matrix Operations
```python
# Current: CPU-only operations
def matmul_naive(A, B):
    return np.dot(A, B)  # Single-threaded, slow

# Your optimization: GPU kernels + vectorization
def matmul_optimized(A, B):
    # YOUR implementation using:
    # - NumPy vectorization
    # - Memory layout optimization
    # - Cache-efficient algorithms
    # - Parallel computation
    pass
```
Concrete Tasks:
- Profile your current tensor operations and identify bottlenecks
- Implement vectorized operations that are 5-10x faster
- Optimize memory usage in training loops (reduce by 30%+)
- Add parallel processing for batch operations
- Benchmark against PyTorch and analyze performance gaps
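To make the "cache-efficient algorithms" task concrete, here is a minimal sketch of one optimization path: a naive triple-loop multiply versus a blocked (tiled) version that keeps each working set cache-sized and delegates tile multiplies to vectorized NumPy. The function names and block size are illustrative assumptions, not part of the TinyTorch API.

```python
import numpy as np

def matmul_naive(A, B):
    """Reference triple loop: correct, but no vectorization at all."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

def matmul_blocked(A, B, block=32):
    """Tile the i and p loops; each tile multiply is vectorized NumPy."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(0, n, block):
        for p in range(0, k, block):
            # Slices clamp at array edges, so ragged tiles work for free
            C[i:i+block, :] += A[i:i+block, p:p+block] @ B[p:p+block, :]
    return C

A = np.random.randn(48, 32)
B = np.random.randn(32, 24)
assert np.allclose(matmul_naive(A, B), matmul_blocked(A, B))
```

Timing both versions with a proper benchmark harness makes a good first milestone for this track.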
🧠 Track 2: Algorithm Extensions
Goal: Add modern ML algorithms to your framework
Example Project: Transformer Attention Block
```python
# Current: Basic layers (Dense, Conv2D)
from tinytorch.core.layers import Dense

# Your extension: Modern attention mechanisms
class MultiHeadAttention:
    def __init__(self, d_model, num_heads):
        # YOUR implementation using only TinyTorch components
        self.query = Dense(d_model, d_model)
        self.key = Dense(d_model, d_model)
        self.value = Dense(d_model, d_model)
        # ... attention math using your autograd

    def forward(self, x):
        # YOUR attention implementation
        pass
```
Concrete Tasks:
- Implement BatchNormalization using your tensor and autograd systems
- Build Transformer attention blocks with your Dense layers
- Add advanced optimizers (AdamW, RMSprop) using your autograd
- Create Dropout and regularization techniques
- Extend your CNN module with modern architectures
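As a warm-up for the attention task, the core math can be sketched in plain NumPy before wiring it into your Dense layers and autograd. This is a simplified sketch under stated assumptions (single head, no masking, no learned projections), not the TinyTorch implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # (seq, seq) similarity
    weights = softmax(scores, axis=-1)               # each row sums to 1
    return weights @ V, weights

# Toy self-attention over a (seq_len=5, d_model=8) input
x = np.random.randn(5, 8)
out, w = scaled_dot_product_attention(x, x, x)
assert out.shape == (5, 8)
assert np.allclose(w.sum(axis=-1), 1.0)
```

The multi-head version splits `d_model` into `num_heads` slices, runs this function per slice, and concatenates the results.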
🔧 Track 3: Systems Optimization
Goal: Make your framework production-ready and scalable
Example Project: Memory-Efficient Training Pipeline
```python
# Current: Basic training loop
def train_epoch(model, dataloader, optimizer):
    for batch in dataloader:
        loss = model(batch)
        loss.backward()
        optimizer.step()

# Your optimization: Production training system
class OptimizedTrainer:
    def __init__(self, model, config):
        # YOUR implementation with:
        # - Memory profiling and optimization
        # - Gradient accumulation
        # - Mixed precision training
        # - Checkpointing and resuming
        pass
```
Concrete Tasks:
- Implement gradient accumulation for large batch training
- Add memory profiling and leak detection
- Create model checkpointing and resuming systems
- Build distributed training across multiple processes
- Optimize data loading pipelines for better GPU utilization
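The gradient-accumulation task reduces to one idea: sum gradients over several microbatches, then apply one averaged update. Below is a toy sketch with plain NumPy arrays standing in for TinyTorch tensors; the class name and API are hypothetical, chosen only to make the accumulation logic visible.

```python
import numpy as np

class AccumulatingSGD:
    """SGD that applies an update only every `accum_steps` backward passes."""

    def __init__(self, params, lr=0.1, accum_steps=4):
        self.params = params
        self.lr = lr
        self.accum_steps = accum_steps
        self._acc = [np.zeros_like(p) for p in params]
        self._count = 0

    def backward(self, grads):
        # Accumulate microbatch gradients instead of stepping immediately
        for a, g in zip(self._acc, grads):
            a += g
        self._count += 1
        if self._count == self.accum_steps:
            self.step()

    def step(self):
        for p, a in zip(self.params, self._acc):
            p -= self.lr * (a / self.accum_steps)  # average over microbatches
            a[:] = 0.0
        self._count = 0

w = [np.ones(2)]
opt = AccumulatingSGD(w, lr=0.5, accum_steps=2)
opt.backward([np.array([2.0, 2.0])])
assert np.allclose(w[0], [1.0, 1.0])   # no update after first microbatch
opt.backward([np.array([2.0, 2.0])])
assert np.allclose(w[0], [0.0, 0.0])   # averaged grad 2.0, step 0.5 * 2.0
```

This lets you simulate a large effective batch size while only holding one small batch of activations in memory at a time.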
📊 Track 4: Framework Analysis
Goal: Build comprehensive benchmarking and comparison tools
Example Project: TinyTorch vs PyTorch Benchmark Suite
```python
# Your benchmarking framework
class FrameworkComparison:
    def __init__(self):
        # Compare TinyTorch vs PyTorch on:
        # - Training speed and memory usage
        # - Accuracy on standard datasets
        # - Code complexity and maintainability
        pass

    def benchmark_operation(self, op_name, input_shapes):
        # Run identical operations in both frameworks
        tinytorch_time = self.benchmark_tinytorch(op_name, input_shapes)
        pytorch_time = self.benchmark_pytorch(op_name, input_shapes)
        return self.analyze_performance_gap(tinytorch_time, pytorch_time)
```
Concrete Tasks:
- Create automated benchmarks comparing TinyTorch to PyTorch
- Analyze where your framework is slower and why
- Build performance regression testing
- Profile memory usage patterns and identify optimization opportunities
- Create detailed performance reports with recommendations
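Whatever you compare, the timing harness matters as much as the operation itself. A minimal sketch using warmup runs and the median of several repeats, which is more robust to OS scheduling noise than a single timing (the function name is illustrative):

```python
import time
import statistics
import numpy as np

def benchmark(fn, *args, repeats=5, warmup=2):
    """Time a callable: warm up first, then report the median of `repeats` runs."""
    for _ in range(warmup):
        fn(*args)                       # prime caches, JITs, lazy imports
    times = []
    for _ in range(repeats):
        start = time.perf_counter()     # monotonic, high-resolution clock
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

A = np.random.randn(256, 256)
B = np.random.randn(256, 256)
t = benchmark(np.dot, A, B)
print(f"np.dot 256x256 median: {t * 1e3:.3f} ms")
```

The same harness can wrap a TinyTorch operation and its PyTorch counterpart so both are measured under identical conditions.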
🛠️ Track 5: Developer Experience
Goal: Make your framework easier to debug, understand, and extend
Example Project: TinyTorch Debugging and Visualization Suite
```python
# Your developer tools
class TinyTorchDebugger:
    def __init__(self, model):
        # YOUR implementation providing:
        # - Gradient flow visualization
        # - Layer activation inspection
        # - Training dynamics plotting
        # - Error diagnosis and suggestions
        pass

    def visualize_gradients(self):
        # Show gradient magnitudes across layers
        pass

    def diagnose_training_issues(self):
        # Detect vanishing/exploding gradients, learning rate problems
        pass
```
Concrete Tasks:
- Build gradient visualization tools for debugging
- Create layer activation inspection utilities
- Implement training dynamics plotting and analysis
- Add better error messages with suggestions for fixes
- Build automated testing tools for new components
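A good first debugging tool is a per-layer gradient magnitude report. Here is a hedged sketch with plain NumPy arrays standing in for collected gradients; the thresholds are illustrative defaults, not tuned values:

```python
import numpy as np

def gradient_report(named_grads, vanish_thresh=1e-6, explode_thresh=1e3):
    """Summarize gradient norms per layer and flag likely problems.

    `named_grads` maps layer name -> gradient array, e.g. gathered
    after a backward() pass.
    """
    report = {}
    for name, g in named_grads.items():
        norm = float(np.linalg.norm(g))
        if norm < vanish_thresh:
            status = "vanishing?"
        elif norm > explode_thresh:
            status = "exploding?"
        else:
            status = "ok"
        report[name] = (norm, status)
    return report

grads = {
    "dense1": np.full((4, 4), 1e-9),   # suspiciously tiny gradients
    "dense2": np.ones((4, 4)),         # healthy gradients
}
rep = gradient_report(grads)
assert rep["dense1"][1] == "vanishing?"
assert rep["dense2"][1] == "ok"
```

Printing this report once per epoch catches vanishing/exploding gradients long before a loss curve makes them obvious.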
📋 Project Structure and Timeline
Phase 1: Analysis & Planning
- Profile your current framework: Use Python's `cProfile` and `memory_profiler` to identify bottlenecks
- Define success metrics: What does "better" mean for your chosen track?
- Set specific goals: "Reduce training time by 30%" or "Add BatchNorm with full autograd support"
- Plan implementation: Break your project into 3-4 concrete milestones
Phase 2: Core Implementation
- Build incrementally: Start with the simplest version that works
- Test constantly: Use your existing TinyTorch models to verify improvements
- Benchmark early: Measure performance at each step
- Document decisions: Keep notes on trade-offs and engineering choices
Phase 3: Integration & Optimization
- Integrate with existing systems: Ensure your improvements work with all TinyTorch modules
- Optimize performance: Polish and fine-tune your implementation
- Create comprehensive tests: Verify your additions don't break existing functionality
- Write documentation: Explain your improvements and how others can use them
Phase 4: Evaluation & Presentation
- Benchmark final results: Compare before/after performance
- Analyze trade-offs: What did you sacrifice? What did you gain?
- Create demonstration: Show your improvements working on real examples
- Write project report: Document your engineering journey and lessons learned
🏗️ Getting Started: Example Walkthrough
Let's walk through starting a Performance Engineering project:
Step 1: Profile Your Current Framework
```python
import cProfile
import pstats
from memory_profiler import profile

# Profile your training loop
profiler = cProfile.Profile()
profiler.enable()

# Run your CIFAR-10 training from Module 10
model = create_mlp([3072, 128, 64, 10])
train_model(model, cifar10_data, epochs=1)

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(20)  # Top 20 slowest functions
```
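If `memory_profiler` is not installed, the standard library's `tracemalloc` gives a quick view of peak Python-level allocations (recent NumPy versions route their buffer allocations through it as well). A minimal sketch with a stand-in workload:

```python
import tracemalloc
import numpy as np

tracemalloc.start()

# Stand-in for a training workload: a batch of cached activations
activations = [np.random.randn(512, 512) for _ in range(8)]

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```

Wrapping your actual training loop between `start()` and `get_traced_memory()` shows whether memory grows per step, which is the usual signature of a leak.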
Step 2: Identify Bottlenecks
Common findings:
- 60% of time in tensor operations (matmul, convolution)
- 25% of time in data loading and preprocessing
- 10% of time in gradient computation
- 5% of time in optimizer updates
Step 3: Choose Your Target
Focus on the biggest bottleneck. If it's tensor operations, implement:
```python
# Before: Naive implementation
def matmul_naive(A, B):
    # Your current implementation from Module 1
    pass

# After: Optimized implementation
def matmul_vectorized(A, B):
    # Use advanced NumPy, better algorithms
    # Target: 5-10x speedup
    pass
```
Step 4: Implement and Test
```python
# Benchmark your improvement
import time
import numpy as np

A = np.random.randn(1000, 1000)
B = np.random.randn(1000, 1000)

# Test current implementation
start = time.time()
result1 = matmul_naive(A, B)
naive_time = time.time() - start

# Test optimized implementation
start = time.time()
result2 = matmul_vectorized(A, B)
optimized_time = time.time() - start

speedup = naive_time / optimized_time
print(f"Speedup: {speedup:.2f}x")
assert np.allclose(result1, result2)  # Verify correctness
```
🎯 Success Criteria
Your capstone is successful when you can demonstrate:
Technical Mastery
- Measurable improvement: 20%+ performance gain, new functionality, or better developer experience
- Systems thinking: Your solution integrates cleanly with existing TinyTorch components
- Engineering trade-offs: You understand and can explain what you optimized and what you sacrificed
Framework Understanding
- No external dependencies: Your improvements use only TinyTorch components you built
- Architectural consistency: Your additions follow TinyTorch patterns and design principles
- Comprehensive testing: Your improvements don't break existing functionality
Professional Development
- Project documentation: Clear explanation of problem, solution, and results
- Performance analysis: Before/after benchmarks with engineering insights
- Future roadmap: Identification of next optimization opportunities
🏆 Deliverables
Submit your capstone as a complete project including:
- 📊 Project Report (`capstone_report.md`)
  - Problem analysis and motivation
  - Technical approach and implementation details
  - Performance results and benchmarks
  - Engineering trade-offs and lessons learned
- 💻 Implementation Code (`src/` directory)
  - Your optimized/extended TinyTorch components
  - Comprehensive tests demonstrating functionality
  - Integration examples showing your improvements in action
- 📈 Benchmark Results (`benchmarks/` directory)
  - Before/after performance comparisons
  - Memory usage analysis
  - Comparison to PyTorch (where relevant)
- 🎥 Demonstration (`demo.py`)
  - Working example showing your improvements
  - Side-by-side comparison with original TinyTorch
  - Real use case demonstrating practical value
💡 Pro Tips for Success
Start Small, Think Big
- Begin with the simplest version that works
- Measure early and often to guide optimization
- Don't try to optimize everything—focus on the biggest impact
Use Your Existing Framework
- Test improvements using models from previous modules
- Verify compatibility with CIFAR-10 training from Module 10
- Use your benchmarking tools from Module 13
Document Engineering Decisions
- Keep notes on why you chose specific approaches
- Record trade-offs between memory, speed, and complexity
- Explain how your improvements fit TinyTorch's design philosophy
Think Like a Framework Engineer
- How would other developers use your improvements?
- What APIs would make sense?
- How do your changes affect the learning experience?
🚀 Ready to Optimize Your Framework?
Choose your track, profile your current implementation, and start building. Remember: you're not just optimizing code—you're proving that you understand ML systems engineering at the deepest level.
Your goal: Become the engineer others ask when they need to make their ML framework better.
Start by choosing your track and running the profiling example above. Your TinyTorch framework is waiting to be optimized!
🔥 Let's make TinyTorch even better. Start optimizing.