Mirror of https://github.com/MLSysBook/TinyTorch.git (synced 2026-03-12 02:43:35 -05:00)
This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: Autograd before optimizers, DataLoader before training, scattered dependencies

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates an inevitable need for the next

## Pedagogical Flow Achieved

- **05_losses** → "Need systematic weight updates" → **06_optimizers**
- **06_optimizers** → "Need automatic gradients" → **07_autograd**
- **07_autograd** → "Need systematic training" → **08_training**
- **08_training** → "MLPs hit limits on images" → **09_spatial**
- **09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change)

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in tito/commands/export.py
- **Test directories**: Renamed module_XX directories to match the new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions for the proper flow

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with the new numbers
- **Module Developer**: Updated work tracking with the new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with the new progression

## Educational Benefits

1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts are introduced exactly when needed
3. **Motivation**: Students understand why each tool is necessary
4. **Synthesis**: Everything flows toward complete ML systems understanding
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance

- ✅ All CLI commands still function
- ✅ Checkpoint system mappings updated
- ✅ Documentation consistency maintained
- ✅ Test directory structure aligned
- ✅ Agent configurations synchronized

**Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.
# Module 17: Precision - Numerical Optimization through Quantization

## Overview
Reduce model size by 75% and accelerate inference by 2-4x through INT8 quantization. Learn how production systems deploy billion-parameter models on edge devices.
## What You'll Build
- INT8 Quantizer: Convert FP32 models to INT8
- Calibration System: Find optimal scaling factors
- Quantized Operations: Fast integer arithmetic
- Accuracy Validator: Measure precision/performance tradeoffs
## Learning Objectives
- Numerical Representation: FP32 vs FP16 vs INT8 tradeoffs
- Post-Training Quantization: Convert trained models efficiently
- Calibration Techniques: Minimize accuracy loss
- Hardware Acceleration: Why INT8 can be up to 4x faster on modern hardware
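A quick way to see these representation tradeoffs concretely is to query the dtype limits with NumPy:

```python
import numpy as np

# Range, resolution, and storage cost of each floating-point format.
for dtype in (np.float32, np.float16):
    info = np.finfo(dtype)
    print(f"{info.dtype}: max={info.max:.3e}, eps={info.eps:.3e}, bytes={np.dtype(dtype).itemsize}")

# INT8 trades range and resolution for a 1-byte footprint.
i8 = np.iinfo(np.int8)
print(f"int8: range=[{i8.min}, {i8.max}], bytes={np.dtype(np.int8).itemsize}")
```

Note how FP32's huge dynamic range and fine resolution cost 4 bytes per value, while INT8 can only represent 256 distinct values in [-128, 127].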
## Prerequisites
- Module 15: Acceleration (backend dispatch)
- Module 10: Training (trained models to quantize)
## Key Concepts

### The Problem: Model Size and Speed
```python
# FP32 model - high precision, slow, large
model = TinyGPT()                 # 400 MB, 100 ms/token

# After quantization - lower precision, fast, small
quantized = quantize_int8(model)  # 100 MB, 25 ms/token
```
### Quantization Process

```python
# 1. Calibration - find per-tensor scale factors
scales = calibrate(model, calibration_data)

# 2. Quantization - convert weights, then cast to the INT8 storage type
quantized_weights = (weights / scales).round().clip(-128, 127).astype(np.int8)

# 3. Inference - use integer ops
output = quantized_forward(input, quantized_weights, scales)
```
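The three steps above can be sketched end-to-end with NumPy, assuming a simple symmetric per-tensor scheme (the function names here are illustrative, not the module's actual API):

```python
import numpy as np

def calibrate(weights: np.ndarray) -> float:
    # Symmetric per-tensor scale: map max |w| onto the INT8 limit 127.
    return float(np.abs(weights).max()) / 127.0

def quantize(weights: np.ndarray, scale: float) -> np.ndarray:
    # FP32 -> INT8: divide by scale, round, clip to the representable range.
    return np.clip(np.round(weights / scale), -128, 127).astype(np.int8)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate FP32 values for accuracy checks.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)

scale = calibrate(w)
q = quantize(w, scale)
err = np.abs(dequantize(q, scale) - w).max()
print(f"scale={scale:.5f}, max round-trip error={err:.5f}")  # error is bounded by scale/2
```

Because calibration maps the largest magnitude exactly onto 127, the worst-case rounding error of any single weight is half a quantization step (scale/2).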
## Performance Impact
- Model Size: 4x reduction (FP32 → INT8)
- Inference Speed: 2-4x faster on CPU/GPU
- Accuracy: Typically <1% loss with good calibration
- Memory Bandwidth: 4x reduction
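The 4x size and bandwidth reductions follow directly from the byte widths. A back-of-envelope check, using a hypothetical 100M-parameter model:

```python
# Hypothetical 100M-parameter model; sizes follow from bytes per value.
params = 100_000_000
fp32_mb = params * 4 / 1e6  # 4 bytes per FP32 value
int8_mb = params * 1 / 1e6  # 1 byte per INT8 value
print(f"FP32: {fp32_mb:.0f} MB, INT8: {int8_mb:.0f} MB, reduction: {fp32_mb / int8_mb:.0f}x")
# → FP32: 400 MB, INT8: 100 MB, reduction: 4x
```

The same 4x factor applies to memory bandwidth, which is why quantization speeds up inference even when the arithmetic itself is not the bottleneck.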
## Real-World Applications
- Mobile Deployment: Run LLMs on phones
- Edge AI: Raspberry Pi inference
- Datacenter Efficiency: 4x more models per GPU
- TensorFlow Lite: Production quantization
## Module Structure
- Numerical Basics: Understanding precision and range
- Quantization Math: Scale factors and rounding
- Calibration: Finding optimal quantization parameters
- Implementation: Building quantized operations
- Evaluation: Accuracy vs performance analysis
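As a preview of the implementation and evaluation steps, a minimal quantized matrix multiply can be sketched with NumPy: accumulate in INT32, then rescale back to FP32 (symmetric per-tensor scales; all names here are illustrative):

```python
import numpy as np

def quantized_matmul(x_q, w_q, x_scale, w_scale):
    # Integer matmul accumulated in int32 to avoid overflow, then rescaled to FP32.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * (x_scale * w_scale)

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 32)).astype(np.float32)
w = rng.normal(size=(32, 16)).astype(np.float32)

# Symmetric per-tensor quantization of activations and weights.
x_scale = float(np.abs(x).max()) / 127.0
w_scale = float(np.abs(w).max()) / 127.0
x_q = np.clip(np.round(x / x_scale), -128, 127).astype(np.int8)
w_q = np.clip(np.round(w / w_scale), -128, 127).astype(np.int8)

# Evaluate: compare the integer path against the FP32 reference.
exact = x @ w
approx = quantized_matmul(x_q, w_q, x_scale, w_scale)
rel_err = np.abs(approx - exact).max() / np.abs(exact).max()
print(f"max relative error: {rel_err:.4f}")
```

The INT32 accumulator is the key design choice: summing products of two INT8 values in INT8 would overflow almost immediately, so real INT8 kernels also widen before accumulating.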
## Hands-On Examples

```python
# Quantize your trained CNN
cnn = load_trained_model("cifar10_cnn.pt")
quantized = quantize_model(cnn, calibration_loader)

# Compare accuracy
original_acc = evaluate(cnn, test_loader)         # 75.2%
quantized_acc = evaluate(quantized, test_loader)  # 74.8%

# Measure speedup
original_time = benchmark(cnn)         # 45 ms/batch
quantized_time = benchmark(quantized)  # 12 ms/batch (3.75x faster!)
```
## Success Criteria
- ✅ Quantize models to INT8 with <1% accuracy loss
- ✅ Achieve 2-4x inference speedup
- ✅ Reduce model size by 75%
- ✅ Understand hardware acceleration principles