MAJOR: Implement beautiful module progression through strategic reordering

This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: autograd came before optimizers, the DataLoader came before training, and dependencies were scattered across modules

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates an inevitable need for the next

## Pedagogical Flow Achieved

- **05_losses** → "Need systematic weight updates" → **06_optimizers**
- **06_optimizers** → "Need automatic gradients" → **07_autograd**
- **07_autograd** → "Need systematic training" → **08_training**
- **08_training** → "MLPs hit limits on images" → **09_spatial**
- **09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change; the full rename set is sketched in code below)
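
The new names do not collide with any existing directories, so the renames are independent one-directory moves. As a rough illustration only, here is a hypothetical Python sketch of the same mapping; the actual commit presumably used `git mv` or an equivalent, and the paths (module directories at the repository root) are assumptions.

```python
# Hypothetical sketch only -- the real renames were presumably done with `git mv`.
# Assumes the module directories sit at the repository root; adjust paths as needed.
from pathlib import Path

RENAMES = {
    "06_autograd":   "07_autograd",
    "07_dataloader": "10_dataloader",
    "08_optimizers": "06_optimizers",
    "10_training":   "08_training",
    # "09_spatial" keeps its number and name.
}

for old_name, new_name in RENAMES.items():
    src, dst = Path(old_name), Path(new_name)
    if src.is_dir() and not dst.exists():
        src.rename(dst)  # filesystem move; git detects this as a rename at commit time
```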

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in `tito/commands/export.py` (a hedged sketch follows this list)
- **Test directories**: Renamed module_XX directories to match new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions for proper flow
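
The mapping itself is not reproduced in this summary, so the following is only a hypothetical sketch of what the updated `MODULE_TO_CHECKPOINT` in `tito/commands/export.py` might look like, assuming it is a plain dict from module directory name to checkpoint number; the real structure and values may differ.

```python
# Hypothetical sketch -- the actual mapping in tito/commands/export.py may differ in
# structure and values. Assumes checkpoint numbers track the new module numbers.
MODULE_TO_CHECKPOINT = {
    "05_losses":     5,
    "06_optimizers": 6,   # was 08_optimizers
    "07_autograd":   7,   # was 06_autograd
    "08_training":   8,   # was 10_training
    "09_spatial":    9,   # unchanged
    "10_dataloader": 10,  # was 07_dataloader
}
```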

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with new numbers
- **Module Developer**: Updated work tracking with new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with beautiful progression

## Educational Benefits

1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts introduced exactly when needed
3. **Motivation**: Students understand WHY each tool is necessary
4. **Synthesis**: Everything flows toward complete ML systems understanding
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance

- All CLI commands still function
- Checkpoint system mappings updated
- Documentation consistency maintained
- Test directory structure aligned
- Agent configurations synchronized

**Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.
**Author**: Vijay Janapa Reddi
**Date**: 2025-09-24 15:56:47 -04:00
**Parent**: 0d87b6603f
**Commit**: 2f23f757e7
**Changes**: 68 changed files with 5875 additions and 2399 deletions


```diff
@@ -218,7 +218,11 @@ By implementing matrix multiplication, you'll understand:
 #| export
 def matmul(a: Tensor, b: Tensor) -> Tensor:
     """
-    Matrix multiplication for tensors.
+    Matrix multiplication for tensors using explicit loops.
+
+    This implementation uses triple-nested loops for educational understanding
+    of the fundamental operations. Module 15 will show the optimization progression
+    from loops → blocking → vectorized operations.
 
     Args:
         a: Left tensor (shape: ..., m, k)
@@ -227,18 +231,24 @@ def matmul(a: Tensor, b: Tensor) -> Tensor:
     Returns:
         Result tensor (shape: ..., m, n)
 
-    TODO: Implement matrix multiplication using numpy's @ operator.
+    TODO: Implement matrix multiplication using explicit loops.
 
     STEP-BY-STEP IMPLEMENTATION:
     1. Extract numpy arrays from both tensors using .data
-    2. Perform matrix multiplication: result_data = a_data @ b_data
-    3. Wrap result in a new Tensor and return
+    2. Check tensor shapes for compatibility
+    3. Use triple-nested loops to show every operation
+    4. Wrap result in a new Tensor and return
 
     LEARNING CONNECTIONS:
     - This is the core operation in Dense layers: output = input @ weights
-    - PyTorch uses optimized BLAS libraries for this operation
-    - GPU implementations parallelize this across thousands of cores
-    - Understanding this operation is key to neural network performance
+    - Shows the fundamental computation before optimization
+    - Module 15 will demonstrate the progression to high-performance implementations
+    - Understanding loops helps appreciate vectorization and GPU parallelization
+
+    EDUCATIONAL APPROACH:
+    - Intentionally simple for understanding, not performance
+    - Makes every multiply-add operation explicit
+    - Sets up Module 15 to show optimization techniques
 
     EXAMPLE:
     ```python
@@ -249,20 +259,42 @@ def matmul(a: Tensor, b: Tensor) -> Tensor:
     ```
 
     IMPLEMENTATION HINTS:
-    - Use the @ operator for clean matrix multiplication
-    - Ensure you return a Tensor, not a numpy array
-    - The operation should work for any compatible matrix shapes
+    - Use explicit loops to show every operation
+    - This is educational, not optimized for performance
+    - Module 15 will show the progression to fast implementations
     """
     ### BEGIN SOLUTION
     # Extract numpy arrays from tensors
     a_data = a.data
     b_data = b.data
 
-    # Perform matrix multiplication
-    result_data = a_data @ b_data
+    # Get dimensions and validate compatibility
+    if len(a_data.shape) != 2 or len(b_data.shape) != 2:
+        raise ValueError("matmul requires 2D tensors")
+
+    m, k = a_data.shape
+    k2, n = b_data.shape
+
+    if k != k2:
+        raise ValueError(f"Inner dimensions must match: {k} != {k2}")
+
+    # Initialize result matrix
+    result = np.zeros((m, n), dtype=a_data.dtype)
+
+    # Triple nested loops - educational, shows every operation
+    # This is intentionally simple to understand the fundamental computation
+    # Module 15 will show the optimization journey:
+    #   Step 1 (here): Educational loops - slow but clear
+    #   Step 2: Loop blocking for cache efficiency
+    #   Step 3: Vectorized operations with NumPy
+    #   Step 4: GPU acceleration and BLAS libraries
+    for i in range(m):              # For each row in result
+        for j in range(n):          # For each column in result
+            for k_idx in range(k):  # Dot product: sum over inner dimension
+                result[i, j] += a_data[i, k_idx] * b_data[k_idx, j]
 
     # Return new Tensor with result
-    return Tensor(result_data)
+    return Tensor(result)
     ### END SOLUTION
 
 # %% [markdown]
```
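
As a quick sanity check of the educational loop implementation in this diff, a minimal sketch is shown below. It assumes the `Tensor` class wraps a NumPy array in `.data` and accepts a nested list in its constructor, as the surrounding code suggests; the exact import path is omitted.

```python
import numpy as np
# Tensor and matmul come from the module edited above (import path omitted here).

a = Tensor([[1.0, 2.0],
            [3.0, 4.0]])   # shape (2, 2)
b = Tensor([[5.0, 6.0],
            [7.0, 8.0]])   # shape (2, 2)

result = matmul(a, b)      # triple-loop educational implementation

# The loop version should agree with NumPy's vectorized @ operator, which is the
# "Step 3" target in the optimization journey noted in the solution comments.
assert np.allclose(result.data, a.data @ b.data)
print(result.data)
# [[19. 22.]
#  [43. 50.]]
```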