Module 01: Tensor Foundation

Overview

Build the foundational Tensor class that powers all machine learning operations in TinyTorch.

Time Estimate

2-3 hours

Difficulty

☆☆☆ (Beginner)

Prerequisites

  • Python basics: Variables, functions, classes, operators
  • NumPy fundamentals: Array creation, indexing, basic operations
  • Linear algebra: Matrix multiplication concept, vectors vs matrices

Learning Outcomes

By completing this module, you will be able to:

  1. Implement a complete Tensor class with arithmetic operations (+, -, *, /), matrix multiplication, and shape manipulation that mirrors PyTorch's design patterns

  2. Understand tensor broadcasting semantics and how automatic shape alignment enables efficient batch processing across data of different shapes

  3. Design classes with dormant features that activate in future modules, learning PyTorch's evolution from Variable to unified Tensor with built-in autograd

  4. Analyze memory layout and cache behavior to understand why certain operations (row-wise access) are significantly faster than others (column-wise access)

  5. Build production-ready APIs with proper error handling, clear error messages, and input validation that guides users toward correct usage
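
Outcome 5 is worth seeing concretely. Below is a hedged sketch of the validation pattern, written against plain NumPy arrays; the exact error-message format in TinyTorch is defined in tensor_dev.py, so the wording here is illustrative:

import numpy as np

def checked_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Validate inner dimensions up front and say exactly what went wrong.
    if a.shape[-1] != b.shape[0]:
        raise ValueError(
            f"matmul shape mismatch: {a.shape} @ {b.shape} -- "
            f"inner dimensions {a.shape[-1]} and {b.shape[0]} must match"
        )
    return a @ b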

Key Concepts

Tensors: The Universal ML Data Structure

Tensors are multi-dimensional arrays that serve as the fundamental data structure in machine learning:

  • 0D (scalar): Single number (e.g., loss value)
  • 1D (vector): List of numbers (e.g., bias terms)
  • 2D (matrix): Grid of numbers (e.g., weight matrices, images)
  • 3D+: Higher dimensions (e.g., batches of images, sequence data)
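
For intuition, here is a quick NumPy sketch of that hierarchy (your Tensor class wraps the same idea; .shape and .ndim report size and dimension count):

import numpy as np

scalar = np.array(3.5)                  # 0D: shape (), e.g. a loss value
vector = np.array([0.1, 0.2, 0.3])      # 1D: shape (3,), e.g. bias terms
matrix = np.array([[1, 2], [3, 4]])     # 2D: shape (2, 2), e.g. a weight matrix
batch = np.zeros((32, 28, 28))          # 3D: shape (32, 28, 28), e.g. 32 grayscale images

print(scalar.ndim, vector.ndim, matrix.ndim, batch.ndim)  # 0 1 2 3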

Broadcasting: Automatic Shape Alignment

NumPy-style broadcasting automatically aligns tensors of different shapes for operations:

import numpy as np

matrix = np.array([[1, 2], [3, 4]])  # Shape: (2, 2)
vector = np.array([10, 20])          # Shape: (2,)
result = matrix + vector             # Broadcasting: (2,2) + (2,) → (2,2)
# Result: [[11, 22], [13, 24]]
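
Broadcasting aligns shapes from the rightmost dimension, stretching size-1 axes as needed. A small sketch of that rule (same NumPy import as above):

col = np.array([[1], [2], [3]])    # Shape: (3, 1)
row = np.array([10, 20, 30, 40])   # Shape: (4,), treated as (1, 4)
grid = col + row                   # (3,1) + (1,4) → (3,4)
# grid[i, j] == col[i] + row[j]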

Memory Layout and Cache Effects

Understanding row-major (C-style) storage explains why sequential access is faster:

  • Row-wise access: Sequential memory, excellent cache locality (~2-3× faster)
  • Column-wise access: Strided memory, poor cache locality
  • Real impact: Same O(n) algorithm, dramatically different wall-clock time
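
A minimal timing sketch of this effect (plain NumPy; absolute numbers vary by machine, but row slices consistently win):

import numpy as np
import time

a = np.random.rand(4000, 4000)       # stored row-major (C order) by default

start = time.perf_counter()
for i in range(a.shape[0]):
    a[i, :].sum()                    # row slice: contiguous memory, cache-friendly
row_time = time.perf_counter() - start

start = time.perf_counter()
for j in range(a.shape[1]):
    a[:, j].sum()                    # column slice: strided memory, poor locality
col_time = time.perf_counter() - start

print(f"row-wise:    {row_time:.3f}s")
print(f"column-wise: {col_time:.3f}s   (typically ~2-3x slower)")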

Dormant Gradient Features

Our Tensor includes gradient tracking attributes (requires_grad, grad, backward()) from the start, but they remain inactive until Module 05. This design:

  • Maintains consistent API throughout the course (no Variable vs Tensor confusion)
  • Follows modern PyTorch's unified Tensor design (Variable was merged into Tensor back in PyTorch 0.4)
  • Enables progressive disclosure of complexity
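
One plausible shape for this pattern, shown as a hedged sketch (tensor_dev.py is the authoritative spec; everything beyond the three attributes named above is illustrative):

import numpy as np

class Tensor:
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=np.float32)
        self.requires_grad = requires_grad   # stored now, consumed by autograd in Module 05
        self.grad = None                     # will hold the gradient tensor later

    def backward(self):
        # Dormant: autograd wires this up in Module 05.
        raise NotImplementedError("Autograd activates in Module 05")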

Module Structure

  1. Introduction: What is a Tensor? (Concept + ML context)
  2. Foundations: Mathematical Background (Broadcasting, memory layout)
  3. Implementation: Building Tensor class with immediate unit testing
  4. Integration: Neural network layer simulation
  5. Systems Analysis: Memory layout and cache performance
  6. Module Test: Comprehensive validation

What You'll Build

# Your complete Tensor class will support:
x = Tensor([[1, 2, 3], [4, 5, 6]])
y = Tensor([[7, 8, 9], [10, 11, 12]])

# Arithmetic operations with broadcasting
z = x + y              # Element-wise addition
scaled = x * 2         # Scalar broadcasting
normalized = (x - x.mean()) / x.std()  # Chaining operations

# Matrix operations
W = Tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
output = x.matmul(W)   # Matrix multiplication: (2,3) @ (3,2) → (2,2)

# Shape manipulation
reshaped = x.reshape(3, 2)     # (2,3) → (3,2)
transposed = x.transpose()     # (2,3) → (3,2) with data rearrangement

# Reduction operations
total = x.sum()                # Sum all elements
col_means = x.mean(axis=0)     # Average per column

Connection to Production ML

This module teaches patterns used in production frameworks:

  • PyTorch's Tensor class: Same API design with unified gradients
  • NumPy broadcasting: Industry-standard automatic shape alignment
  • Memory efficiency: Row-major storage, cache-aware algorithms
  • Error handling: Clear messages that guide users toward solutions

Files in This Module

  • tensor_dev.py: Your working implementation (Jupyter notebook format)
  • test_tensor.py: Comprehensive test suite (run with pytest)
  • README.md: This file

Next Steps

After completing this module:

→ Module 02: Activations

  • Build activation functions (ReLU, Sigmoid, GELU)
  • Learn how nonlinearity enables neural networks to learn complex patterns
  • Understand vanishing/exploding gradients through activation analysis

Your Tensor class becomes the foundation that all future modules build upon!

Common Pitfalls to Avoid

  1. Matrix multiplication vs element-wise multiplication (see the sketch after this list)

    • Use .matmul() or @ for matrix multiplication (dot product)
    • Use * for element-wise multiplication (Hadamard product)
  2. Shape compatibility in broadcasting

    • Inner dimensions must match for matmul: (M,K) @ (K,N) ✓
    • Broadcasting aligns from rightmost dimension
    • Clear error messages help debug shape mismatches
  3. Reshape vs transpose confusion

    • Reshape: Same memory layout, different interpretation (fast, O(1))
    • Transpose: Data rearrangement in memory (slower, O(n))
  4. Gradient features are dormant

    • requires_grad, grad, backward() exist but don't function yet
    • They activate in Module 05 - ignore them for now
    • Don't try to implement gradients manually
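
The sketch below demonstrates pitfalls 1-3 with plain NumPy (TinyTorch's .matmul(), reshape, and transpose follow the same semantics, though memory behavior may differ between the two libraries):

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a @ b)   # matrix product:       [[19, 22], [43, 50]]
print(a * b)   # element-wise product: [[ 5, 12], [21, 32]]

# Inner dimensions must match for matmul: (1,3) @ (1,3) fails loudly.
try:
    np.array([[1, 2, 3]]) @ np.array([[1, 2, 3]])
except ValueError as err:
    print(err)  # the message names the mismatched shapes

m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
print(m.reshape(3, 2))           # keeps flat order:    [[0, 1], [2, 3], [4, 5]]
print(m.T)                       # rearranges elements: [[0, 3], [1, 4], [2, 5]]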

Resources

Getting Help

If you're stuck:

  1. Read the error messages carefully - they include hints
  2. Check the ASCII diagrams in tensor_dev.py for visual explanations
  3. Run unit tests individually to isolate issues (e.g., pytest test_tensor.py -k "<pattern>")
  4. Review the module integration test for end-to-end examples

Happy learning! 🚀