Module 01: Tensor Foundation

Overview

Build the foundational Tensor class that powers all machine learning operations in TinyTorch.

Time Estimate

2-3 hours

Difficulty

☆☆☆ (Beginner)

Prerequisites

  • Python basics: Variables, functions, classes, operators
  • NumPy fundamentals: Array creation, indexing, basic operations
  • Linear algebra: Matrix multiplication concept, vectors vs matrices

Learning Outcomes

By completing this module, you will be able to:

  1. Implement a complete Tensor class with arithmetic operations (+, -, *, /), matrix multiplication, and shape manipulation that mirrors PyTorch's design patterns

  2. Understand tensor broadcasting semantics and how automatic shape alignment enables efficient batch processing across data of different shapes

  3. Design classes with dormant features that activate in future modules, learning PyTorch's evolution from Variable to unified Tensor with built-in autograd

  4. Analyze memory layout and cache behavior to understand why certain operations (row-wise access) are significantly faster than others (column-wise access)

  5. Build production-ready APIs with proper error handling, clear error messages, and input validation that guides users toward correct usage
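
Outcome 5 is worth seeing concretely. Below is a hedged sketch of the validation pattern, written against plain NumPy arrays; the exact error-message format in TinyTorch is defined in tensor_dev.py, so the wording here is illustrative:

import numpy as np

def checked_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Validate inner dimensions up front and say exactly what went wrong.
    if a.shape[-1] != b.shape[0]:
        raise ValueError(
            f"matmul shape mismatch: {a.shape} @ {b.shape} -- "
            f"inner dimensions {a.shape[-1]} and {b.shape[0]} must match"
        )
    return a @ b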

Key Concepts

Tensors: The Universal ML Data Structure

Tensors are multi-dimensional arrays that serve as the fundamental data structure in machine learning:

  • 0D (scalar): Single number (e.g., loss value)
  • 1D (vector): List of numbers (e.g., bias terms)
  • 2D (matrix): Grid of numbers (e.g., weight matrices, images)
  • 3D+: Higher dimensions (e.g., batches of images, sequence data)
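
For intuition, here is a quick NumPy sketch of that hierarchy (your Tensor class wraps the same idea; .shape and .ndim report size and dimension count):

import numpy as np

scalar = np.array(3.5)                  # 0D: shape (), e.g. a loss value
vector = np.array([0.1, 0.2, 0.3])      # 1D: shape (3,), e.g. bias terms
matrix = np.array([[1, 2], [3, 4]])     # 2D: shape (2, 2), e.g. a weight matrix
batch = np.zeros((32, 28, 28))          # 3D: shape (32, 28, 28), e.g. 32 grayscale images

print(scalar.ndim, vector.ndim, matrix.ndim, batch.ndim)  # 0 1 2 3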

Broadcasting: Automatic Shape Alignment

NumPy-style broadcasting automatically aligns tensors of different shapes for operations:

import numpy as np

matrix = np.array([[1, 2], [3, 4]])  # Shape: (2, 2)
vector = np.array([10, 20])          # Shape: (2,)
result = matrix + vector             # Broadcasting: (2,2) + (2,) → (2,2)
# Result: [[11, 22], [13, 24]]
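
Broadcasting aligns shapes from the rightmost dimension, stretching size-1 axes as needed. A small sketch of that rule (same NumPy import as above):

col = np.array([[1], [2], [3]])    # Shape: (3, 1)
row = np.array([10, 20, 30, 40])   # Shape: (4,), treated as (1, 4)
grid = col + row                   # (3,1) + (1,4) → (3,4)
# grid[i, j] == col[i] + row[j]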

Memory Layout and Cache Effects

Understanding row-major (C-style) storage explains why sequential access is faster:

  • Row-wise access: Sequential memory, excellent cache locality (~2-3× faster)
  • Column-wise access: Strided memory, poor cache locality
  • Real impact: Same O(n) algorithm, dramatically different wall-clock time
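
A minimal timing sketch of this effect (plain NumPy; absolute numbers vary by machine, but row slices consistently win):

import numpy as np
import time

a = np.random.rand(4000, 4000)       # stored row-major (C order) by default

start = time.perf_counter()
for i in range(a.shape[0]):
    a[i, :].sum()                    # row slice: contiguous memory, cache-friendly
row_time = time.perf_counter() - start

start = time.perf_counter()
for j in range(a.shape[1]):
    a[:, j].sum()                    # column slice: strided memory, poor locality
col_time = time.perf_counter() - start

print(f"row-wise:    {row_time:.3f}s")
print(f"column-wise: {col_time:.3f}s   (typically ~2-3x slower)")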

Dormant Gradient Features

Our Tensor includes gradient tracking attributes (requires_grad, grad, backward()) from the start, but they remain inactive until Module 05. This design:

  • Maintains consistent API throughout the course (no Variable vs Tensor confusion)
  • Follows modern PyTorch's unified Tensor design (Variable was merged into Tensor back in PyTorch 0.4)
  • Enables progressive disclosure of complexity
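
One plausible shape for this pattern, shown as a hedged sketch (tensor_dev.py is the authoritative spec; everything beyond the three attributes named above is illustrative):

import numpy as np

class Tensor:
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=np.float32)
        self.requires_grad = requires_grad   # stored now, consumed by autograd in Module 05
        self.grad = None                     # will hold the gradient tensor later

    def backward(self):
        # Dormant: autograd wires this up in Module 05.
        raise NotImplementedError("Autograd activates in Module 05")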

Module Structure

  1. Introduction: What is a Tensor? (Concept + ML context)
  2. Foundations: Mathematical Background (Broadcasting, memory layout)
  3. Implementation: Building Tensor class with immediate unit testing
  4. Integration: Neural network layer simulation
  5. Systems Analysis: Memory layout and cache performance
  6. Module Test: Comprehensive validation

What You'll Build

# Your complete Tensor class will support:
x = Tensor([[1, 2, 3], [4, 5, 6]])
y = Tensor([[7, 8, 9], [10, 11, 12]])

# Arithmetic operations with broadcasting
z = x + y              # Element-wise addition
scaled = x * 2         # Scalar broadcasting
normalized = (x - x.mean()) / x.std()  # Chaining operations

# Matrix operations
W = Tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
output = x.matmul(W)   # Matrix multiplication: (2,3) @ (3,2) → (2,2)

# Shape manipulation
reshaped = x.reshape(3, 2)     # (2,3) → (3,2)
transposed = x.transpose()     # (2,3) → (3,2) with data rearrangement

# Reduction operations
total = x.sum()                # Sum all elements
col_means = x.mean(axis=0)     # Average per column

Connection to Production ML

This module teaches patterns used in production frameworks:

  • PyTorch's Tensor class: Same API design with unified gradients
  • NumPy broadcasting: Industry-standard automatic shape alignment
  • Memory efficiency: Row-major storage, cache-aware algorithms
  • Error handling: Clear messages that guide users toward solutions

Files in This Module

  • tensor_dev.py: Your working implementation (Jupyter notebook format)
  • test_tensor.py: Comprehensive test suite (run with pytest)
  • README.md: This file

Next Steps

After completing this module:

→ Module 02: Activations

  • Build activation functions (ReLU, Sigmoid, GELU)
  • Learn how nonlinearity enables neural networks to learn complex patterns
  • Understand vanishing/exploding gradients through activation analysis

Your Tensor class becomes the foundation that all future modules build upon!

Common Pitfalls to Avoid

  1. Matrix multiplication vs element-wise multiplication (see the sketch after this list)

    • Use .matmul() or @ for matrix multiplication (dot product)
    • Use * for element-wise multiplication (Hadamard product)
  2. Shape compatibility in broadcasting

    • Inner dimensions must match for matmul: (M,K) @ (K,N) ✓
    • Broadcasting aligns from rightmost dimension
    • Clear error messages help debug shape mismatches
  3. Reshape vs transpose confusion

    • Reshape: Same memory layout, different interpretation (fast, O(1))
    • Transpose: Data rearrangement in memory (slower, O(n))
  4. Gradient features are dormant

    • requires_grad, grad, backward() exist but don't function yet
    • They activate in Module 05 - ignore them for now
    • Don't try to implement gradients manually
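
The sketch below demonstrates pitfalls 1-3 with plain NumPy (TinyTorch's .matmul(), reshape, and transpose follow the same semantics, though memory behavior may differ between the two libraries):

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a @ b)   # matrix product:       [[19, 22], [43, 50]]
print(a * b)   # element-wise product: [[ 5, 12], [21, 32]]

# Inner dimensions must match for matmul: (1,3) @ (1,3) fails loudly.
try:
    np.array([[1, 2, 3]]) @ np.array([[1, 2, 3]])
except ValueError as err:
    print(err)  # the message names the mismatched shapes

m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
print(m.reshape(3, 2))           # keeps flat order:    [[0, 1], [2, 3], [4, 5]]
print(m.T)                       # rearranges elements: [[0, 3], [1, 4], [2, 5]]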

Resources

Getting Help

If you're stuck:

  1. Read the error messages carefully - they include hints
  2. Check the ASCII diagrams in tensor_dev.py for visual explanations
  3. Run unit tests individually to isolate issues (e.g., pytest test_tensor.py -k "<pattern>")
  4. Review the module integration test for end-to-end examples

Happy learning! 🚀