# Module 01: Tensor Foundation

## Overview

Build the foundational Tensor class that powers all machine learning operations in TinyTorch.

## Time Estimate

**2-3 hours**

## Difficulty

⭐⭐☆☆☆ (Beginner)

## Prerequisites

- **Python basics**: Variables, functions, classes, operators
- **NumPy fundamentals**: Array creation, indexing, basic operations
- **Linear algebra**: Matrix multiplication concept, vectors vs matrices

## Learning Outcomes

By completing this module, you will be able to:

1. **Implement a complete Tensor class** with arithmetic operations (+, -, *, /), matrix multiplication, and shape manipulation that mirrors PyTorch's design patterns
2. **Understand tensor broadcasting semantics** and how automatic shape alignment enables efficient batch processing across different dimensional data
3. **Design classes with dormant features** that activate in future modules, learning from PyTorch's evolution from Variable to a unified Tensor with built-in autograd
4. **Analyze memory layout and cache behavior** to understand why certain operations (row-wise access) are significantly faster than others (column-wise access)
5. **Build production-ready APIs** with proper error handling, clear error messages, and input validation that guides users toward correct usage

## Key Concepts

### Tensors: The Universal ML Data Structure

Tensors are multi-dimensional arrays that serve as the fundamental data structure in machine learning:

- **0D (scalar)**: Single number (e.g., loss value)
- **1D (vector)**: List of numbers (e.g., bias terms)
- **2D (matrix)**: Grid of numbers (e.g., weight matrices, images)
- **3D+**: Higher dimensions (e.g., batches of images, sequence data)
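
For concreteness, here is how each of these shapes looks in plain NumPy (used here only as an illustration; your own Tensor class will expose the same `shape` idea):

```python
import numpy as np

scalar = np.array(3.5)                        # 0D: a single number, shape ()
bias = np.array([0.1, 0.2, 0.3])              # 1D: bias terms, shape (3,)
weights = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2D: weight matrix, shape (2, 2)
batch = np.zeros((32, 28, 28))                # 3D: batch of 32 grayscale 28x28 images

for t in (scalar, bias, weights, batch):
    print(t.ndim, t.shape)
```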

### Broadcasting: Automatic Shape Alignment

NumPy-style broadcasting automatically aligns tensors of different shapes for operations:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])  # Shape: (2, 2)
vector = np.array([10, 20])          # Shape: (2,)
result = matrix + vector             # Broadcasting: (2,2) + (2,) → (2,2)
# Result: [[11, 22], [13, 24]] (the vector is added to each row)
```
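
Broadcasting generalizes beyond this case: shapes are compared from the rightmost dimension, and any dimension of size 1 stretches to match its partner. A quick NumPy illustration:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)  # Shape: (3, 1)
row = np.arange(4).reshape(1, 4)  # Shape: (1, 4)
grid = col + row                  # (3,1) + (1,4) → (3,4): both size-1 axes stretch
print(grid.shape)                 # (3, 4)
```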

### Memory Layout and Cache Effects

Understanding row-major (C-style) storage explains why sequential access is faster:

- **Row-wise access**: Sequential memory, excellent cache locality (~2-3× faster)
- **Column-wise access**: Strided memory, poor cache locality
- **Real impact**: Same O(n) algorithm, dramatically different wall-clock time
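
You can observe this effect directly. Below is a rough timing sketch in NumPy (exact ratios depend on your machine and array size):

```python
import time
import numpy as np

a = np.random.rand(4096, 4096)  # C-order: each row is contiguous in memory

start = time.perf_counter()
for i in range(a.shape[0]):
    a[i, :].sum()               # Row slice: sequential memory access
t_rows = time.perf_counter() - start

start = time.perf_counter()
for j in range(a.shape[1]):
    a[:, j].sum()               # Column slice: strided, one cache line per element
t_cols = time.perf_counter() - start

print(f"row-wise: {t_rows:.3f}s  column-wise: {t_cols:.3f}s")
```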

### Dormant Gradient Features

Our Tensor includes gradient tracking attributes (`requires_grad`, `grad`, `backward()`) from the start, but they remain inactive until Module 05. This design:

- Maintains a consistent API throughout the course (no Variable vs Tensor confusion)
- Follows PyTorch 2.0's unified Tensor design
- Enables progressive disclosure of complexity
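
A minimal sketch of the idea (illustrative only, not the module's exact implementation):

```python
import numpy as np

class Tensor:
    """Illustration: gradient hooks exist from day one but stay inert."""

    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=np.float32)
        self.requires_grad = requires_grad  # Dormant flag until Module 05
        self.grad = None                    # Autograd will populate this later

    def __add__(self, other):
        other = other if isinstance(other, Tensor) else Tensor(other)
        return Tensor(self.data + other.data)

    def backward(self):
        # Placeholder: real reverse-mode autograd arrives in Module 05
        raise NotImplementedError("Autograd activates in Module 05")

x = Tensor([[1.0, 2.0]], requires_grad=True)
print(x.requires_grad, x.grad)  # True None: attributes exist, but are inert
```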

## Module Structure

1. **Introduction**: What is a Tensor? (Concept + ML context)
2. **Foundations**: Mathematical background (broadcasting, memory layout)
3. **Implementation**: Building the Tensor class with immediate unit testing
4. **Integration**: Neural network layer simulation
5. **Systems Analysis**: Memory layout and cache performance
6. **Module Test**: Comprehensive validation

## What You'll Build

```python
# Your complete Tensor class will support:
x = Tensor([[1, 2, 3], [4, 5, 6]])
y = Tensor([[7, 8, 9], [10, 11, 12]])

# Arithmetic operations with broadcasting
z = x + y                              # Element-wise addition
scaled = x * 2                         # Scalar broadcasting
normalized = (x - x.mean()) / x.std()  # Chaining operations

# Matrix operations
W = Tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
output = x.matmul(W)  # Matrix multiplication: (2,3) @ (3,2) → (2,2)

# Shape manipulation
reshaped = x.reshape(3, 2)  # (2,3) → (3,2)
transposed = x.transpose()  # (2,3) → (3,2) with data rearrangement

# Reduction operations
total = x.sum()             # Sum all elements
col_means = x.mean(axis=0)  # Average per column
```

## Connection to Production ML

This module teaches patterns used in production frameworks:

- **PyTorch's Tensor class**: Same API design with unified gradients
- **NumPy broadcasting**: Industry-standard automatic shape alignment
- **Memory efficiency**: Row-major storage, cache-aware algorithms
- **Error handling**: Clear messages that guide users toward solutions

## Files in This Module

- `tensor_dev.py`: Your working implementation (Jupyter notebook format)
- `test_tensor.py`: Comprehensive test suite (run with pytest)
- `README.md`: This file

## Next Steps

After completing this module:

**→ Module 02: Activations**

- Build activation functions (ReLU, Sigmoid, GELU)
- Learn how nonlinearity enables neural networks to learn complex patterns
- Understand vanishing/exploding gradients through activation analysis

Your Tensor class becomes the foundation that all future modules build upon!

## Common Pitfalls to Avoid

1. **Matrix multiplication vs element-wise multiplication** (see the sketch after this list)
   - Use `.matmul()` or `@` for matrix multiplication (dot product)
   - Use `*` for element-wise multiplication (Hadamard product)

2. **Shape compatibility in broadcasting**
   - Inner dimensions must match for matmul: (M,K) @ (K,N) ✓
   - Broadcasting aligns from the rightmost dimension
   - Clear error messages help debug shape mismatches

3. **Reshape vs transpose confusion**
   - Reshape: Same memory layout, different interpretation (fast, O(1))
   - Transpose: Data rearrangement in memory (slower, O(n))

4. **Gradient features are dormant**
   - `requires_grad`, `grad`, `backward()` exist but don't function yet
   - They activate in Module 05; ignore them for now
   - Don't try to implement gradients manually
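
To see pitfalls 1-3 concretely, here is a short NumPy sketch (NumPy stands in for the Tensor you will build, which mirrors these semantics):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])    # Shape: (2, 3)
W = np.array([[1, 0], [0, 1], [1, 1]])  # Shape: (3, 2)

# Pitfall 1: @ is matrix multiplication, * is element-wise
print(x @ W)   # (2,3) @ (3,2) → (2,2)
print(x * x)   # Hadamard product, shape stays (2,3)

# Pitfall 2: broadcasting aligns from the rightmost dimension
print(x + np.array([10, 20, 30]))  # (2,3) + (3,) → (2,3)
# x + np.array([10, 20])           # (2,3) + (2,) → ValueError

# Pitfall 3: reshape reinterprets row-major order; transpose swaps axes
print(x.reshape(3, 2))  # [[1 2] [3 4] [5 6]]
print(x.T)              # [[1 4] [2 5] [3 6]]
```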

## Resources

- **NumPy documentation**: https://numpy.org/doc/stable/
- **PyTorch Tensor API**: https://pytorch.org/docs/stable/tensors.html
- **Broadcasting semantics**: https://numpy.org/doc/stable/user/basics.broadcasting.html

## Getting Help

If you're stuck:

1. Read the error messages carefully - they include hints
2. Check the ASCII diagrams in `tensor_dev.py` for visual explanations
3. Run unit tests individually to isolate issues
4. Review the module integration test for end-to-end examples

Happy learning! 🚀