Enhance modules 01-04 with ASCII diagrams and improved flow

Following Module 05's successful visual learning patterns:
- Add ASCII diagrams for complex concepts
- Natural markdown flow explaining what's about to happen
- Visual memory layouts, data flows, and computation graphs
- Enhanced test sections with clear explanations
- Consistent with new MODULE_DEVELOPMENT guidelines

Module 01 (Tensor):
- Tensor dimension hierarchy visualization
- Memory layout and broadcasting diagrams
- Matrix multiplication step-by-step

Module 02 (Activations):
- Linearity problem and activation curves
- Dead neuron visualization for ReLU
- Softmax probability transformation

Module 03 (Layers):
- Linear layer computation visualization
- Parameter management hierarchy
- Batch processing shape transformations

Module 04 (Losses):
- Loss landscape visualizations
- MSE quadratic penalty diagrams
- CrossEntropy confidence patterns

All modules tested and working correctly
This commit is contained in:
Vijay Janapa Reddi
2025-09-29 13:49:08 -04:00
parent 0db744b371
commit 0ca2ab1efe
5 changed files with 737 additions and 67 deletions


@@ -54,16 +54,38 @@ print("Ready to build tensors!")
# %% [markdown]
"""
## Understanding Tensors: From Numbers to Neural Networks

Tensors are N-dimensional arrays that store and manipulate numerical data. Think of them as containers for information that become increasingly powerful as dimensions increase.
- **Scalar (0D)**: A single number like `5.0`
- **Vector (1D)**: A list like `[1, 2, 3]` with shape `(3,)`
- **Matrix (2D)**: A 2D array like `[[1, 2], [3, 4]]` with shape `(2, 2)`
- **3D Tensor**: Like an RGB image with `(height, width, channels)`
### Tensor Dimension Hierarchy
```
Scalar (0D) ──► Vector (1D) ──► Matrix (2D) ──► 3D+ Tensor
5.0 [1,2,3] [[1,2], [[[R,G,B]]]
[3,4]] image data
│ │ │ │
▼ ▼ ▼ ▼
Single List Table Multi-dimensional
number of numbers of numbers data structure
```
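The hierarchy above maps directly onto NumPy's `ndim`; a quick standalone sketch (plain NumPy, not the Tensor class we're about to build):

```python
import numpy as np

# Each level of the hierarchy is just an ndarray with one more dimension
scalar = np.array(5.0)                # 0D: a single number
vector = np.array([1, 2, 3])          # 1D: shape (3,)
matrix = np.array([[1, 2], [3, 4]])   # 2D: shape (2, 2)
image = np.zeros((32, 32, 3))         # 3D: (height, width, channels)

print(scalar.ndim, vector.ndim, matrix.ndim, image.ndim)  # 0 1 2 3
print(vector.shape, matrix.shape, image.shape)
```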
### Memory Layout: NumPy Array + Tensor Wrapper
Our Tensor class wraps NumPy's optimized arrays with clean ML operations:
```
TinyTorch Tensor NumPy Array
┌────────────────────────┐ ┌─────────────────────┐
│ Tensor Object │ ───► │ [1.0, 2.0, 3.0] │
│ • shape: (3,) │ │ • dtype: float32 │
│ • size: 3 │ │ • contiguous memory │
│ • operations: +,*,@ │ │ • BLAS optimized │
└────────────────────────┘ └─────────────────────┘
Clean ML API Fast Computation
```
This foundation focuses on pure data operations - gradient tracking comes in Module 05.
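You can inspect the right-hand side of the diagram directly: NumPy exposes the dtype, byte count, and memory layout that our wrapper will rely on. A quick sketch:

```python
import numpy as np

# What the wrapper sees: NumPy stores the values in one contiguous buffer
data = np.array([1.0, 2.0, 3.0], dtype=np.float32)

print(data.dtype)                   # float32 -> 4 bytes per element
print(data.nbytes)                  # 12 bytes total for 3 floats
print(data.flags["C_CONTIGUOUS"])   # True: elements sit side by side in memory
```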
"""
# %% nbgrader={"grade": false, "grade_id": "tensor-init", "solution": true}
@@ -179,7 +201,24 @@ class Tensor:
"""
Addition operator: tensor + other
Element-wise addition with broadcasting support:
```
Tensor + Tensor: Tensor + Scalar:
[1, 2, 3] [1, 2, 3]
[4, 5, 6] + 5
──────── ────────
[5, 7, 9] [6, 7, 8]
```
TODO: Implement + operator using NumPy's vectorized operations
APPROACH:
1. Check if other is Tensor or scalar
2. Use NumPy broadcasting for element-wise addition
3. Return new Tensor with result
HINT: NumPy handles broadcasting automatically!
"""
### BEGIN SOLUTION
if isinstance(other, Tensor):
@@ -230,9 +269,35 @@ class Tensor:
def matmul(self, other: 'Tensor') -> 'Tensor':
"""
Matrix multiplication: combine two matrices through dot product operations.
### Matrix Multiplication Visualization
```
A (2×3) B (3×2) C (2×2)
┌─────────────┐ ┌───────┐ ┌─────────────┐
│ 1 2 3 │ │ 7 8 │ │ 1×7+2×9+3×1 │
│ │ │ 9 1 │ = │ │ = C
│ 4 5 6 │ │ 1 2 │ │ 4×7+5×9+6×1 │
└─────────────┘ └───────┘ └─────────────┘
│ │ │
▼ ▼ ▼
Each row of A × Each col of B = Element of C
```
### Computational Cost
**FLOPs**: 2 × M × N × K operations for (M×K) @ (K×N) matrix
**Memory**: Result size M×N, inputs stay unchanged
TODO: Implement matrix multiplication with shape validation
APPROACH:
1. Validate both tensors are 2D matrices
2. Check inner dimensions match: A(m,k) @ B(k,n) → C(m,n)
3. Use np.dot() for optimized BLAS computation
4. Return new Tensor with result
HINT: Let NumPy handle the heavy computation!
"""
### BEGIN SOLUTION
if len(self._data.shape) != 2 or len(other._data.shape) != 2:
@@ -423,6 +488,29 @@ test_unit_tensor_properties()
"""
### 🧪 Unit Test: Tensor Arithmetic
This test validates all arithmetic operations (+, -, *, /) work correctly.
**What we're testing**: Element-wise operations with broadcasting support
**Why it matters**: These operations form the foundation of neural network computations
**Expected**: All operations produce mathematically correct results with proper broadcasting
### Broadcasting Visualization
NumPy's broadcasting automatically handles different tensor shapes:
```
Same Shape: Broadcasting (vector + scalar):
[1, 2, 3] [1, 2, 3] [5] [1+5, 2+5, 3+5]
[4, 5, 6] + [4, 5, 6] + [5] = [4+5, 5+5, 6+5]
--------- --------- ───────────────
[5, 7, 9] [6, 7, 8] [9,10,11]
Matrix Broadcasting: Result:
┌─────────────┐ ┌─────────────┐
│ 1 2 3 │ │ 11 12 13 │
│ │ +10 │ │
│ 4 5 6 │ ──▶ │ 14 15 16 │
└─────────────┘ └─────────────┘
```
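The diagrams above can be verified with plain NumPy, which applies the same broadcasting rules our Tensor inherits:

```python
import numpy as np

# Same shape: element-wise addition pairs up matching positions
print(np.array([1, 2, 3]) + np.array([4, 5, 6]))   # [5 7 9]

# Broadcasting: the scalar is "stretched" to match the (2, 3) matrix
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m + 5)    # [[ 6  7  8] [ 9 10 11]]
print(m + 10)   # [[11 12 13] [14 15 16]]
```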
"""
# %%
@@ -469,6 +557,26 @@ test_unit_tensor_arithmetic()
"""
### 🧪 Unit Test: Matrix Multiplication
This test validates matrix multiplication and the @ operator.
**What we're testing**: Matrix multiplication with proper shape validation
**Why it matters**: Matrix multiplication is the core operation in neural networks
**Expected**: Correct results and informative errors for incompatible shapes
### Matrix Multiplication Process
For matrices A(2×2) @ B(2×2), each result element is computed as:
```
Computation Pattern:
C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] (row 0 of A × col 0 of B)
C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] (row 0 of A × col 1 of B)
C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] (row 1 of A × col 0 of B)
C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] (row 1 of A × col 1 of B)
Example:
[[1, 2]] @ [[5, 6]] = [[1*5+2*7, 1*6+2*8]] = [[19, 22]]
[[3, 4]] [[7, 8]] [[3*5+4*7, 3*6+4*8]] [[43, 50]]
```
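The computation pattern above can be checked element by element with NumPy:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# C[i, j] = sum over k of A[i, k] * B[k, j]
C = A @ B
print(C)  # [[19 22] [43 50]]

# One element computed by hand, matching the pattern above
assert C[0, 0] == A[0, 0] * B[0, 0] + A[0, 1] * B[1, 0]  # 1*5 + 2*7 = 19
```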
"""
# %%
@@ -506,6 +614,33 @@ test_unit_matrix_multiplication()
"""
### 🧪 Unit Test: Tensor Operations
This test validates reshape, transpose, and numpy conversion.
**What we're testing**: Shape manipulation operations that reorganize data
**Why it matters**: Neural networks constantly reshape data between layers
**Expected**: Same data, different organization (no copying for most operations)
### Shape Manipulation Visualization
```
Original tensor (2×3):
┌─────────────┐
│ 1 2 3 │
│ │
│ 4 5 6 │
└─────────────┘
Reshape to (3×2): Transpose to (3×2):
┌─────────┐ ┌─────────┐
│ 1 2 │ │ 1 4 │
│ 3 4 │ │ 2 5 │
│ 5 6 │ │ 3 6 │
└─────────┘ └─────────┘
Memory Impact:
- Reshape: Usually creates VIEW (no copy, just new indexing)
- Transpose: Creates VIEW (no copy, just swapped strides)
- Indexing: May create COPY (depends on pattern)
```
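The view-vs-copy distinction above is observable with `np.shares_memory`; a small sketch:

```python
import numpy as np

t = np.arange(1, 7).reshape(2, 3)   # [[1 2 3] [4 5 6]]

reshaped = t.reshape(3, 2)   # view: same buffer, new indexing
transposed = t.T             # view: same buffer, swapped strides
sliced = t[t > 3]            # boolean indexing: a fresh copy

print(np.shares_memory(t, reshaped))    # True
print(np.shares_memory(t, transposed))  # True
print(np.shares_memory(t, sliced))      # False
```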
"""
# %%
@@ -579,55 +714,105 @@ test_module()
# %% [markdown]
"""
## Systems Analysis: Memory Layout and Performance

Now that our Tensor is working, let's understand how it behaves at the systems level. This analysis shows you how tensor operations scale and where bottlenecks appear in real ML systems.
### Memory Usage Patterns
```
Operation Type Memory Pattern When to Worry
──────────────────────────────────────────────────────────────
Element-wise (+,*,/) 2× input size Large tensor ops
Matrix multiply (@) Size(A) + Size(B) + Size(C) GPU memory limits
Reshape/transpose Same memory, new view Never (just metadata)
Indexing/slicing Copy vs view Depends on pattern
```
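The table's rough costs can be checked with `nbytes`; a back-of-the-envelope sketch assuming float32 inputs:

```python
import numpy as np

n = 1000
a = np.ones((n, n), dtype=np.float32)
b = np.ones((n, n), dtype=np.float32)

# Element-wise add: two inputs plus one same-sized result live at once
add_bytes = a.nbytes + b.nbytes + (a + b).nbytes

# Matmul: Size(A) + Size(B) + Size(C); here all three are (1000, 1000) float32
mm_bytes = a.nbytes + b.nbytes + (a @ b).nbytes

print(f"element-wise: {add_bytes / 1e6:.0f} MB, matmul: {mm_bytes / 1e6:.0f} MB")
```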
### Performance Characteristics
Let's measure how our tensor operations scale with size:
"""
# %%
def analyze_tensor_performance():
    """Analyze tensor operations performance and memory usage."""
    print("📊 Systems Analysis: Tensor Performance\n")
    import time

    # Test different matrix sizes to understand scaling
    sizes = [50, 100, 200, 400]
    results = []

    for size in sizes:
        print(f"Testing {size}×{size} matrices...")
        a = Tensor.random(size, size)
        b = Tensor.random(size, size)

        # Measure matrix multiplication time
        start = time.perf_counter()
        result = a @ b
        elapsed = time.perf_counter() - start

        # Estimate memory usage: 4 bytes per float32 element
        memory_mb = (a.size + b.size + result.size) * 4 / (1024 * 1024)
        flops = 2 * size * size * size  # 2*N³ for matrix multiplication
        gflops = flops / (elapsed * 1e9)

        results.append((size, elapsed * 1000, memory_mb, gflops))
        print(f"  Time: {elapsed*1000:.2f}ms, Memory: ~{memory_mb:.1f}MB, Performance: {gflops:.2f} GFLOPS")

    print("\n🔍 Performance Analysis:")
    print("```")
    print("Size    Time(ms)    Memory(MB)    Performance(GFLOPS)")
    print("-" * 50)
    for size, time_ms, mem_mb, gflops in results:
        print(f"{size:4d}    {time_ms:7.2f}    {mem_mb:9.1f}    {gflops:15.2f}")
    print("```")

    print("\n💡 Key Insights:")
    print("• Matrix multiplication is O(N³) - doubling size = 8× more computation")
    print("• Memory grows as O(N²) - usually not the bottleneck for single operations")
    print("• NumPy uses optimized BLAS libraries (like OpenBLAS, Intel MKL)")
    print("• Performance depends heavily on your CPU and available memory bandwidth")

    return results

if __name__ == "__main__":
    print("🚀 Running Tensor module...")
    test_module()
    print("\n📊 Running systems analysis...")
    analyze_tensor_performance()
    print("\n✅ Module validation complete!")
# %% [markdown]
"""
## 🤔 ML Systems Thinking: Interactive Questions
### Question 1: Memory Scaling and Neural Network Implications

**Context**: Your performance analysis showed how tensor memory usage scales with size. A 1000×1000 tensor uses 100× more memory than a 100×100 tensor.

**Systems Question**: Modern language models have weight matrices of size [4096, 11008] (Llama-2 7B). How much memory would this single layer consume in float32? Why do production systems use float16 or int8 quantization?

*Calculate*: 4096 × 11008 × 4 bytes = ? GB per layer

### Question 2: Computational Complexity in Practice

**Context**: Your analysis revealed O(N³) scaling for matrix multiplication. This means doubling the matrix size increases computation time by 8×.

**Performance Question**: If a 400×400 matrix multiplication takes 100ms on your machine, how long would a 1600×1600 multiplication take? How does this explain why training large neural networks requires GPUs with thousands of cores?

*Think*: 1600 = 4 × 400, so computation = 4³ = 64× longer

### Question 3: Memory Bandwidth vs Compute Power

**Context**: Your Tensor operations are limited by how fast data moves between RAM and CPU, not just raw computational power.

**Architecture Question**: Why might element-wise operations (like tensor + tensor) be slower per operation than matrix multiplication, even though addition is simpler than dot products? How do modern ML accelerators (GPUs, TPUs) address this?

*Hint*: Consider the ratio of data movement to computation work
"""
@@ -638,11 +823,12 @@ if __name__ == "__main__":
Congratulations! You've built the fundamental data structure that powers neural networks.
### What You've Accomplished
✅ **Core Tensor Class**: Complete N-dimensional array implementation wrapping NumPy's optimized operations
✅ **Broadcasting Arithmetic**: Element-wise operations (+, -, *, /) with automatic shape handling
✅ **Matrix Operations**: O(N³) matrix multiplication with @ operator and comprehensive shape validation
✅ **Memory-Efficient Shape Manipulation**: Reshape and transpose operations using views when possible
✅ **Systems Analysis**: Performance profiling revealing scaling characteristics and memory patterns
✅ **Production-Ready Testing**: Unit tests with immediate validation and clear error messages
### Key Learning Outcomes
- **Tensor Fundamentals**: N-dimensional arrays as the foundation of ML