# 🔥 TinyTorch Activations Module

## 📊 Module Info
- **Difficulty**: ⭐⭐ Intermediate
- **Time Estimate**: 3-4 hours
- **Prerequisites**: Tensor module
- **Next Steps**: Layers module

Welcome to the **Activations** module! This is where you'll implement the mathematical functions that give neural networks their power to learn complex patterns.

## 🎯 Learning Objectives

By the end of this module, you will:
1. **Understand** why activation functions are essential for neural networks
2. **Implement** the three most important activation functions: ReLU, Sigmoid, and Tanh
3. **Test** your functions with various inputs to understand their behavior
4. **Grasp** the mathematical properties that make each function useful

## 🧠 Why This Module Matters

**Without activation functions, neural networks are just linear transformations!**

```
Linear → Linear → Linear = Still just Linear
Linear → Activation → Linear = Can learn complex patterns!
```

This module teaches you the mathematical foundations that make deep learning possible.

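The collapse of stacked linear layers can be checked numerically. The following is a minimal NumPy sketch, not TinyTorch code: the weight matrices `W1` and `W2` and the input `x` are hypothetical random arrays, and biases are left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical linear layers (weights only, no bias)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))
x = rng.normal(size=(5, 4))  # batch of 5 inputs with 4 features each

# Applying two linear layers in sequence...
stacked = (x @ W1) @ W2
# ...matches a single linear layer whose weights are W1 @ W2
collapsed = x @ (W1 @ W2)
linear_collapses = bool(np.allclose(stacked, collapsed))

# With ReLU between the layers, that single matrix no longer reproduces the output
with_relu = np.maximum(0, x @ W1) @ W2
relu_collapses = bool(np.allclose(with_relu, collapsed))

print("Linear -> Linear equals one Linear:", linear_collapses)
print("Linear -> ReLU -> Linear equals that Linear:", relu_collapses)
```

The first check should come out true for any weights, because matrix multiplication is associative. The second is expected to come out false here, since ReLU zeroes the negative entries of `x @ W1` before the second layer sees them.
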
## 📚 What You'll Build

### 1. **ReLU** (Rectified Linear Unit)
- **Formula**: `f(x) = max(0, x)`
- **Properties**: Simple, sparse, unbounded
- **Use case**: Hidden layers (most common)

### 2. **Sigmoid**
- **Formula**: `f(x) = 1 / (1 + e^(-x))`
- **Properties**: Bounded to (0,1), smooth, probabilistic
- **Use case**: Binary classification, gates

### 3. **Tanh** (Hyperbolic Tangent)
- **Formula**: `f(x) = tanh(x)`
- **Properties**: Bounded to (-1,1), zero-centered, smooth
- **Use case**: Hidden layers, RNNs

## 🚀 Getting Started

### Prerequisites

1. **Activate the virtual environment**:
   ```bash
   source bin/activate-tinytorch.sh
   ```

2. **Start development environment**:
   ```bash
   tito jupyter
   ```

### Development Workflow

1. **Open the development file**:
   ```bash
   # Then open assignments/source/02_activations/activations_dev.py
   ```

2. **Implement the functions**:
   - Start with ReLU (simplest)
   - Move to Sigmoid (numerical stability challenge)
   - Finish with Tanh (symmetry properties)

3. **Visualize your functions**:
   - Each function has plotting sections
   - See how your implementation transforms inputs
   - Compare all functions side-by-side

4. **Test as you go**:
   ```bash
   tito test --module activations
   ```

5. **Export to package**:
   ```bash
   tito sync
   ```

### 📊 Visual Learning Features

This module includes comprehensive plotting sections to help you understand:

- **Individual Function Plots**: See each activation function's curve
- **Implementation Comparison**: Your implementation vs ideal side-by-side
- **Mathematical Explanations**: Visual breakdown of function properties
- **Error Analysis**: Quantitative feedback on implementation accuracy
- **Comprehensive Comparison**: All functions analyzed together

**Enhanced Features**:
- **4-Panel Plots**: Implementation vs ideal, mathematical definition, properties, error analysis
- **Real-time Feedback**: Immediate accuracy scores with color-coded status
- **Mathematical Insights**: Detailed explanations of function properties
- **Numerical Stability Testing**: Verification with extreme values
- **Property Verification**: Symmetry, monotonicity, and zero-centering tests

**Why enhanced plots matter**:
- **Visual Debugging**: See exactly where your implementation differs
- **Quantitative Feedback**: Get precise error measurements
- **Mathematical Understanding**: Connect formulas to visual behavior
- **Implementation Confidence**: Know immediately if your code is correct
- **Learning Reinforcement**: Multiple visual perspectives of the same concept

### Implementation Tips

#### ReLU Implementation
```python
def forward(self, x: Tensor) -> Tensor:
    # Element-wise max(0, x): negative entries become 0, the rest pass through
    return Tensor(np.maximum(0, x.data))
```

#### Sigmoid Implementation (Numerical Stability)
```python
def forward(self, x: Tensor) -> Tensor:
    # For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))
    # For x < 0:  sigmoid(x) = exp(x) / (1 + exp(x))
    # Either branch only calls exp() on non-positive values, so it cannot overflow.
    x_data = x.data
    # Float output buffer, so integer inputs are not truncated to 0
    result = np.zeros_like(x_data, dtype=float)

    positive_mask = x_data >= 0
    result[positive_mask] = 1.0 / (1.0 + np.exp(-x_data[positive_mask]))
    result[~positive_mask] = np.exp(x_data[~positive_mask]) / (1.0 + np.exp(x_data[~positive_mask]))

    return Tensor(result)
```

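To see why the branching matters, the sketch below compares a direct translation of the formula with the sign-split version on extreme inputs. It works on plain NumPy arrays rather than `Tensor` objects, and the function names `naive_sigmoid` and `stable_sigmoid` are illustrative, not part of TinyTorch. `np.errstate(over="raise")` turns NumPy's overflow warning into an exception so the difference is easy to observe.

```python
import numpy as np

def naive_sigmoid(x):
    # Direct translation of the formula; exp(-x) overflows for very negative x
    return 1.0 / (1.0 + np.exp(-x))

def stable_sigmoid(x):
    # Branch on sign so exp() only ever receives non-positive arguments
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

extreme = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])

# Treat floating-point overflow as an error inside this block
with np.errstate(over="raise"):
    try:
        naive_sigmoid(extreme)
        naive_overflowed = False
    except FloatingPointError:
        naive_overflowed = True

    stable = stable_sigmoid(extreme)  # completes without overflow

print("Naive version overflowed:", naive_overflowed)
print("Stable result:", stable)
```

With NumPy's default settings the naive version only emits a `RuntimeWarning` and still returns 0 for `x = -1000`, so the failure is easy to miss. The stable version avoids the overflow entirely.
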
#### Tanh Implementation
```python
def forward(self, x: Tensor) -> Tensor:
    # np.tanh is applied element-wise and stays finite for large |x|
    return Tensor(np.tanh(x.data))
```

### Testing Your Implementation

1. **Run the tests**:
   ```bash
   tito test --module activations
   ```

2. **Export to package**:
   ```bash
   tito sync
   ```

### Manual Testing
```python
# Test all activations
from tinytorch.core.tensor import Tensor
from modules.activations.activations_dev import ReLU, Sigmoid, Tanh

x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])

relu = ReLU()
sigmoid = Sigmoid()
tanh = Tanh()

print("Input:", x.data)
print("ReLU:", relu(x).data)
print("Sigmoid:", sigmoid(x).data)
print("Tanh:", tanh(x).data)
```

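For comparison, reference values for the same input can be computed with NumPy alone. This sketch does not import TinyTorch; the `expected_*` names are illustrative.

```python
import numpy as np

# Same input values as the manual test
x = np.array([[-2.0, -1.0, 0.0, 1.0, 2.0]])

# Reference outputs computed directly from the formulas
expected_relu = np.maximum(0, x)
expected_sigmoid = 1.0 / (1.0 + np.exp(-x))
expected_tanh = np.tanh(x)

np.set_printoptions(precision=4, suppress=True)
print("ReLU:   ", expected_relu)
print("Sigmoid:", expected_sigmoid)
print("Tanh:   ", expected_tanh)
```

Rounded to four decimals, sigmoid should give 0.1192, 0.2689, 0.5, 0.7311, 0.8808 and tanh should give -0.9640, -0.7616, 0, 0.7616, 0.9640. A check such as `np.allclose(sigmoid(x).data, expected_sigmoid)` against your own implementation is one way to confirm agreement.
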
## 📊 Understanding Function Properties

### Range Comparison
| Function | Input Range | Output Range | Zero Point |
|----------|-------------|--------------|------------|
| ReLU | (-∞, ∞) | [0, ∞) | f(0) = 0 |
| Sigmoid | (-∞, ∞) | (0, 1) | f(0) = 0.5 |
| Tanh | (-∞, ∞) | (-1, 1) | f(0) = 0 |

### Key Properties
- **ReLU**: Sparse (zeros out negatives), unbounded, simple
- **Sigmoid**: Probabilistic (0-1 range), smooth, saturating
- **Tanh**: Zero-centered, symmetric, steeper gradient near zero than sigmoid (slope 1 at x = 0, versus 0.25 for sigmoid)

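These properties can be verified numerically. The sketch below uses plain NumPy on a sample grid over [-5, 5]. The grid and the variable names are arbitrary choices for illustration, and the slopes come from the standard closed-form derivatives `sigmoid'(x) = s(x)(1 - s(x))` and `tanh'(x) = 1 - tanh(x)^2`.

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 1001)
relu = np.maximum(0, x)
sigmoid = 1.0 / (1.0 + np.exp(-x))
tanh = np.tanh(x)

# Output ranges on the sample grid
relu_nonnegative = bool(np.all(relu >= 0))
sigmoid_in_unit_interval = bool(np.all((sigmoid > 0) & (sigmoid < 1)))
tanh_in_range = bool(np.all((tanh > -1) & (tanh < 1)))

# Tanh is odd-symmetric: tanh(-x) == -tanh(x)
tanh_is_odd = bool(np.allclose(np.tanh(-x), -tanh))

# Sigmoid and tanh increase monotonically
sigmoid_monotonic = bool(np.all(np.diff(sigmoid) > 0))
tanh_monotonic = bool(np.all(np.diff(tanh) > 0))

# Slopes at zero from the closed-form derivatives
s0 = 1.0 / (1.0 + np.exp(0.0))           # sigmoid(0) = 0.5
sigmoid_slope_at_zero = s0 * (1.0 - s0)  # 0.25
tanh_slope_at_zero = 1.0 - np.tanh(0.0) ** 2  # 1.0

print("sigmoid'(0) =", sigmoid_slope_at_zero)
print("tanh'(0)    =", tanh_slope_at_zero)
```

The factor of four between the two slopes at zero is what the "stronger gradients" comparison refers to.
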
## 🔧 Integration with TinyTorch

After implementation, your activations will be available as:

```python
from tinytorch.core.activations import ReLU, Sigmoid, Tanh

# Use in neural networks
relu = ReLU()
output = relu(input_tensor)
```

## 🎯 Common Issues & Solutions

### Issue 1: Sigmoid Overflow
**Problem**: `exp()` overflow with large inputs
**Solution**: Use the numerically stable implementation (see the Sigmoid code under Implementation Tips)

### Issue 2: Wrong Output Range
**Problem**: Sigmoid/Tanh outputs outside expected range
**Solution**: Re-check your formula against the definitions in What You'll Build: sigmoid outputs should fall in (0, 1) and tanh outputs in (-1, 1)

### Issue 3: Shape Mismatch
**Problem**: Output shape differs from input shape
**Solution**: Ensure element-wise operations preserve shape

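One way a shape mismatch can arise is calling a reduction such as `np.max` where the element-wise `np.maximum` was intended. That cause is an assumption for illustration; the module does not name it. A minimal NumPy sketch:

```python
import numpy as np

x = np.array([[-1.0, 2.0, -3.0],
              [4.0, -5.0, 6.0]])

# Element-wise: compares every entry with 0, output has the input's shape
elementwise = np.maximum(0, x)

# Reduction: takes the maximum along axis 0 and drops that axis
reduced = np.max(x, axis=0)

print("input shape:      ", x.shape)
print("np.maximum shape: ", elementwise.shape)
print("np.max shape:     ", reduced.shape)

# A simple guard that could go in your own tests
assert elementwise.shape == x.shape
```
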
### Issue 4: Import Errors
**Problem**: Cannot import after implementation
**Solution**: Run `tito sync` to export to package

## 📈 Performance Considerations

- **ReLU**: Fastest (simple max operation)
- **Sigmoid**: Moderate (exponential computation)
- **Tanh**: Moderate (hyperbolic function)

All implementations use NumPy for vectorized operations.

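The relative costs can be measured with a rough micro-benchmark. The sketch below times the raw NumPy expressions rather than the TinyTorch classes. The array size and repeat counts are arbitrary choices, and absolute timings will vary by machine and NumPy build.

```python
import timeit
import numpy as np

x = np.random.default_rng(0).normal(size=100_000)

# Raw NumPy expressions standing in for each activation
candidates = {
    "relu": lambda: np.maximum(0, x),
    "sigmoid": lambda: 1.0 / (1.0 + np.exp(-x)),
    "tanh": lambda: np.tanh(x),
}

timings = {}
for name, fn in candidates.items():
    # Best of 5 repeats with 20 calls each, converted to microseconds per call
    best = min(timeit.repeat(fn, number=20, repeat=5))
    timings[name] = best / 20 * 1e6

for name, micros in timings.items():
    print(f"{name:8s} {micros:10.1f} us per call")
```
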
## 🚀 What's Next

After mastering activations, you'll use them in:
1. **Layers Module**: Building neural network layers
2. **Loss Functions**: Computing training objectives
3. **Advanced Architectures**: CNNs, RNNs, and more

These functions are the mathematical foundation for everything that follows!

## 📚 Further Reading

**Mathematical Background**:
- [Activation Functions in Neural Networks](https://en.wikipedia.org/wiki/Activation_function)
- [Deep Learning Book - Chapter 6](http://www.deeplearningbook.org/)

**Advanced Topics**:
- ReLU variants (Leaky ReLU, ELU, Swish)
- Activation function choice and impact
- Gradient flow and vanishing gradients

## 🎉 Success Criteria

You've mastered this module when:
- [ ] All tests pass (`tito test --module activations`)
- [ ] You understand why each function is useful
- [ ] You can explain the mathematical properties
- [ ] You can use activations in neural networks
- [ ] You appreciate the importance of nonlinearity

**Great work! You've built the mathematical foundation of neural networks!** 🎉