Mirror of https://github.com/MLSysBook/TinyTorch.git (synced 2026-04-28 06:49:18 -05:00)
feat: Update mathematical equations to use proper LaTeX formatting
- Updated autograd module: chain rule, partial derivatives, gradient rules
- Updated activations module: ReLU, sigmoid, tanh, softmax formulas
- Updated layers module: linear transformation, matrix multiplication
- Updated networks module: function composition formulas

All mathematical equations now use LaTeX formatting ($...$ and $$...$$) for better rendering in Jupyter notebooks and documentation.
@@ -305,25 +305,25 @@ This theorem guarantees that neural networks with nonlinear activations can lear
 ### The Four Fundamental Activation Functions
 
 #### **1. ReLU (Rectified Linear Unit)**
-- **Formula**: f(x) = max(0, x)
+- **Formula**: $f(x) = \max(0, x)$
 - **Use case**: Hidden layers in most networks
 - **Advantages**: Simple, fast, no vanishing gradients
 - **Disadvantages**: "Dead neurons" problem
 
 #### **2. Sigmoid**
-- **Formula**: f(x) = 1/(1 + e^(-x))
+- **Formula**: $f(x) = \frac{1}{1 + e^{-x}}$
 - **Use case**: Binary classification output
 - **Advantages**: Smooth, probabilistic interpretation
 - **Disadvantages**: Vanishing gradients, computationally expensive
 
 #### **3. Tanh (Hyperbolic Tangent)**
-- **Formula**: f(x) = (e^x - e^(-x))/(e^x + e^(-x))
+- **Formula**: $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
 - **Use case**: Hidden layers (better than sigmoid)
 - **Advantages**: Zero-centered, stronger gradients than sigmoid
 - **Disadvantages**: Still suffers from vanishing gradients
 
 #### **4. Softmax**
-- **Formula**: f(x_i) = e^(x_i) / Σ(e^(x_j))
+- **Formula**: $f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$
 - **Use case**: Multi-class classification output
 - **Advantages**: Probabilistic, sums to 1
 - **Disadvantages**: Computationally expensive, can saturate
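As a quick illustration of the four formulas above, here is a minimal NumPy sketch (an editorial example, not the TinyTorch source; the function names are illustrative):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
    return np.tanh(x)

def softmax(x):
    # Shift by the max for numerical stability, then normalize so outputs sum to 1
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), sigmoid(x), tanh(x), softmax(x), sep="\n")
```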
@@ -99,10 +99,8 @@ from tinytorch.core.activations import ReLU, Sigmoid # Nonlinearity
 ### Linear Algebra at the Heart of ML
 Neural networks are fundamentally about **linear transformations** followed by **nonlinear activations**:
 
-```
-Layer: y = Wx + b (linear transformation)
-Activation: z = σ(y) (nonlinear transformation)
-```
+$$\text{Layer: } y = Wx + b \text{ (linear transformation)}$$
+$$\text{Activation: } z = \sigma(y) \text{ (nonlinear transformation)}$$
 
 ### Matrix Multiplication: The Engine of Deep Learning
 Every forward pass in a neural network involves matrix multiplication:
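A tiny NumPy sketch of the layer-plus-activation pattern above (editorial illustration, not the TinyTorch implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # weight matrix: 4 outputs, 3 inputs
b = np.zeros(4)               # bias vector
x = rng.normal(size=3)        # input vector

y = W @ x + b                 # linear transformation: y = Wx + b
z = np.maximum(0.0, y)        # nonlinear activation: z = sigma(y), here ReLU
print(y.shape, z.shape)       # (4,) (4,)
```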
@@ -138,11 +136,9 @@ Every framework optimizes matrix multiplication:
 ### What is Matrix Multiplication?
 Matrix multiplication is the **fundamental operation** that powers neural networks. When we multiply matrices A and B:
 
-```
-C = A @ B
-```
+$$C = A \times B$$
 
-Each element C[i,j] is the **dot product** of row i from A and column j from B.
+Each element $C_{i,j}$ is the **dot product** of row $i$ from A and column $j$ from B.
 
 ### The Mathematical Foundation: Linear Algebra in Neural Networks
 
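The dot-product definition can be checked directly in NumPy (an illustrative sketch, not project code):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

C = A @ B  # matrix product

# Each C[i, j] equals the dot product of row i of A with column j of B
for i in range(2):
    for j in range(2):
        assert np.isclose(C[i, j], np.dot(A[i, :], B[:, j]))
print(C)  # [[19. 22.] [43. 50.]]
```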
@@ -106,9 +106,7 @@ from tinytorch.core.tensor import Tensor # Foundation
 ### Function Composition at Scale
 Neural networks are fundamentally about **function composition**:
 
-```
-f(x) = f_n(f_{n-1}(...f_2(f_1(x))))
-```
+$$f(x) = f_n(f_{n-1}(\ldots f_2(f_1(x)) \ldots))$$
 
 Each layer is a function, and the network is the composition of all these functions.
 
@@ -155,15 +153,10 @@ Input → Layer1 → Layer2 → Layer3 → Output
 #### **Function Composition in Mathematics**
 In mathematics, function composition combines simple functions to create complex ones:
 
-```python
-# Mathematical composition: (f ∘ g)(x) = f(g(x))
-def compose(f, g):
-    return lambda x: f(g(x))
-
-# Neural network composition: h(x) = f_n(f_{n-1}(...f_2(f_1(x))))
-def network(layers):
-    return lambda x: reduce(lambda acc, layer: layer(acc), layers, x)
-```
+$$(f \circ g)(x) = f(g(x))$$
+
+Neural network composition:
+
+$$h(x) = f_n(f_{n-1}(\ldots f_2(f_1(x)) \ldots))$$
 
 #### **Why Composition is Powerful**
 1. **Modularity**: Each layer has a specific, well-defined purpose
 
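The Python snippet that this hunk removes relies on `reduce` without importing it; a self-contained, runnable version of the same idea might look like this (editorial sketch):

```python
from functools import reduce

def compose(f, g):
    # Mathematical composition: (f o g)(x) = f(g(x))
    return lambda x: f(g(x))

def network(layers):
    # h(x) = f_n(...f_2(f_1(x))...): apply the layers left to right
    return lambda x: reduce(lambda acc, layer: layer(acc), layers, x)

double = lambda x: 2 * x
increment = lambda x: x + 1

print(compose(double, increment)(3))               # double(increment(3)) = 8
print(network([increment, double, increment])(3))  # ((3 + 1) * 2) + 1 = 9
```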
@@ -131,35 +131,35 @@ output = add_result * sub_result = 5
 #### **Backward Pass: Computing Gradients**
 Traverse the graph from outputs to inputs, computing gradients using the chain rule:
 
-```python
-# Backward pass for f(x, y) = (x + y) * (x - y)
-# Starting from output gradient = 1
-∂output/∂multiply = 1
-∂output/∂add = ∂output/∂multiply * ∂multiply/∂add = 1 * sub_result = 1
-∂output/∂sub = ∂output/∂multiply * ∂multiply/∂sub = 1 * add_result = 5
-∂output/∂x = ∂output/∂add * ∂add/∂x + ∂output/∂sub * ∂sub/∂x = 1 * 1 + 5 * 1 = 6
-∂output/∂y = ∂output/∂add * ∂add/∂y + ∂output/∂sub * ∂sub/∂y = 1 * 1 + 5 * (-1) = -4
-```
+For $f(x, y) = (x + y) \cdot (x - y)$ with $x = 3, y = 2$:
+
+$$\frac{\partial \text{output}}{\partial \text{multiply}} = 1$$
+
+$$\frac{\partial \text{output}}{\partial \text{add}} = \frac{\partial \text{output}}{\partial \text{multiply}} \cdot \frac{\partial \text{multiply}}{\partial \text{add}} = 1 \cdot \text{sub\_result} = 1$$
+
+$$\frac{\partial \text{output}}{\partial \text{sub}} = \frac{\partial \text{output}}{\partial \text{multiply}} \cdot \frac{\partial \text{multiply}}{\partial \text{sub}} = 1 \cdot \text{add\_result} = 5$$
+
+$$\frac{\partial \text{output}}{\partial x} = \frac{\partial \text{output}}{\partial \text{add}} \cdot \frac{\partial \text{add}}{\partial x} + \frac{\partial \text{output}}{\partial \text{sub}} \cdot \frac{\partial \text{sub}}{\partial x} = 1 \cdot 1 + 5 \cdot 1 = 6$$
+
+$$\frac{\partial \text{output}}{\partial y} = \frac{\partial \text{output}}{\partial \text{add}} \cdot \frac{\partial \text{add}}{\partial y} + \frac{\partial \text{output}}{\partial \text{sub}} \cdot \frac{\partial \text{sub}}{\partial y} = 1 \cdot 1 + 5 \cdot (-1) = -4$$
 
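The same worked example can be reproduced numerically; this standalone sketch applies the chain rule by hand (editorial illustration, independent of TinyTorch's `Variable` class):

```python
# Manual backward pass for f(x, y) = (x + y) * (x - y) at x = 3, y = 2
x, y = 3.0, 2.0
add_result = x + y                 # 5
sub_result = x - y                 # 1
output = add_result * sub_result   # 5

# Seed the backward pass with d(output)/d(output) = 1
d_output = 1.0
d_add = d_output * sub_result       # 1
d_sub = d_output * add_result       # 5
d_x = d_add * 1.0 + d_sub * 1.0     # 6
d_y = d_add * 1.0 + d_sub * (-1.0)  # -4

# Check against the analytic gradient of f = x^2 - y^2
assert d_x == 2 * x and d_y == -2 * y
print(d_x, d_y)  # 6.0 -4.0
```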
 ### Mathematical Foundation: The Chain Rule
 
 #### **Single Variable Chain Rule**
-For composite functions: If z = f(g(x)), then:
-```
-dz/dx = (dz/df) * (df/dx)
-```
+For composite functions: If $z = f(g(x))$, then:
+
+$$\frac{dz}{dx} = \frac{dz}{df} \cdot \frac{df}{dx}$$
 
 #### **Multivariable Chain Rule**
-For functions of multiple variables: If z = f(x, y) where x = g(t) and y = h(t), then:
-```
-dz/dt = (∂z/∂x) * (dx/dt) + (∂z/∂y) * (dy/dt)
-```
+For functions of multiple variables: If $z = f(x, y)$ where $x = g(t)$ and $y = h(t)$, then:
+
+$$\frac{dz}{dt} = \frac{\partial z}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial z}{\partial y} \cdot \frac{dy}{dt}$$
 
 #### **Chain Rule in Computational Graphs**
 For any path from input to output through intermediate nodes:
-```
-∂output/∂input = ∏(∂node_{i+1}/∂node_i) for all nodes in the path
-```
+
+$$\frac{\partial \text{output}}{\partial \text{input}} = \prod_{i} \frac{\partial \text{node}_{i+1}}{\partial \text{node}_i}$$
 
 ### Automatic Differentiation Modes
 
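A small numerical check of the multivariable chain rule above (editorial sketch using only the standard library; the functions and value of `t` are arbitrary choices):

```python
import math

# z = f(x, y) = x * y, with x = g(t) = t**2 and y = h(t) = sin(t)
def z_of_t(t):
    return (t ** 2) * math.sin(t)

t = 1.3
x, y = t ** 2, math.sin(t)

# Chain rule: dz/dt = (dz/dx) * (dx/dt) + (dz/dy) * (dy/dt)
dz_dt = y * (2 * t) + x * math.cos(t)

# Compare with a central finite difference
eps = 1e-6
numeric = (z_of_t(t + eps) - z_of_t(t - eps)) / (2 * eps)
print(abs(dz_dt - numeric) < 1e-6)  # True
```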
@@ -472,10 +472,10 @@ Every differentiable operation follows the same pattern:
 3. **Return Variable**: With the result and grad_fn
 
 ### Mathematical Rules
-- **Addition**: `d(x + y)/dx = 1, d(x + y)/dy = 1`
-- **Multiplication**: `d(x * y)/dx = y, d(x * y)/dy = x`
-- **Subtraction**: `d(x - y)/dx = 1, d(x - y)/dy = -1`
-- **Division**: `d(x / y)/dx = 1/y, d(x / y)/dy = -x/y²`
+- **Addition**: $\frac{d(x + y)}{dx} = 1$, $\frac{d(x + y)}{dy} = 1$
+- **Multiplication**: $\frac{d(x \cdot y)}{dx} = y$, $\frac{d(x \cdot y)}{dy} = x$
+- **Subtraction**: $\frac{d(x - y)}{dx} = 1$, $\frac{d(x - y)}{dy} = -1$
+- **Division**: $\frac{d(x / y)}{dx} = \frac{1}{y}$, $\frac{d(x / y)}{dy} = -\frac{x}{y^2}$
 
 ### Implementation Strategy
 Each operation creates a closure that captures the input variables and implements the gradient computation rule.
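To make the closure idea concrete, here is a stripped-down sketch of how a multiply operation could capture its inputs in a `grad_fn` (an editorial approximation of the pattern, not the actual TinyTorch code):

```python
class Variable:
    # Minimal stand-in: holds a value, a gradient slot, and an optional grad_fn
    def __init__(self, data, grad_fn=None):
        self.data = data
        self.grad = 0.0
        self.grad_fn = grad_fn

def multiply(a, b):
    # Forward pass computes the value; the closure remembers a and b
    def grad_fn(upstream):
        # d(a*b)/da = b, d(a*b)/db = a (each scaled by the upstream gradient)
        a.grad += upstream * b.data
        b.grad += upstream * a.data
    return Variable(a.data * b.data, grad_fn)

x, y = Variable(3.0), Variable(2.0)
z = multiply(x, y)
z.grad_fn(1.0)          # seed the backward pass with gradient 1
print(x.grad, y.grad)   # 2.0 3.0
```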
@@ -680,7 +680,7 @@ def divide(a: Union[Variable, float, int], b: Union[Variable, float, int]) -> Va
 4. Return Variable with result and grad_fn
 
 MATHEMATICAL RULE:
-If z = x / y, then dz/dx = 1/y, dz/dy = -x/y²
+If z = x / y, then dz/dx = \frac{1}{y}, dz/dy = -\frac{x}{y^2}
 
 EXAMPLE:
 x = Variable(6.0), y = Variable(2.0)
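Plugging the docstring's example values into the division rule gives a quick sanity check (editorial sketch, not part of the diff):

```python
x, y = 6.0, 2.0

# Analytic rule: dz/dx = 1/y, dz/dy = -x/y**2
dz_dx = 1.0 / y       # 0.5
dz_dy = -x / y ** 2   # -1.5

# Numerical check with central differences
eps = 1e-6
num_dx = ((x + eps) / y - (x - eps) / y) / (2 * eps)
num_dy = (x / (y + eps) - x / (y - eps)) / (2 * eps)
print(dz_dx, dz_dy)                                            # 0.5 -1.5
print(abs(dz_dx - num_dx) < 1e-6, abs(dz_dy - num_dy) < 1e-6)  # True True
```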