mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-22 03:59:33 -05:00
- Change from x.data * mask to Tensor multiplication (x * mask_tensor * scale) - Preserves computation graph and gradient flow - Required for transformer with dropout regularization
1032 lines
44 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "794e99a4",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"# Module 03: Layers - Building Blocks of Neural Networks\n",
|
||
"\n",
|
||
"Welcome to Module 03! You're about to build the fundamental building blocks that make neural networks possible.\n",
|
||
"\n",
|
||
"## 🔗 Prerequisites & Progress\n",
|
||
"**You've Built**: Tensor class (Module 01) with all operations and activations (Module 02)\n",
|
||
"**You'll Build**: Linear layers and Dropout regularization\n",
|
||
"**You'll Enable**: Multi-layer neural networks, trainable parameters, and forward passes\n",
|
||
"\n",
|
||
"**Connection Map**:\n",
|
||
"```\n",
|
||
"Tensor → Activations → Layers → Networks\n",
|
||
"(data) (intelligence) (building blocks) (architectures)\n",
|
||
"```\n",
|
||
"\n",
|
||
"## Learning Objectives\n",
|
||
"By the end of this module, you will:\n",
|
||
"1. Implement Linear layers with proper weight initialization\n",
|
||
"2. Add Dropout for regularization during training\n",
|
||
"3. Understand parameter management and counting\n",
|
||
"4. Test individual layer components\n",
|
||
"\n",
|
||
"Let's get started!\n",
|
||
"\n",
|
||
"## 📦 Where This Code Lives in the Final Package\n",
|
||
"\n",
|
||
"**Learning Side:** You work in modules/03_layers/layers_dev.py\n",
|
||
"**Building Side:** Code exports to tinytorch.core.layers\n",
|
||
"\n",
|
||
"```python\n",
|
||
"# Final package structure:\n",
|
||
"from tinytorch.core.layers import Linear, Dropout # This module\n",
|
||
"from tinytorch.core.tensor import Tensor # Module 01 - foundation\n",
|
||
"from tinytorch.core.activations import ReLU, Sigmoid # Module 02 - intelligence\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Why this matters:**\n",
|
||
"- **Learning:** Complete layer system in one focused module for deep understanding\n",
|
||
"- **Production:** Proper organization like PyTorch's torch.nn with all layer building blocks together\n",
|
||
"- **Consistency:** All layer operations and parameter management in core.layers\n",
|
||
"- **Integration:** Works seamlessly with tensors and activations for complete neural networks"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "901fe04d",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "imports",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| default_exp core.layers\n",
|
||
"#| export\n",
|
||
"\n",
|
||
"import numpy as np\n",
|
||
"import sys\n",
|
||
"import os\n",
|
||
"\n",
|
||
"# Import dependencies from tinytorch package\n",
|
||
"from tinytorch.core.tensor import Tensor\n",
|
||
"from tinytorch.core.activations import ReLU, Sigmoid"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "967152a3",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 1. Introduction: What are Neural Network Layers?\n",
|
||
"\n",
|
||
"Neural network layers are the fundamental building blocks that transform data as it flows through a network. Each layer performs a specific computation:\n",
|
||
"\n",
|
||
"- **Linear layers** apply learned transformations: `y = xW + b`\n",
|
||
"- **Dropout layers** randomly zero elements for regularization\n",
|
||
"\n",
|
||
"Think of layers as processing stations in a factory:\n",
|
||
"```\n",
|
||
"Input Data → Layer 1 → Layer 2 → Layer 3 → Output\n",
|
||
" ↓ ↓ ↓ ↓ ↓\n",
|
||
" Features Hidden Hidden Hidden Predictions\n",
|
||
"```\n",
|
||
"\n",
|
||
"Each layer learns its own piece of the puzzle. Linear layers learn which combinations of input features matter, while dropout reduces overfitting by preventing the network from relying too heavily on any single unit."
|
||
]
|
||
},
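A minimal NumPy sketch of the `y = xW + b` computation described above. Plain NumPy arrays stand in for this module's `Tensor`, and the sizes and bias values are arbitrary. It shows the bias of shape `(out_features,)` broadcasting across the batch dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, in_features, out_features = 4, 3, 2

x = rng.standard_normal((batch_size, in_features))    # input batch
W = rng.standard_normal((in_features, out_features))  # weight matrix
b = np.array([0.5, -1.0])                             # one bias per output feature

y = x @ W + b  # b broadcasts from (2,) to (4, 2)
print(y.shape)  # (4, 2)

# Broadcasting gives the same result as adding b to every row explicitly
y_rowwise = np.stack([x[i] @ W + b for i in range(batch_size)])
print(np.allclose(y, y_rowwise))  # True
```

Every sample in the batch shares the same `W` and `b`; only `x` changes from row to row.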
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "ec1e941b",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 2. Foundations: Mathematical Background\n",
|
||
"\n",
|
||
"### Linear Layer Mathematics\n",
|
||
"A linear layer implements: **y = xW + b**\n",
|
||
"\n",
|
||
"```\n",
|
||
"Input x (batch_size, in_features) @ Weight W (in_features, out_features) + Bias b (out_features)\n",
|
||
" = Output y (batch_size, out_features)\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Weight Initialization\n",
|
||
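"Random initialization is crucial for breaking symmetry. If every weight started at the same value, every output unit would compute the same function and receive the same gradient.\n",
"- **Xavier/Glorot (fan-in form)**: Scale by sqrt(1/fan_in) so activation variance stays roughly constant from layer to layer. This simplified form, also known as LeCun initialization, is what this module uses; the full Glorot formula scales by sqrt(2/(fan_in + fan_out))\n",
"- **He**: Scale by sqrt(2/fan_in) to compensate for ReLU zeroing about half of its inputs\n",
"- **Too small**: Activations and gradients shrink layer by layer (vanishing), so learning stalls\n",
"- **Too large**: Activations and gradients grow layer by layer (exploding), so training becomes unstable\n",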
|
||
"\n",
|
||
"### Parameter Counting\n",
|
||
"```\n",
|
||
"Linear(784, 256): 784 × 256 + 256 = 200,960 parameters\n",
|
||
"\n",
|
||
"Manual composition:\n",
|
||
" layer1 = Linear(784, 256) # 200,960 params\n",
|
||
" activation = ReLU() # 0 params\n",
|
||
" layer2 = Linear(256, 10) # 2,570 params\n",
|
||
" # Total: 203,530 params\n",
|
||
"```\n",
|
||
"\n",
|
||
"Memory usage: 4 bytes/param × 203,530 = ~814 KB (float32) for the parameters alone, before activations or gradients"
|
||
]
|
||
},
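A small helper reproduces the parameter and memory arithmetic above. `linear_param_count` is a hypothetical name and is not part of `tinytorch`. The sketch assumes float32 storage and decimal kilobytes:

```python
def linear_param_count(in_features, out_features, bias=True):
    """Weights (in_features × out_features) plus one bias per output feature."""
    return in_features * out_features + (out_features if bias else 0)

# The manual composition above; ReLU contributes no parameters
layer_shapes = [(784, 256), (256, 10)]
counts = [linear_param_count(n_in, n_out) for n_in, n_out in layer_shapes]
total = sum(counts)

BYTES_PER_PARAM = 4  # float32
print(counts)                                # [200960, 2570]
print(total)                                 # 203530
print(total * BYTES_PER_PARAM / 1000, "KB")  # 814.12 KB
```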
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "908da7b4",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 3. Implementation: Building Layer Foundation\n",
|
||
"\n",
|
||
"Let's build our layer system step by step. We'll implement two essential layer types:\n",
|
||
"\n",
|
||
"1. **Linear Layer** - The workhorse of neural networks\n",
|
||
"2. **Dropout Layer** - Prevents overfitting\n",
|
||
"\n",
|
||
"### Key Design Principles:\n",
|
||
"- All methods defined INSIDE classes (no monkey-patching)\n",
|
||
"- Parameter tensors have requires_grad=True (ready for Module 05)\n",
|
||
"- Forward methods never modify their inputs in place\n",
|
||
"- parameters() method enables optimizer integration"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "dad822a3",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🏗️ Linear Layer - The Foundation of Neural Networks\n",
|
||
"\n",
|
||
"Linear layers (also called Dense or Fully Connected layers) are the fundamental building blocks of neural networks. They implement the mathematical operation:\n",
|
||
"\n",
|
||
"**y = xW + b**\n",
|
||
"\n",
|
||
"Where:\n",
|
||
"- **x**: Input features (what we know)\n",
|
||
"- **W**: Weight matrix (what we learn)\n",
|
||
"- **b**: Bias vector (adjusts the output)\n",
|
||
"- **y**: Output features (what we predict)\n",
|
||
"\n",
|
||
"### Why Linear Layers Matter\n",
|
||
"\n",
|
||
"Linear layers learn **feature combinations**. Each output neuron asks: \"What combination of input features is most useful for my task?\" The network discovers these combinations through training.\n",
|
||
"\n",
|
||
"### Data Flow Visualization\n",
|
||
"```\n",
|
||
"Input Features     Weight Matrix          Bias Vector   Output Features\n",
"[batch, in_feat] @ [in_feat, out_feat]  + [out_feat]  = [batch, out_feat]\n",
"\n",
"Example: MNIST Digit Recognition\n",
"[32, 784]        @ [784, 10]            + [10]        = [32, 10]\n",
"    ↑                  ↑                     ↑              ↑\n",
"32 images,        784 pixels mapped      one offset    10 scores (logits)\n",
"784 pixels each   to 10 class scores     per class     per image\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Memory Layout\n",
|
||
"```\n",
|
||
"Linear(784, 256) Parameters:\n",
|
||
"┌─────────────────────────────┐\n",
|
||
"│ Weight Matrix W │ 784 × 256 = 200,704 params\n",
|
||
"│ [784, 256] float32 │ × 4 bytes = 802.8 KB\n",
|
||
"├─────────────────────────────┤\n",
|
||
"│ Bias Vector b │ 256 params\n",
|
||
"│ [256] float32 │ × 4 bytes = 1.0 KB\n",
|
||
"└─────────────────────────────┘\n",
|
||
" Total: 803.8 KB for one layer\n",
|
||
"```"
|
||
]
|
||
},
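The memory-layout numbers above can be checked with NumPy's `nbytes`. This sketch allocates float32 arrays to match the 4-bytes-per-parameter assumption:

```python
import numpy as np

in_features, out_features = 784, 256

# float32 storage, matching the 4-bytes-per-parameter assumption above
W = np.zeros((in_features, out_features), dtype=np.float32)
b = np.zeros(out_features, dtype=np.float32)

print(W.size, W.nbytes)  # 200704 802816
print(b.size, b.nbytes)  # 256 1024

total_bytes = W.nbytes + b.nbytes
print(f"{total_bytes / 1000:.1f} KB")   # 803.8 KB (decimal)
print(f"{total_bytes / 1024:.1f} KiB")  # 785.0 KiB (binary)
```

The diagram uses decimal kilobytes, while tools that divide by 1024 will report a smaller figure. `np.random.randn` also returns float64 by default. Unless the `Tensor` class casts to float32, the stored weights could take twice the space shown.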
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "ac6dc79d",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "linear-layer",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| export\n",
|
||
"class Linear:\n",
|
||
" \"\"\"\n",
|
||
" Linear (fully connected) layer: y = xW + b\n",
|
||
"\n",
|
||
" This is the fundamental building block of neural networks.\n",
|
||
" Applies a linear transformation to incoming data.\n",
|
||
" \"\"\"\n",
|
||
"\n",
|
||
" def __init__(self, in_features, out_features, bias=True):\n",
|
||
" \"\"\"\n",
|
||
" Initialize linear layer with proper weight initialization.\n",
|
||
"\n",
|
||
" TODO: Initialize weights and bias with Xavier initialization\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Create weight matrix (in_features, out_features) with Xavier scaling\n",
|
||
" 2. Create bias vector (out_features,) initialized to zeros if bias=True\n",
|
||
" 3. Set requires_grad=True for parameters (ready for Module 05)\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> layer = Linear(784, 10) # MNIST classifier final layer\n",
|
||
" >>> print(layer.weight.shape)\n",
|
||
" (784, 10)\n",
|
||
" >>> print(layer.bias.shape)\n",
|
||
" (10,)\n",
|
||
"\n",
|
||
" HINTS:\n",
|
||
" - Xavier init: scale = sqrt(1/in_features)\n",
|
||
" - Use np.random.randn() for normal distribution\n",
|
||
" - bias=None when bias=False\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" self.in_features = in_features\n",
|
||
" self.out_features = out_features\n",
|
||
"\n",
|
||
" # Xavier/Glorot initialization for stable gradients\n",
|
||
" scale = np.sqrt(1.0 / in_features)\n",
|
||
" weight_data = np.random.randn(in_features, out_features) * scale\n",
|
||
" self.weight = Tensor(weight_data, requires_grad=True)\n",
|
||
"\n",
|
||
" # Initialize bias to zeros or None\n",
|
||
" if bias:\n",
|
||
" bias_data = np.zeros(out_features)\n",
|
||
" self.bias = Tensor(bias_data, requires_grad=True)\n",
|
||
" else:\n",
|
||
" self.bias = None\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def forward(self, x):\n",
|
||
" \"\"\"\n",
|
||
" Forward pass through linear layer.\n",
|
||
"\n",
|
||
" TODO: Implement y = xW + b\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Matrix multiply input with weights: xW\n",
|
||
" 2. Add bias if it exists\n",
|
||
" 3. Return result as new Tensor\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> layer = Linear(3, 2)\n",
|
||
" >>> x = Tensor([[1, 2, 3], [4, 5, 6]]) # 2 samples, 3 features\n",
|
||
" >>> y = layer.forward(x)\n",
|
||
" >>> print(y.shape)\n",
|
||
" (2, 2) # 2 samples, 2 outputs\n",
|
||
"\n",
|
||
" HINTS:\n",
|
||
" - Use tensor.matmul() for matrix multiplication\n",
|
||
" - Handle bias=None case\n",
|
||
" - Broadcasting automatically handles bias addition\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" # Linear transformation: y = xW\n",
|
||
" output = x.matmul(self.weight)\n",
|
||
"\n",
|
||
" # Add bias if present\n",
|
||
" if self.bias is not None:\n",
|
||
" output = output + self.bias\n",
|
||
"\n",
|
||
" return output\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __call__(self, x):\n",
|
||
" \"\"\"Allows the layer to be called like a function.\"\"\"\n",
|
||
" return self.forward(x)\n",
|
||
"\n",
|
||
" def parameters(self):\n",
|
||
" \"\"\"\n",
|
||
" Return list of trainable parameters.\n",
|
||
"\n",
|
||
" TODO: Return all tensors that need gradients\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Start with weight (always present)\n",
|
||
" 2. Add bias if it exists\n",
|
||
" 3. Return as list for optimizer\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" params = [self.weight]\n",
|
||
" if self.bias is not None:\n",
|
||
" params.append(self.bias)\n",
|
||
" return params\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __repr__(self):\n",
|
||
" \"\"\"String representation for debugging.\"\"\"\n",
|
||
" bias_str = f\", bias={self.bias is not None}\"\n",
|
||
" return f\"Linear(in_features={self.in_features}, out_features={self.out_features}{bias_str})\""
|
||
]
|
||
},
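The "too small / too large" bullets in Section 2 can be illustrated with a rough experiment. It pushes random data through a stack of bias-free linear layers (no activation) and compares the fan-in scale used in `Linear.__init__` with two fixed scales. The depth, width, batch size, and the fixed scales 0.01 and 1.0 are arbitrary choices for illustration. The sketch only measures forward activations. In a purely linear stack, backward gradients are multiplied by the same matrices, so they scale similarly.

```python
import numpy as np

def forward_std(scale_fn, depth=10, width=256, batch=512, seed=0):
    """Push random inputs through `depth` bias-free linear layers (no activation)
    and return the standard deviation of the final outputs."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * scale_fn(width)
        h = h @ W
    return h.std()

def fan_in_scale(fan_in):
    return np.sqrt(1.0 / fan_in)  # the scale used in Linear.__init__

def too_small(fan_in):
    return 0.01

def too_large(fan_in):
    return 1.0

s_fan = forward_std(fan_in_scale)  # stays close to 1.0
s_small = forward_std(too_small)   # shrinks by ~0.16x per layer (0.01 * sqrt(256))
s_large = forward_std(too_large)   # grows by ~16x per layer (1.0 * sqrt(256))
print(s_fan, s_small, s_large)
```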
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "ff32f81b",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🔬 Unit Test: Linear Layer\n",
|
||
"This test validates that our Linear layer implementation works correctly.\n",
|
||
"**What we're testing**: Weight initialization, forward pass, parameter management\n",
|
||
"**Why it matters**: Foundation for all neural network architectures\n",
|
||
"**Expected**: Proper shapes, Xavier scaling, parameter counting"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "a5b2ca52",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-linear",
|
||
"locked": true,
|
||
"points": 15
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_linear_layer():\n",
|
||
" \"\"\"🔬 Test Linear layer implementation.\"\"\"\n",
|
||
" print(\"🔬 Unit Test: Linear Layer...\")\n",
|
||
"\n",
|
||
" # Test layer creation\n",
|
||
" layer = Linear(784, 256)\n",
|
||
" assert layer.in_features == 784\n",
|
||
" assert layer.out_features == 256\n",
|
||
" assert layer.weight.shape == (784, 256)\n",
|
||
" assert layer.bias.shape == (256,)\n",
|
||
" assert layer.weight.requires_grad == True\n",
|
||
" assert layer.bias.requires_grad == True\n",
|
||
"\n",
|
||
" # Test Xavier initialization (weights should be reasonably scaled)\n",
|
||
" weight_std = np.std(layer.weight.data)\n",
|
||
" expected_std = np.sqrt(1.0 / 784)\n",
|
||
" assert 0.5 * expected_std < weight_std < 2.0 * expected_std, f\"Weight std {weight_std} not close to Xavier {expected_std}\"\n",
|
||
"\n",
|
||
" # Test bias initialization (should be zeros)\n",
|
||
" assert np.allclose(layer.bias.data, 0), \"Bias should be initialized to zeros\"\n",
|
||
"\n",
|
||
" # Test forward pass\n",
|
||
" x = Tensor(np.random.randn(32, 784)) # Batch of 32 samples\n",
|
||
" y = layer.forward(x)\n",
|
||
" assert y.shape == (32, 256), f\"Expected shape (32, 256), got {y.shape}\"\n",
|
||
"\n",
|
||
" # Test no bias option\n",
|
||
" layer_no_bias = Linear(10, 5, bias=False)\n",
|
||
" assert layer_no_bias.bias is None\n",
|
||
" params = layer_no_bias.parameters()\n",
|
||
" assert len(params) == 1 # Only weight, no bias\n",
|
||
"\n",
|
||
" # Test parameters method\n",
|
||
" params = layer.parameters()\n",
|
||
" assert len(params) == 2 # Weight and bias\n",
|
||
" assert params[0] is layer.weight\n",
|
||
" assert params[1] is layer.bias\n",
|
||
"\n",
|
||
" print(\"✅ Linear layer works correctly!\")\n",
|
||
"\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" test_unit_linear_layer()\n",
|
||
"\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "ba15fcbb",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🎲 Dropout Layer - Preventing Overfitting\n",
|
||
"\n",
|
||
"Dropout is a regularization technique that randomly \"turns off\" neurons during training. This forces the network to not rely too heavily on any single neuron, making it more robust and generalizable.\n",
|
||
"\n",
|
||
"### Why Dropout Matters\n",
|
||
"\n",
|
||
"**The Problem**: Neural networks can memorize training data instead of learning generalizable patterns. This leads to poor performance on new, unseen data.\n",
|
||
"\n",
|
||
"**The Solution**: Dropout randomly zeros out neurons, so no neuron can count on any particular other neuron being present. The network must spread useful information across many units, which makes it more robust.\n",
|
||
"\n",
|
||
"### Dropout in Action\n",
|
||
"```\n",
|
||
"Training Mode (p=0.5 dropout):\n",
|
||
"Input: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]\n",
|
||
" ↓ Random mask with 50% survival rate\n",
|
||
"Mask: [1, 0, 1, 0, 1, 1, 0, 1 ]\n",
|
||
" ↓ Apply mask and scale by 1/(1-p) = 2.0\n",
|
||
"Output: [2.0, 0.0, 6.0, 0.0, 10.0, 12.0, 0.0, 16.0]\n",
|
||
"\n",
|
||
"Inference Mode (no dropout):\n",
|
||
"Input: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]\n",
|
||
" ↓ Pass through unchanged\n",
|
||
"Output: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Training vs Inference Behavior\n",
|
||
"```\n",
|
||
" Training Mode Inference Mode\n",
|
||
" ┌─────────────────┐ ┌─────────────────┐\n",
|
||
"Input Features │ [×] [ ] [×] [×] │ │ [×] [×] [×] [×] │\n",
|
||
" │ Active Dropped │ → │ All Active │\n",
|
||
" │ Active Active │ │ │\n",
|
||
" └─────────────────┘ └─────────────────┘\n",
|
||
" ↓ ↓\n",
|
||
" \"Learn robustly\" \"Use all knowledge\"\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Memory and Performance\n",
|
||
"```\n",
|
||
"Dropout Memory Usage:\n",
|
||
"┌─────────────────────────────┐\n",
|
||
"│ Input Tensor: X MB │\n",
|
||
"├─────────────────────────────┤\n",
|
||
"│ Random Mask: X/4 MB │ (boolean mask, 1 byte/element)\n",
|
||
"├─────────────────────────────┤\n",
|
||
"│ Output Tensor: X MB │\n",
|
||
"└─────────────────────────────┘\n",
|
||
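"  Total: ~2.25X MB peak memory (ideal case: boolean mask applied in one step)\n",
"\n",
"Computational Overhead: Minimal (element-wise operations)\n",
"```\n",
"\n",
"The implementation below trades memory for gradient flow. It converts the mask to a float32 `Tensor` (another X MB) and computes `x * mask_tensor * scale`, which allocates an intermediate product, so its actual peak is closer to ~4X MB."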
|
||
]
|
||
},
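The training-mode arithmetic above is *inverted dropout*. The sketch below uses a hypothetical helper, `inverted_dropout`, that operates on plain NumPy arrays rather than this module's `Tensor`. It reproduces the worked example using the same fixed mask, then checks that the 1/(1-p) scaling preserves the mean:

```python
import numpy as np

def inverted_dropout(x, p, mask=None, rng=None):
    """Zero each element with probability p and scale survivors by 1/(1-p).
    Pass `mask` (1 = keep, 0 = drop) to reproduce a fixed worked example."""
    if p == 0.0:
        return x
    if p == 1.0:
        return np.zeros_like(x)
    if mask is None:
        if rng is None:
            rng = np.random.default_rng()
        mask = rng.random(x.shape) < (1.0 - p)
    return x * mask / (1.0 - p)

x = np.arange(1.0, 9.0)                          # [1.0, 2.0, ..., 8.0]
fixed_mask = np.array([1, 0, 1, 0, 1, 1, 0, 1])  # the mask from the example above
worked = inverted_dropout(x, 0.5, mask=fixed_mask)
print(worked)  # [ 2.  0.  6.  0. 10. 12.  0. 16.]

# The 1/(1-p) scaling keeps the expected value of each element unchanged
ones = np.ones(100_000)
mean_out = inverted_dropout(ones, 0.5, rng=np.random.default_rng(0)).mean()
print(mean_out)  # close to 1.0
```

Scaling at training time follows the same convention as the `Dropout` layer below, so inference can pass inputs through untouched.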
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "644af0ae",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "dropout-layer",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| export\n",
|
||
"class Dropout:\n",
|
||
" \"\"\"\n",
|
||
" Dropout layer for regularization.\n",
|
||
"\n",
|
||
"    During training: randomly zeros elements with probability p and scales the survivors by 1/(1-p)\n",
"    During inference: passes inputs through unchanged (the training-time scaling already preserves the expected value)\n",
|
||
"\n",
|
||
" This prevents overfitting by forcing the network to not rely on specific neurons.\n",
|
||
" \"\"\"\n",
|
||
"\n",
|
||
" def __init__(self, p=0.5):\n",
|
||
" \"\"\"\n",
|
||
" Initialize dropout layer.\n",
|
||
"\n",
|
||
" TODO: Store dropout probability\n",
|
||
"\n",
|
||
" Args:\n",
|
||
" p: Probability of zeroing each element (0.0 = no dropout, 1.0 = zero everything)\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> dropout = Dropout(0.5) # Zero 50% of elements during training\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if not 0.0 <= p <= 1.0:\n",
|
||
" raise ValueError(f\"Dropout probability must be between 0 and 1, got {p}\")\n",
|
||
" self.p = p\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def forward(self, x, training=True):\n",
|
||
" \"\"\"\n",
|
||
" Forward pass through dropout layer.\n",
|
||
"\n",
|
||
" TODO: Apply dropout during training, pass through during inference\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. If not training, return input unchanged\n",
|
||
" 2. If training, create random mask with probability (1-p)\n",
|
||
" 3. Multiply input by mask and scale by 1/(1-p)\n",
|
||
" 4. Return result as new Tensor\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> dropout = Dropout(0.5)\n",
|
||
" >>> x = Tensor([1, 2, 3, 4])\n",
|
||
" >>> y_train = dropout.forward(x, training=True) # Some elements zeroed\n",
|
||
" >>> y_eval = dropout.forward(x, training=False) # All elements preserved\n",
|
||
"\n",
|
||
" HINTS:\n",
|
||
"        - Use np.random.random(x.data.shape) < keep_prob to build the keep-mask\n",
|
||
" - Scale by 1/(1-p) to maintain expected value\n",
|
||
" - training=False should return input unchanged\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if not training or self.p == 0.0:\n",
|
||
" # During inference or no dropout, pass through unchanged\n",
|
||
" return x\n",
|
||
"\n",
|
||
" if self.p == 1.0:\n",
|
||
"            # Drop everything; multiplying by zeros keeps the output connected to x's computation graph\n",
"            return x * Tensor(np.zeros_like(x.data), requires_grad=False)\n",
|
||
"\n",
|
||
" # During training, apply dropout\n",
|
||
" keep_prob = 1.0 - self.p\n",
|
||
"\n",
|
||
" # Create random mask: True where we keep elements\n",
|
||
" mask = np.random.random(x.data.shape) < keep_prob\n",
|
||
"\n",
|
||
" # Apply mask and scale using Tensor operations to preserve gradients!\n",
|
||
" mask_tensor = Tensor(mask.astype(np.float32), requires_grad=False) # Mask doesn't need gradients\n",
|
||
" scale = Tensor(np.array(1.0 / keep_prob), requires_grad=False)\n",
|
||
" \n",
|
||
" # Use Tensor operations: x * mask * scale\n",
|
||
" output = x * mask_tensor * scale\n",
|
||
" return output\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __call__(self, x, training=True):\n",
|
||
" \"\"\"Allows the layer to be called like a function.\"\"\"\n",
|
||
" return self.forward(x, training)\n",
|
||
"\n",
|
||
" def parameters(self):\n",
|
||
" \"\"\"Dropout has no parameters.\"\"\"\n",
|
||
" return []\n",
|
||
"\n",
|
||
" def __repr__(self):\n",
|
||
" return f\"Dropout(p={self.p})\""
|
||
]
|
||
},
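`Dropout.forward` converts its boolean mask to float32 so that it can multiply `Tensor`s. A quick NumPy check shows what that conversion costs. The batch size and width here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.ones((32, 256), dtype=np.float32)   # activations: 4 bytes per element
keep_mask = rng.random(x.shape) < 0.5      # boolean mask: 1 byte per element
float_mask = keep_mask.astype(np.float32)  # the float32 copy Dropout.forward builds

print(x.nbytes, keep_mask.nbytes, float_mask.nbytes)  # 32768 8192 32768
```

The boolean mask is a quarter the size of the activations, as the Memory and Performance diagram assumes. The float32 copy, however, is as large as the input itself, and `x * mask_tensor * scale` allocates an intermediate product on top of that.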
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "62a0de23",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🔬 Unit Test: Dropout Layer\n",
|
||
"This test validates that our Dropout layer implementation works correctly.\n",
|
||
"**What we're testing**: Training vs inference behavior, probability scaling, randomness\n",
|
||
"**Why it matters**: Essential for preventing overfitting in neural networks\n",
|
||
"**Expected**: Correct masking during training, passthrough during inference"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "3877feeb",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-dropout",
|
||
"locked": true,
|
||
"points": 10
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_dropout_layer():\n",
|
||
" \"\"\"🔬 Test Dropout layer implementation.\"\"\"\n",
|
||
" print(\"🔬 Unit Test: Dropout Layer...\")\n",
|
||
"\n",
|
||
" # Test dropout creation\n",
|
||
" dropout = Dropout(0.5)\n",
|
||
" assert dropout.p == 0.5\n",
|
||
"\n",
|
||
" # Test inference mode (should pass through unchanged)\n",
|
||
" x = Tensor([1, 2, 3, 4])\n",
|
||
" y_inference = dropout.forward(x, training=False)\n",
|
||
" assert np.array_equal(x.data, y_inference.data), \"Inference should pass through unchanged\"\n",
|
||
"\n",
|
||
" # Test training mode with zero dropout (should pass through unchanged)\n",
|
||
" dropout_zero = Dropout(0.0)\n",
|
||
" y_zero = dropout_zero.forward(x, training=True)\n",
|
||
" assert np.array_equal(x.data, y_zero.data), \"Zero dropout should pass through unchanged\"\n",
|
||
"\n",
|
||
" # Test training mode with full dropout (should zero everything)\n",
|
||
" dropout_full = Dropout(1.0)\n",
|
||
" y_full = dropout_full.forward(x, training=True)\n",
|
||
" assert np.allclose(y_full.data, 0), \"Full dropout should zero everything\"\n",
|
||
"\n",
|
||
" # Test training mode with partial dropout\n",
|
||
" # Note: This is probabilistic, so we test statistical properties\n",
|
||
" np.random.seed(42) # For reproducible test\n",
|
||
" x_large = Tensor(np.ones((1000,))) # Large tensor for statistical significance\n",
|
||
" y_train = dropout.forward(x_large, training=True)\n",
|
||
"\n",
|
||
" # Count non-zero elements (approximately 50% should survive)\n",
|
||
" non_zero_count = np.count_nonzero(y_train.data)\n",
|
||
" expected_survival = 1000 * 0.5\n",
|
||
" # Allow 10% tolerance for randomness\n",
|
||
" assert 0.4 * 1000 < non_zero_count < 0.6 * 1000, f\"Expected ~500 survivors, got {non_zero_count}\"\n",
|
||
"\n",
|
||
" # Test scaling (surviving elements should be scaled by 1/(1-p) = 2.0)\n",
|
||
" surviving_values = y_train.data[y_train.data != 0]\n",
|
||
" expected_value = 2.0 # 1.0 / (1 - 0.5)\n",
|
||
" assert np.allclose(surviving_values, expected_value), f\"Surviving values should be {expected_value}\"\n",
|
||
"\n",
|
||
" # Test no parameters\n",
|
||
" params = dropout.parameters()\n",
|
||
" assert len(params) == 0, \"Dropout should have no parameters\"\n",
|
||
"\n",
|
||
" # Test invalid probability\n",
|
||
" try:\n",
|
||
" Dropout(-0.1)\n",
|
||
" assert False, \"Should raise ValueError for negative probability\"\n",
|
||
" except ValueError:\n",
|
||
" pass\n",
|
||
"\n",
|
||
" try:\n",
|
||
" Dropout(1.1)\n",
|
||
" assert False, \"Should raise ValueError for probability > 1\"\n",
|
||
" except ValueError:\n",
|
||
" pass\n",
|
||
"\n",
|
||
" print(\"✅ Dropout layer works correctly!\")\n",
|
||
"\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" test_unit_dropout_layer()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "cbb58951",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"source": [
|
||
"## 4. Integration: Bringing It Together\n",
|
||
"\n",
|
||
"Now that we've built both layer types, let's see how they work together to create a complete neural network architecture. We'll manually compose a realistic 3-layer MLP for MNIST digit classification.\n",
|
||
"\n",
|
||
"### Network Architecture Visualization\n",
|
||
"```\n",
|
||
"MNIST Classification Network (3-Layer MLP):\n",
|
||
"\n",
|
||
" Input Layer Hidden Layer 1 Hidden Layer 2 Output Layer\n",
|
||
"┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐\n",
|
||
"│ 784 │ │ 256 │ │ 128 │ │ 10 │\n",
|
||
"│ Pixels │───▶│ Features │───▶│ Features │───▶│ Classes │\n",
|
||
"│ (28×28 image) │ │ + ReLU │ │ + ReLU │ │ (0-9 digits) │\n",
|
||
"│ │ │ + Dropout │ │ + Dropout │ │ │\n",
|
||
"└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘\n",
|
||
" ↓ ↓ ↓ ↓\n",
|
||
" \"Raw pixels\" \"Edge detectors\" \"Shape detectors\" \"Digit classifier\"\n",
|
||
"\n",
|
||
"Data Flow:\n",
|
||
"[32, 784] → Linear(784,256) → ReLU → Dropout(0.5) → Linear(256,128) → ReLU → Dropout(0.3) → Linear(128,10) → [32, 10]\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Parameter Count Analysis\n",
|
||
"```\n",
|
||
"Parameter Breakdown (Manual Layer Composition):\n",
|
||
"┌─────────────────────────────────────────────────────────────┐\n",
|
||
"│ layer1 = Linear(784 → 256) │\n",
|
||
"│ Weights: 784 × 256 = 200,704 params │\n",
|
||
"│ Bias: 256 params │\n",
|
||
"│ Subtotal: 200,960 params │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ activation1 = ReLU(), dropout1 = Dropout(0.5) │\n",
|
||
"│ Parameters: 0 (no learnable weights) │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ layer2 = Linear(256 → 128) │\n",
|
||
"│ Weights: 256 × 128 = 32,768 params │\n",
|
||
"│ Bias: 128 params │\n",
|
||
"│ Subtotal: 32,896 params │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ activation2 = ReLU(), dropout2 = Dropout(0.3) │\n",
|
||
"│ Parameters: 0 (no learnable weights) │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ layer3 = Linear(128 → 10) │\n",
|
||
"│ Weights: 128 × 10 = 1,280 params │\n",
|
||
"│ Bias: 10 params │\n",
|
||
"│ Subtotal: 1,290 params │\n",
|
||
"└─────────────────────────────────────────────────────────────┘\n",
|
||
" TOTAL: 235,146 parameters\n",
|
||
" Memory: ~940 KB (float32)\n",
|
||
"```"
|
||
]
|
||
},
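The data flow above can be traced with a NumPy-only sketch. Plain arrays stand in for `Linear`, `ReLU`, and `Dropout`, and the random seed and input batch are arbitrary. The sketch confirms the `[32, 784] → [32, 10]` shapes and the 235,146-parameter total:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 256, 128, 10]
drop_ps = [0.5, 0.3]  # dropout after the two hidden layers

# Fan-in-scaled weights and zero biases, mirroring Linear.__init__
params = [
    (rng.standard_normal((n_in, n_out)) * np.sqrt(1.0 / n_in), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

def mlp_forward(x, training=True):
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:     # hidden layers only
            h = np.maximum(h, 0.0)  # ReLU
            if training:
                keep = 1.0 - drop_ps[i]
                h = h * (rng.random(h.shape) < keep) / keep  # inverted dropout
    return h

x = rng.standard_normal((32, 784))
logits = mlp_forward(x)
n_params = sum(W.size + b.size for W, b in params)
print(logits.shape)  # (32, 10)
print(n_params)      # 235146
```

With `training=False`, the dropout branch is skipped, so repeated calls on the same input give identical logits.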
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "fee73cb8",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## 5. Systems Analysis: Memory and Performance\n",
|
||
"\n",
|
||
"Now let's analyze the systems characteristics of our layer implementations. Understanding memory usage and computational complexity helps us build efficient neural networks.\n",
|
||
"\n",
|
||
"### Memory Analysis Overview\n",
|
||
"```\n",
|
||
"Layer Memory Components:\n",
|
||
"┌─────────────────────────────────────────────────────────────┐\n",
|
||
"│ PARAMETER MEMORY │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ • Weights: Persistent, shared across batches │\n",
|
||
"│ • Biases: Small but necessary for output shifting │\n",
|
||
"│ • Total: Grows with network width and depth │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ ACTIVATION MEMORY │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ • Input tensors: batch_size × features × 4 bytes │\n",
|
||
"│ • Output tensors: batch_size × features × 4 bytes │\n",
|
||
"│ • Intermediate results during forward pass │\n",
|
||
"│ • Total: Grows with batch size and layer width │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ TEMPORARY MEMORY │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ • Dropout masks: batch_size × features × 1 byte │\n",
|
||
"│ • Computation buffers for matrix operations │\n",
|
||
"│ • Total: Peak during forward/backward passes │\n",
|
||
"└─────────────────────────────────────────────────────────────┘\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Computational Complexity Overview\n",
|
||
"```\n",
|
||
"Layer Operation Complexity:\n",
|
||
"┌─────────────────────────────────────────────────────────────┐\n",
|
||
"│ Linear Layer Forward Pass: │\n",
|
||
"│ Matrix Multiply: O(batch × in_features × out_features) │\n",
|
||
"│ Bias Addition: O(batch × out_features) │\n",
|
||
"│ Dominant: Matrix multiplication │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ Multi-layer Forward Pass: │\n",
|
||
"│ Sum of all layer complexities │\n",
|
||
"│ Memory (inference): roughly the largest input + output pair │\n",
"│ Memory (training): every activation, kept for the backward pass │\n",
|
||
"├─────────────────────────────────────────────────────────────┤\n",
|
||
"│ Dropout Forward Pass: │\n",
|
||
"│ Mask Generation: O(elements) │\n",
|
||
"│ Element-wise Multiply: O(elements) │\n",
|
||
"│ Overhead: Minimal compared to linear layers │\n",
|
||
"└─────────────────────────────────────────────────────────────┘\n",
|
||
"```"
|
||
]
|
||
},
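The activation-memory component can be estimated with a rough sketch for the MNIST MLP from Section 4 at batch size 32, assuming float32. `activation_bytes` is a hypothetical helper. The estimate ignores the extra ReLU and dropout intermediates as well as any framework overhead, so real numbers will be higher:

```python
def activation_bytes(batch_size, widths, bytes_per_elem=4):
    """Bytes for each activation tensor (input, hidden layers, output) of an MLP."""
    return [batch_size * w * bytes_per_elem for w in widths]

widths = [784, 256, 128, 10]  # the MNIST MLP from Section 4
acts = activation_bytes(32, widths)

# Training: roughly every activation is kept for the backward pass
training_bytes = sum(acts)
# Inference (rough lower bound): only a layer's input and output must coexist
inference_bytes = max(a + b for a, b in zip(acts[:-1], acts[1:]))

print(acts)             # [100352, 32768, 16384, 1280]
print(training_bytes)   # 150784
print(inference_bytes)  # 133120
```

Both figures scale linearly with batch size. This is why batch size, more than parameter count, often dominates memory for small models.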
|
{
"cell_type": "code",
"execution_count": null,
"id": "4fc6a34e",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "analyze-layer-memory",
"solution": true
}
},
"outputs": [],
"source": [
"def analyze_layer_memory():\n",
"    \"\"\"📊 Analyze memory usage patterns in layer operations.\"\"\"\n",
"    print(\"📊 Analyzing Layer Memory Usage...\")\n",
"\n",
"    # Test different layer sizes\n",
"    layer_configs = [\n",
"        (784, 256),    # MNIST → hidden\n",
"        (256, 256),    # Hidden → hidden\n",
"        (256, 10),     # Hidden → output\n",
"        (2048, 2048),  # Large hidden\n",
"    ]\n",
"\n",
"    print(\"\\nLinear Layer Memory Analysis:\")\n",
"    print(\"Configuration → Weight Memory → Bias Memory → Total Memory\")\n",
"\n",
"    for in_feat, out_feat in layer_configs:\n",
"        # Calculate memory usage\n",
"        weight_memory = in_feat * out_feat * 4  # 4 bytes per float32\n",
"        bias_memory = out_feat * 4\n",
"        total_memory = weight_memory + bias_memory\n",
"\n",
"        print(f\"({in_feat:4d}, {out_feat:4d}) → {weight_memory/1024:7.1f} KB → {bias_memory/1024:6.1f} KB → {total_memory/1024:7.1f} KB\")\n",
"\n",
"    # Analyze multi-layer memory scaling\n",
"    print(\"\\n💡 Multi-layer Model Memory Scaling:\")\n",
"    hidden_sizes = [128, 256, 512, 1024, 2048]\n",
"\n",
"    for hidden_size in hidden_sizes:\n",
"        # 3-layer MLP: 784 → hidden → hidden/2 → 10\n",
"        layer1_params = 784 * hidden_size + hidden_size\n",
"        layer2_params = hidden_size * (hidden_size // 2) + (hidden_size // 2)\n",
"        layer3_params = (hidden_size // 2) * 10 + 10\n",
"\n",
"        total_params = layer1_params + layer2_params + layer3_params\n",
"        memory_mb = total_params * 4 / (1024 * 1024)\n",
"\n",
"        print(f\"Hidden={hidden_size:4d}: {total_params:7,} params = {memory_mb:5.1f} MB\")\n",
"\n",
"# Analysis will be run in main block"
]
},
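{
"cell_type": "markdown",
"id": "a3f0c1d2",
"metadata": {},
"source": [
"The byte counts above assume 4 bytes per float32 parameter. As a quick cross-check, here is a minimal NumPy sketch. The helper `linear_layer_bytes` is hypothetical and not part of TinyTorch. The sketch allocates real weight and bias arrays, confirms that their `nbytes` match `(in_features × out_features + out_features) × itemsize`, and shows that storing the same parameters in float16 halves the footprint.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def linear_layer_bytes(in_features, out_features, dtype=np.float32):\n",
"    \"\"\"Hypothetical helper: parameter bytes of one Linear layer (weights + bias).\"\"\"\n",
"    itemsize = np.dtype(dtype).itemsize  # 4 for float32, 2 for float16\n",
"    return (in_features * out_features + out_features) * itemsize\n",
"\n",
"for in_f, out_f in [(784, 256), (2048, 2048)]:\n",
"    # Allocate the actual arrays and let NumPy report how many bytes they occupy\n",
"    weight = np.zeros((in_f, out_f), dtype=np.float32)\n",
"    bias = np.zeros(out_f, dtype=np.float32)\n",
"    assert weight.nbytes + bias.nbytes == linear_layer_bytes(in_f, out_f)\n",
"    fp32_kb = linear_layer_bytes(in_f, out_f) / 1024\n",
"    fp16_kb = linear_layer_bytes(in_f, out_f, np.float16) / 1024\n",
"    print(f\"({in_f}, {out_f}): float32 = {fp32_kb:.1f} KB, float16 = {fp16_kb:.1f} KB\")\n",
"```\n",
"\n",
"For `(784, 256)` this reports 785.0 KB in float32, the same total as the table printed by `analyze_layer_memory()`, and 392.5 KB in float16."
]
},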
{
"cell_type": "code",
"execution_count": null,
"id": "16816429",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "analyze-layer-performance",
"solution": true
}
},
"outputs": [],
"source": [
"def analyze_layer_performance():\n",
"    \"\"\"📊 Analyze computational complexity of layer operations.\"\"\"\n",
"    print(\"📊 Analyzing Layer Computational Complexity...\")\n",
"\n",
"    # Test forward pass FLOPs\n",
"    batch_sizes = [1, 32, 128, 512]\n",
"    in_features, out_features = 784, 256\n",
"    layer = Linear(in_features, out_features)\n",
"\n",
"    print(\"\\nLinear Layer FLOPs Analysis:\")\n",
"    print(\"Batch Size → Matrix Multiply FLOPs → Bias Add FLOPs → Total FLOPs\")\n",
"\n",
"    for batch_size in batch_sizes:\n",
"        # Matrix multiplication: (batch, in) @ (in, out) performs batch * in * out multiply-adds;\n",
"        # counting each multiply and each add as one FLOP gives 2 * batch * in * out FLOPs\n",
"        matmul_flops = 2 * batch_size * in_features * out_features\n",
"        # Bias addition: one add per output element, batch * out FLOPs\n",
"        bias_flops = batch_size * out_features\n",
"        total_flops = matmul_flops + bias_flops\n",
"\n",
"        print(f\"{batch_size:10d} → {matmul_flops:15,} → {bias_flops:13,} → {total_flops:11,}\")\n",
"\n",
"    print(\"\\n💡 Key Insights:\")\n",
"    print(\"🚀 Linear layer complexity: O(batch_size × in_features × out_features)\")\n",
"    print(\"🚀 Activation memory grows linearly with batch size; weight memory grows with in_features × out_features (quadratically when width scales uniformly)\")\n",
"    print(\"🚀 Dropout adds minimal overhead: element-wise mask generation and scaling, O(batch_size × features)\")\n",
"\n",
"# Analysis will be run in main block"
]
},
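{
"cell_type": "markdown",
"id": "b7e4d9a1",
"metadata": {},
"source": [
"FLOP counts only become meaningful when compared with a hardware rate. The sketch below times a plain NumPy matmul plus bias add with the same shapes as `Linear(784, 256)`. It then divides the `2 × batch × in × out + batch × out` count by the measured time. It deliberately uses NumPy arrays rather than the TinyTorch `Tensor`. Its output depends on your CPU and BLAS library, so treat the numbers as a rough indication, not a benchmark.\n",
"\n",
"```python\n",
"import time\n",
"import numpy as np\n",
"\n",
"in_features, out_features = 784, 256\n",
"rng = np.random.default_rng(0)\n",
"W = rng.standard_normal((in_features, out_features)).astype(np.float32)\n",
"b = np.zeros(out_features, dtype=np.float32)\n",
"repeats = 50\n",
"\n",
"for batch_size in [1, 32, 128, 512]:\n",
"    x = rng.standard_normal((batch_size, in_features)).astype(np.float32)\n",
"    flops = 2 * batch_size * in_features * out_features + batch_size * out_features\n",
"    start = time.perf_counter()\n",
"    for _ in range(repeats):\n",
"        y = x @ W + b  # same arithmetic as a Linear forward pass\n",
"    seconds = (time.perf_counter() - start) / repeats\n",
"    print(f\"batch={batch_size:4d}: {seconds * 1e6:8.1f} µs/call, ~{flops / seconds / 1e9:6.2f} GFLOP/s\")\n",
"```\n",
"\n",
"Small batches usually reach a much lower GFLOP/s than large ones because fixed per-call overhead dominates. This is one reason training loops process many examples per forward pass."
]
},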
{
"cell_type": "markdown",
"id": "9b80cd94",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# 🧪 Module Integration Test\n",
"\n",
"Final validation that everything works together correctly."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c1d2e8f",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"def import_previous_module(module_name: str, component_name: str):\n",
"    \"\"\"Load `component_name` from an earlier module's *_dev.py file, e.g. ('02_activations', 'ReLU').\"\"\"\n",
"    import sys\n",
"    import os\n",
"    # __file__ is undefined inside a Jupyter kernel, so fall back to the working directory there\n",
"    base_dir = os.path.dirname(os.path.abspath(__file__)) if '__file__' in globals() else os.getcwd()\n",
"    sys.path.append(os.path.join(base_dir, '..', module_name))\n",
"    module = __import__(f\"{module_name.split('_')[1]}_dev\")\n",
"    return getattr(module, component_name)"
]
},
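{
"cell_type": "markdown",
"id": "c2d8e5f3",
"metadata": {},
"source": [
"`import_previous_module` appends to `sys.path` on every call. The search path therefore keeps growing, and a same-named module earlier on the path would shadow the intended one. One alternative is to load the file by explicit path with the standard-library `importlib.util` functions. It is sketched below under the same layout assumption as the helper above: `../02_activations/activations_dev.py` relative to this module's folder. The name `load_component_from_file` is hypothetical; the tests in this module use `import_previous_module`.\n",
"\n",
"```python\n",
"import importlib.util\n",
"import os\n",
"import sys\n",
"\n",
"def load_component_from_file(module_name, component_name, base_dir=None):\n",
"    \"\"\"Hypothetical helper: load e.g. ReLU from ../02_activations/activations_dev.py by path.\"\"\"\n",
"    if base_dir is None:\n",
"        base_dir = os.getcwd()  # assumes the notebook runs from this module's folder\n",
"    short_name = f\"{module_name.split('_')[1]}_dev\"  # '02_activations' -> 'activations_dev'\n",
"    path = os.path.join(base_dir, '..', module_name, f\"{short_name}.py\")\n",
"    spec = importlib.util.spec_from_file_location(short_name, path)\n",
"    if spec is None or spec.loader is None:\n",
"        raise ImportError(f\"Cannot create an import spec for {path}\")\n",
"    module = importlib.util.module_from_spec(spec)\n",
"    sys.modules[short_name] = module  # register before executing, as in the importlib docs recipe\n",
"    spec.loader.exec_module(module)\n",
"    return getattr(module, component_name)\n",
"```\n",
"\n",
"Usage would look like `ReLU = load_component_from_file('02_activations', 'ReLU')`. The trade-off is a few more lines of code in exchange for leaving `sys.path` untouched."
]
},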
{
"cell_type": "code",
"execution_count": null,
"id": "3a80be9e",
"metadata": {
"lines_to_next_cell": 2,
"nbgrader": {
"grade": true,
"grade_id": "module-integration",
"locked": true,
"points": 20
}
},
"outputs": [],
"source": [
"def test_module():\n",
"    \"\"\"\n",
"    Comprehensive test of entire module functionality.\n",
"\n",
"    This final test runs before the module summary to ensure:\n",
"    - All unit tests pass\n",
"    - Functions work together correctly\n",
"    - Module is ready for integration with TinyTorch\n",
"    \"\"\"\n",
"    print(\"🧪 RUNNING MODULE INTEGRATION TEST\")\n",
"    print(\"=\" * 50)\n",
"\n",
"    # Run all unit tests\n",
"    print(\"Running unit tests...\")\n",
"    test_unit_linear_layer()\n",
"    test_unit_dropout_layer()\n",
"\n",
"    print(\"\\nRunning integration scenarios...\")\n",
"\n",
"    # Test realistic neural network construction with manual composition\n",
"    print(\"🔬 Integration Test: Multi-layer Network...\")\n",
"\n",
"    # Import real activation from module 02 using standardized helper\n",
"    ReLU = import_previous_module('02_activations', 'ReLU')\n",
"\n",
"    # Build individual layers for manual composition\n",
"    layer1 = Linear(784, 128)\n",
"    activation1 = ReLU()\n",
"    dropout1 = Dropout(0.5)\n",
"    layer2 = Linear(128, 64)\n",
"    activation2 = ReLU()\n",
"    dropout2 = Dropout(0.3)\n",
"    layer3 = Linear(64, 10)\n",
"\n",
"    # Test end-to-end forward pass with manual composition\n",
"    batch_size = 16\n",
"    x = Tensor(np.random.randn(batch_size, 784))\n",
"\n",
"    # Manual forward pass\n",
"    x = layer1.forward(x)\n",
"    x = activation1.forward(x)\n",
"    x = dropout1.forward(x)\n",
"    x = layer2.forward(x)\n",
"    x = activation2.forward(x)\n",
"    x = dropout2.forward(x)\n",
"    output = layer3.forward(x)\n",
"\n",
"    assert output.shape == (batch_size, 10), f\"Expected output shape ({batch_size}, 10), got {output.shape}\"\n",
"\n",
"    # Test parameter counting from individual layers\n",
"    all_params = layer1.parameters() + layer2.parameters() + layer3.parameters()\n",
"    expected_params = 6  # 3 weights + 3 biases from 3 Linear layers\n",
"    assert len(all_params) == expected_params, f\"Expected {expected_params} parameters, got {len(all_params)}\"\n",
"\n",
"    # Test all parameters have requires_grad=True\n",
"    for param in all_params:\n",
"        assert param.requires_grad, \"All parameters should have requires_grad=True\"\n",
"\n",
"    # Test dropout in training vs inference mode\n",
"    test_x = Tensor(np.random.randn(4, 784))\n",
"    dropout_test = Dropout(0.5)\n",
"    train_output = dropout_test.forward(test_x, training=True)\n",
"    infer_output = dropout_test.forward(test_x, training=False)\n",
"    assert train_output.shape == test_x.shape, \"Training mode should preserve the input shape\"\n",
"    assert np.any(train_output.data == 0), \"Training mode with p=0.5 should zero out some elements\"\n",
"    assert np.array_equal(test_x.data, infer_output.data), \"Inference mode should pass through unchanged\"\n",
"\n",
"    print(\"✅ Multi-layer network integration works!\")\n",
"\n",
"    print(\"\\n\" + \"=\" * 50)\n",
"    print(\"🎉 ALL TESTS PASSED! Module ready for export.\")\n",
"    print(\"Run: tito module complete 03_layers\")\n",
"\n",
"# Run comprehensive module test\n",
"if __name__ == \"__main__\":\n",
"    test_module()"
]
},
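{
"cell_type": "markdown",
"id": "d9a6b4e7",
"metadata": {},
"source": [
"The training-versus-inference assertion in `test_module` depends on how dropout scales activations. Below is a minimal NumPy sketch of *inverted* dropout. It assumes the common convention of scaling surviving activations by `1 / (1 - p)` during training, so that inference can return the input untouched. The sketch illustrates the technique and is not TinyTorch's `Dropout` class; `inverted_dropout` is a hypothetical name.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def inverted_dropout(x, p, training, rng):\n",
"    \"\"\"Hypothetical sketch: zero each element with probability p, rescale survivors by 1/(1-p).\"\"\"\n",
"    if not 0.0 <= p < 1.0:\n",
"        raise ValueError(f\"p must be in [0, 1), got {p}\")\n",
"    if not training or p == 0.0:\n",
"        return x  # inference: identity, matching the pass-through check in test_module\n",
"    keep_mask = rng.random(x.shape) >= p  # True with probability 1 - p\n",
"    return x * keep_mask / (1.0 - p)\n",
"\n",
"rng = np.random.default_rng(42)\n",
"x = np.ones((1000, 100))\n",
"train_out = inverted_dropout(x, p=0.5, training=True, rng=rng)\n",
"infer_out = inverted_dropout(x, p=0.5, training=False, rng=rng)\n",
"\n",
"print('fraction zeroed:', np.mean(train_out == 0))  # close to 0.5\n",
"print('mean activation:', train_out.mean())  # close to 1.0, the input mean\n",
"print('inference unchanged:', np.array_equal(x, infer_out))  # True\n",
"```\n",
"\n",
"Because each output's expected value equals its input, no extra scaling is needed at inference. That is exactly the behavior the `training=False` pass-through assertion checks."
]
},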
{
"cell_type": "markdown",
"id": "93360ac7",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🎯 MODULE SUMMARY: Layers\n",
"\n",
"Congratulations! You've built the fundamental building blocks that make neural networks possible!\n",
"\n",
"### Key Accomplishments\n",
"- Built Linear layers with proper Xavier initialization and parameter management\n",
"- Created Dropout layers for regularization with training/inference mode handling\n",
"- Demonstrated manual layer composition for building neural networks\n",
"- Analyzed memory scaling and computational complexity of layer operations\n",
"- All tests pass ✅ (validated by `test_module()`)\n",
"\n",
"### Ready for Next Steps\n",
"Your layer implementation enables building complete neural networks! Linear layers provide learnable transformations, and manual composition chains them into deeper models. Dropout helps reduce overfitting by randomly zeroing activations during training.\n",
"\n",
"Export with: `tito module complete 03_layers`\n",
"\n",
"**Next**: Module 04 will add loss functions (CrossEntropyLoss, MSELoss) that measure how wrong your model's predictions are, the foundation for learning!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}