mirror of https://github.com/MLSysBook/TinyTorch.git
synced 2026-03-11 20:45:02 -05:00

Module improvements: Core modules (01-08)

- Update tensor module notebook
- Enhance activations module
- Expand layers module functionality
- Improve autograd implementation
- Add optimizers enhancements
- Update training module
- Refine dataloader notebook
@@ -22,20 +22,20 @@
"\n",
"Welcome to Module 08! You're about to build the data loading infrastructure that transforms how ML models consume data during training.\n",
"\n",
"## 🔗 Prerequisites & Progress\n",
"**You've Built**: Tensor operations, activations, layers, losses, autograd, optimizers, and training loops\n",
"**You'll Build**: Dataset abstraction, DataLoader with batching/shuffling, and real dataset support\n",
"**You'll Enable**: Efficient data pipelines that feed hungry neural networks with properly formatted batches\n",
"\n",
"**Connection Map**:\n",
"```\n",
"Training Loop → DataLoader → Batched Data → Model\n",
"(Module 07) (Module 08) (optimized) (ready to learn)\n",
"```\n",
"\n",
"## Learning Objectives\n",
"By the end of this module, you will:\n",
"1. Understand the data pipeline: individual samples → batches → training\n",
"2. Implement Dataset abstraction and TensorDataset for tensor-based data\n",
"3. Build DataLoader with intelligent batching, shuffling, and memory-efficient iteration\n",
"4. Experience data pipeline performance characteristics firsthand\n",
@@ -43,7 +43,7 @@
"\n",
"Let's transform scattered data into organized learning batches!\n",
"\n",
"## 📦 Where This Code Lives in the Final Package\n",
"\n",
"**Learning Side:** You work in `modules/08_dataloader/dataloader_dev.py` \n",
"**Building Side:** Code exports to `tinytorch.data.loader`\n",
@@ -72,6 +72,8 @@
"# Essential imports for data loading\n",
"import numpy as np\n",
"import random\n",
"import time\n",
"import sys\n",
"from typing import Iterator, Tuple, List, Optional, Union\n",
"from abc import ABC, abstractmethod\n",
"\n",
@@ -97,13 +99,13 @@
"\n",
"```\n",
"Raw Data Storage Dataset Interface DataLoader Batching Training Loop\n",
"┌─────────────────┐ ┌──────────────────┐ ┌────────────────────┐ ┌─────────────┐\n",
"│ cat_001.jpg │ │ dataset[0] │ │ Batch 1: │ │ model(batch)│\n",
"│ dog_023.jpg │ ───> │ dataset[1] │ ───> │ [cat, dog, cat] │ ───> │ optimizer │\n",
"│ cat_045.jpg │ │ dataset[2] │ │ Batch 2: │ │ loss │\n",
"│ ... │ │ ... │ │ [dog, cat, dog] │ │ backward │\n",
"│ (50,000 files) │ │ dataset[49999] │ │ ... │ │ step │\n",
"└─────────────────┘ └──────────────────┘ └────────────────────┘ └─────────────┘\n",
"```\n",
"\n",
"### Why This Pipeline Matters\n",
@@ -122,11 +124,11 @@
"\n",
"```\n",
"Dataset Interface\n",
"┌─────────────────────────────────────┐\n",
"│ __len__() → \"How many samples?\" │\n",
"│ __getitem__(i) → \"Give me sample i\" │\n",
"└─────────────────────────────────────┘\n",
" ↑ ↑\n",
" Enables for Enables indexing\n",
" loops/iteration dataset[index]\n",
"```\n",
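The two-method contract in the diagram above maps directly onto Python's `abc` machinery. A minimal illustrative sketch (the class names mirror the notebook's, but this is a stand-in, not the exact TinyTorch implementation):

```python
from abc import ABC, abstractmethod

class Dataset(ABC):
    """The two-method contract: a size plus indexed access."""

    @abstractmethod
    def __len__(self):
        """How many samples?"""

    @abstractmethod
    def __getitem__(self, index):
        """Give me sample `index`."""

class RangeDataset(Dataset):
    """Toy concrete dataset: sample i is the string 'item_i'."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        return f"item_{index}"

ds = RangeDataset(10)
print(len(ds), ds[0], ds[9])  # 10 item_0 item_9
```

Because both methods are `@abstractmethod`, instantiating `Dataset()` directly raises a `TypeError`, which is exactly what the unit test later in this diff checks for.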
@@ -217,15 +219,15 @@
"outputs": [],
"source": [
"def test_unit_dataset():\n",
" \"\"\"🔬 Test Dataset abstract base class.\"\"\"\n",
" print(\"🔬 Unit Test: Dataset Abstract Base Class...\")\n",
"\n",
" # Test that Dataset is properly abstract\n",
" try:\n",
" dataset = Dataset()\n",
" assert False, \"Should not be able to instantiate abstract Dataset\"\n",
" except TypeError:\n",
" print(\"✅ Dataset is properly abstract\")\n",
"\n",
" # Test concrete implementation\n",
" class TestDataset(Dataset):\n",
@@ -243,7 +245,7 @@
" assert dataset[0] == \"item_0\"\n",
" assert dataset[9] == \"item_9\"\n",
"\n",
" print(\"✅ Dataset interface works correctly!\")\n",
"\n",
"if __name__ == \"__main__\":\n",
" test_unit_dataset()"
@@ -268,16 +270,16 @@
"```\n",
"Input Tensors (aligned by first dimension):\n",
" Features Tensor Labels Tensor Metadata Tensor\n",
" ┌─────────────────┐ ┌───────────────┐ ┌─────────────────┐\n",
" │ [1.2, 3.4, 5.6] │ │ 0 (cat) │ │ \"image_001.jpg\" │ ← Sample 0\n",
" │ [2.1, 4.3, 6.5] │ │ 1 (dog) │ │ \"image_002.jpg\" │ ← Sample 1\n",
" │ [3.0, 5.2, 7.4] │ │ 0 (cat) │ │ \"image_003.jpg\" │ ← Sample 2\n",
" │ ... │ │ ... │ │ ... │\n",
" └─────────────────┘ └───────────────┘ └─────────────────┘\n",
" (N, 3) (N,) (N,)\n",
"\n",
"Dataset Access:\n",
" dataset[1] → (Tensor([2.1, 4.3, 6.5]), Tensor(1), \"image_002.jpg\")\n",
"```\n",
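The aligned-tensor access pattern above can be sketched in a few lines. This is an illustrative version using plain NumPy arrays in place of TinyTorch `Tensor`s, not the notebook's actual class:

```python
import numpy as np

class TensorDataset:
    """Aligned-tensor dataset: dataset[i] returns row i of every tensor.
    Sketch only; NumPy arrays stand in for TinyTorch Tensors."""

    def __init__(self, *tensors):
        n = len(tensors[0])
        # All tensors must agree on the first (sample) dimension
        if any(len(t) != n for t in tensors):
            raise ValueError("tensors must share the same first dimension")
        self.tensors = tensors

    def __len__(self):
        return len(self.tensors[0])

    def __getitem__(self, i):
        return tuple(t[i] for t in self.tensors)

features = np.array([[1.2, 3.4, 5.6], [2.1, 4.3, 6.5], [3.0, 5.2, 7.4]])
labels = np.array([0, 1, 0])
ds = TensorDataset(features, labels)
x, y = ds[1]  # row 1 of each tensor, matching the diagram's dataset[1]
```

The first-dimension check is what makes the later `except ValueError` unit test pass when mismatched tensors are supplied.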
"\n",
"### Why TensorDataset is Powerful\n",
@@ -419,8 +421,8 @@
"outputs": [],
"source": [
"def test_unit_tensordataset():\n",
" \"\"\"🔬 Test TensorDataset implementation.\"\"\"\n",
" print(\"🔬 Unit Test: TensorDataset...\")\n",
"\n",
" # Test basic functionality\n",
" features = Tensor([[1, 2], [3, 4], [5, 6]]) # 3 samples, 2 features\n",
@@ -456,7 +458,7 @@
" except ValueError:\n",
" pass\n",
"\n",
" print(\"✅ TensorDataset works correctly!\")\n",
"\n",
"if __name__ == \"__main__\":\n",
" test_unit_tensordataset()"
@@ -480,21 +482,21 @@
"\n",
"```\n",
"Step 1: Individual Samples from Dataset\n",
" dataset[0] → (features: [1, 2, 3], label: 0)\n",
" dataset[1] → (features: [4, 5, 6], label: 1)\n",
" dataset[2] → (features: [7, 8, 9], label: 0)\n",
" dataset[3] → (features: [2, 3, 4], label: 1)\n",
"\n",
"Step 2: DataLoader Groups into Batch (batch_size=2)\n",
" Batch 1:\n",
" features: [[1, 2, 3], ← Stacked into shape (2, 3)\n",
" [4, 5, 6]]\n",
" labels: [0, 1] ← Stacked into shape (2,)\n",
"\n",
" Batch 2:\n",
" features: [[7, 8, 9], ← Stacked into shape (2, 3)\n",
" [2, 3, 4]]\n",
" labels: [0, 1] ← Stacked into shape (2,)\n",
"```\n",
"\n",
"### The Shuffling Process\n",
@@ -508,9 +510,9 @@
" Batch 3: [sample 4, sample 5] Batch 3: [sample 5, sample 4]\n",
"\n",
"Without Shuffling (epoch 2): With Shuffling (epoch 2):\n",
" Batch 1: [sample 0, sample 1] ✗ Batch 1: [sample 1, sample 4] ✓\n",
" Batch 2: [sample 2, sample 3] ✗ Batch 2: [sample 0, sample 5] ✓\n",
" Batch 3: [sample 4, sample 5] ✗ Batch 3: [sample 2, sample 3] ✓\n",
"\n",
" (Same every epoch = overfitting!) (Different combinations = better learning!)\n",
"```\n",
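The batch-and-shuffle walkthrough above fits in a short generator. `iterate_batches` is a hypothetical helper mirroring the behaviour described, not the module's DataLoader API:

```python
import numpy as np

def iterate_batches(dataset, batch_size, shuffle=False, rng=None):
    """Group (features, label) samples into stacked batches."""
    indices = np.arange(len(dataset))
    if shuffle:
        # Reshuffling the index array each call gives a new order per epoch
        (rng or np.random.default_rng()).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = [dataset[i] for i in indices[start:start + batch_size]]
        xs = np.stack([x for x, _ in chunk])  # (batch, *feature_shape)
        ys = np.array([y for _, y in chunk])  # (batch,)
        yield xs, ys

samples = [(np.array([1, 2, 3]), 0), (np.array([4, 5, 6]), 1),
           (np.array([7, 8, 9]), 0), (np.array([2, 3, 4]), 1)]
batches = list(iterate_batches(samples, batch_size=2))
print(batches[0][0].shape, batches[0][1])  # (2, 3) [0 1]
```

Note that shuffling only permutes indices; every sample is still visited exactly once per epoch, which is what the unit test's "shuffle should preserve all data" assertion verifies.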
@@ -670,8 +672,8 @@
"outputs": [],
"source": [
"def test_unit_dataloader():\n",
" \"\"\"🔬 Test DataLoader implementation.\"\"\"\n",
" print(\"🔬 Unit Test: DataLoader...\")\n",
"\n",
" # Create test dataset\n",
" features = Tensor([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]) # 5 samples\n",
@@ -717,7 +719,7 @@
" assert shuffle_features == expected_features, \"Shuffle should preserve all data\"\n",
" assert no_shuffle_features == expected_features, \"No shuffle should preserve all data\"\n",
"\n",
" print(\"✅ DataLoader works correctly!\")\n",
"\n",
"if __name__ == \"__main__\":\n",
" test_unit_dataloader()"
@@ -741,12 +743,12 @@
"\n",
"```\n",
"Module 08 (DataLoader) Examples & Milestones\n",
"┌──────────────────────┐ ┌────────────────────────┐\n",
"│ Dataset abstraction │ │ Real MNIST digits │\n",
"│ TensorDataset impl │ ───> │ CIFAR-10 images │\n",
"│ DataLoader batching │ │ Custom datasets │\n",
"│ Shuffle & iteration │ │ Download utilities │\n",
"└──────────────────────┘ └────────────────────────┘\n",
" (Learn mechanics) (Apply to real data)\n",
"```\n",
"\n",
@@ -754,10 +756,10 @@
"\n",
"**What does image data actually look like?**\n",
"\n",
"Images are just 2D arrays of numbers (pixels). Here are actual 8×8 handwritten digits:\n",
"\n",
"```\n",
"Digit \"5\" (8×8): Digit \"3\" (8×8): Digit \"8\" (8×8):\n",
" 0 0 12 13 5 0 0 0 0 0 11 12 0 0 0 0 0 0 10 14 8 1 0 0\n",
" 0 0 13 15 10 0 0 0 0 2 16 16 16 7 0 0 0 0 16 15 15 9 0 0\n",
" 0 3 15 13 16 7 0 0 0 0 8 16 8 0 0 0 0 0 15 5 5 13 0 0\n",
@@ -768,23 +770,23 @@
" 0 0 0 0 0 0 0 0 0 3 16 16 16 12 0 0 0 0 0 0 0 0 0 0\n",
"\n",
"Visual representation: \n",
"░█████░ ░█████░ ░█████░\n",
"░█░░░█░ ░░░░░█░ █░░░░█░\n",
"░░░░█░░ ░░███░░ ░█████░\n",
"░░░█░░░ ░░░░█░░ █░░░░█░\n",
"░░█░░░░ ░█████░ ░█████░\n",
"```\n",
"\n",
"**Shape transformations in DataLoader:**\n",
"\n",
"```\n",
"Individual Sample (from Dataset):\n",
" image: (8, 8) ← Single 8×8 image\n",
" label: scalar ← Single digit (0-9)\n",
"\n",
"After DataLoader batching (batch_size=32):\n",
" images: (32, 8, 8) ← Stack of 32 images\n",
" labels: (32,) ← Array of 32 labels\n",
"\n",
"This is what your model sees during training!\n",
"```\n",
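The shape transformation above is just a stack along a new leading axis; a quick NumPy check of the quoted shapes (placeholder zero images, since the real digits come from the dataset file):

```python
import numpy as np

# 32 individual 8×8 "images" and their labels, as a Dataset would yield them
images = [np.zeros((8, 8)) for _ in range(32)]
labels = [i % 10 for i in range(32)]

# Batching stacks the samples along a new leading (batch) dimension
batch_images = np.stack(images)   # (32, 8, 8)
batch_labels = np.array(labels)   # (32,)
print(batch_images.shape, batch_labels.shape)  # (32, 8, 8) (32,)
```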
@@ -793,7 +795,7 @@
"\n",
"**Tiny Datasets (ships with TinyTorch):**\n",
"```python\n",
"# 8×8 handwritten digits - instant, no downloads!\n",
"import numpy as np\n",
"data = np.load('datasets/tiny/digits_8x8.npz')\n",
"images = Tensor(data['images']) # (1797, 8, 8)\n",
@@ -811,16 +813,16 @@
"\n",
"**Full Datasets (for serious training):**\n",
"```python\n",
"# See milestones/03_mlp_revival_1986/ for MNIST download (28×28 images)\n",
"# See milestones/04_cnn_revolution_1998/ for CIFAR-10 download (32×32×3 images)\n",
"```\n",
"\n",
"### What You've Accomplished\n",
"\n",
"You've built the **data loading infrastructure** that powers all modern ML:\n",
"- ✅ Dataset abstraction (universal interface)\n",
"- ✅ TensorDataset (in-memory efficiency)\n",
"- ✅ DataLoader (batching, shuffling, iteration)\n",
"\n",
"**Next steps:** Apply your DataLoader to real datasets in the milestones!\n",
"\n",
@@ -850,17 +852,17 @@
"\n",
"```\n",
"Training Step Breakdown:\n",
"┌───────────────────────────────────────────────────────────────┐\n",
"│ Data Loading │ Forward Pass │ Backward Pass │\n",
"│ ████████████ │ ███████ │ ████████ │\n",
"│ 40ms │ 25ms │ 35ms │\n",
"└───────────────────────────────────────────────────────────────┘\n",
" 100ms total per step\n",
"\n",
"Bottleneck Analysis:\n",
"- If data loading > forward+backward: \"Data starved\" (CPU bottleneck)\n",
"- If forward+backward > data loading: \"Compute bound\" (GPU bottleneck)\n",
"- Ideal: Data loading ≈ computation time (balanced pipeline)\n",
"```\n",
"\n",
"### Memory Scaling: The Batch Size Trade-off\n",
@@ -871,18 +873,18 @@
"Batch Size Impact:\n",
"\n",
"Small Batches (batch_size=8):\n",
"┌─────────────────────────────────────────┐\n",
"│ Memory: 8 × 28 × 28 × 4 bytes = 25KB │ ← Low memory\n",
"│ Overhead: High (many small batches) │ ← High overhead\n",
"│ GPU Util: Poor (underutilized) │ ← Poor efficiency\n",
"└─────────────────────────────────────────┘\n",
"\n",
"Large Batches (batch_size=512):\n",
"┌─────────────────────────────────────────┐\n",
"│ Memory: 512 × 28 × 28 × 4 bytes = 1.6MB│ ← Higher memory\n",
"│ Overhead: Low (fewer large batches) │ ← Lower overhead\n",
"│ GPU Util: Good (well utilized) │ ← Better efficiency\n",
"└─────────────────────────────────────────┘\n",
"```\n",
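The two boxes above both follow the rule memory = batch_size × values-per-sample × bytes-per-value; a short sketch confirming the quoted numbers (helper name is illustrative):

```python
def batch_memory_bytes(batch_size, values_per_sample, dtype_bytes=4):
    """Rough batch footprint for float32 (4-byte) values."""
    return batch_size * values_per_sample * dtype_bytes

small = batch_memory_bytes(8, 28 * 28)    # 25,088 bytes ≈ 25 KB
large = batch_memory_bytes(512, 28 * 28)  # 1,605,632 bytes ≈ 1.6 MB
print(small, large)
```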
"\n",
"### Shuffling Overhead Analysis\n",
@@ -898,9 +900,9 @@
"\n",
"Memory Impact:\n",
"- No Shuffle: 0 extra memory (sequential access)\n",
"- With Shuffle: 8 bytes × dataset_size (store indices)\n",
"\n",
"For 50,000 samples: 8 × 50,000 = 400KB extra memory\n",
"```\n",
"\n",
"The key insight: shuffling overhead is typically negligible compared to the actual data loading and tensor operations.\n",
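The index-array cost quoted above is straightforward arithmetic (one 8-byte integer per sample); a one-liner to confirm:

```python
# Shuffle bookkeeping: one 8-byte index per sample, per the figures above
dataset_size = 50_000
index_bytes = 8 * dataset_size
print(index_bytes)  # 400000 bytes ≈ 400 KB
```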
@@ -930,16 +932,15 @@
"outputs": [],
"source": [
"def analyze_dataloader_performance():\n",
" \"\"\"📊 Analyze DataLoader performance characteristics.\"\"\"\n",
" print(\"📊 Analyzing DataLoader Performance...\")\n",
"\n",
" import time\n",
"\n",
" # Create test dataset of varying sizes\n",
" sizes = [1000, 5000, 10000]\n",
" batch_sizes = [16, 64, 256]\n",
"\n",
" print(\"\\n🔍 Batch Size vs Loading Time:\")\n",
"\n",
" for size in sizes:\n",
" # Create synthetic dataset\n",
@@ -965,7 +966,7 @@
" print(f\" Batch size {batch_size:3d}: {elapsed:.3f}s ({throughput:,.0f} samples/sec)\")\n",
"\n",
" # Analyze shuffle overhead\n",
" print(\"\\n🔄 Shuffle Overhead Analysis:\")\n",
"\n",
" dataset_size = 10000\n",
" features = Tensor(np.random.randn(dataset_size, 50))\n",
@@ -992,28 +993,28 @@
" print(f\" With shuffle: {time_shuffle:.3f}s\")\n",
" print(f\" Shuffle overhead: {shuffle_overhead:.1f}%\")\n",
"\n",
" print(\"\\n💡 Key Insights:\")\n",
" print(\"• Larger batch sizes reduce per-sample overhead\")\n",
" print(\"• Shuffle adds minimal overhead for reasonable dataset sizes\")\n",
" print(\"• Memory usage scales linearly with batch size\")\n",
" print(\"🚀 Production tip: Balance batch size with GPU memory limits\")\n",
"\n",
"# analyze_dataloader_performance() # Optional: Run manually for performance insights\n",
"\n",
"\n",
"def analyze_memory_usage():\n",
" \"\"\"📊 Analyze memory usage patterns in data loading.\"\"\"\n",
" print(\"\\n📊 Analyzing Memory Usage Patterns...\")\n",
"\n",
" # Memory usage estimation\n",
" def estimate_memory_mb(batch_size, feature_size, dtype_bytes=4):\n",
" \"\"\"Estimate memory usage for a batch.\"\"\"\n",
" return (batch_size * feature_size * dtype_bytes) / (1024 * 1024)\n",
"\n",
" print(\"\\n💾 Memory Usage by Batch Configuration:\")\n",
"\n",
" feature_sizes = [784, 3072, 50176] # MNIST, CIFAR-10, ImageNet-like\n",
" feature_names = [\"MNIST (28×28)\", \"CIFAR-10 (32×32×3)\", \"ImageNet (224×224×1)\"]\n",
" batch_sizes = [1, 32, 128, 512]\n",
"\n",
" for feature_size, name in zip(feature_sizes, feature_names):\n",
@@ -1022,13 +1023,13 @@
" memory_mb = estimate_memory_mb(batch_size, feature_size)\n",
" print(f\" Batch {batch_size:3d}: {memory_mb:6.1f} MB\")\n",
"\n",
" print(\"\\n🎯 Memory Trade-offs:\")\n",
" print(\"• Larger batches: More memory, better GPU utilization\")\n",
" print(\"• Smaller batches: Less memory, more noisy gradients\")\n",
" print(\"• Sweet spot: Usually 32-128 depending on model size\")\n",
"\n",
" # Demonstrate actual memory usage with our tensors\n",
" print(\"\\n🔬 Actual Tensor Memory Usage:\")\n",
"\n",
" # Create different sized tensors\n",
" tensor_small = Tensor(np.random.randn(32, 784)) # Small batch\n",
@@ -1038,9 +1039,9 @@
" small_bytes = tensor_small.data.nbytes\n",
" large_bytes = tensor_large.data.nbytes\n",
"\n",
" print(f\" Small batch (32×784): {small_bytes / 1024:.1f} KB\")\n",
" print(f\" Large batch (512×784): {large_bytes / 1024:.1f} KB\")\n",
" print(f\" Ratio: {large_bytes / small_bytes:.1f}×\")\n",
"\n",
"# analyze_memory_usage() # Optional: Run manually for memory insights"
]
@@ -1072,8 +1073,8 @@
"outputs": [],
"source": [
"def test_training_integration():\n",
" \"\"\"🔬 Test DataLoader integration with training workflow.\"\"\"\n",
" print(\"🔬 Integration Test: Training Workflow...\")\n",
"\n",
" # Create a realistic dataset\n",
" num_samples = 1000\n",
@@ -1112,12 +1113,12 @@
" train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)\n",
" val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)\n",
"\n",
" print(f\"📊 Dataset splits:\")\n",
" print(f\" Training: {len(train_dataset)} samples, {len(train_loader)} batches\")\n",
" print(f\" Validation: {len(val_dataset)} samples, {len(val_loader)} batches\")\n",
"\n",
" # Simulate training loop\n",
" print(\"\\n🏃 Simulated Training Loop:\")\n",
"\n",
" epoch_samples = 0\n",
" batch_count = 0\n",
@@ -1139,7 +1140,7 @@
" # Validate that all samples were seen\n",
" assert epoch_samples == len(train_dataset), f\"Expected {len(train_dataset)}, processed {epoch_samples}\"\n",
"\n",
" print(\"✅ Training integration works correctly!\")"
]
},
{
@@ -1150,7 +1151,7 @@
"lines_to_next_cell": 1
},
"source": [
"## 🧪 Module Integration Test\n",
"\n",
"Final validation that everything works together correctly."
]
@@ -1173,7 +1174,7 @@
" - Functions work together correctly\n",
" - Module is ready for integration with TinyTorch\n",
" \"\"\"\n",
" print(\"🧪 RUNNING MODULE INTEGRATION TEST\")\n",
" print(\"=\" * 50)\n",
"\n",
" # Run all unit tests\n",
@@ -1188,7 +1189,7 @@
" test_training_integration()\n",
"\n",
" print(\"\\n\" + \"=\" * 50)\n",
" print(\"🎉 ALL TESTS PASSED! Module ready for export.\")\n",
" print(\"Run: tito module complete 08\")"
]
},
@@ -1213,7 +1214,7 @@
"cell_marker": "\"\"\""
},
"source": [
"## 🎯 MODULE SUMMARY: DataLoader\n",
"\n",
"Congratulations! You've built a complete data loading pipeline for ML training!\n",
"\n",
@@ -1222,7 +1223,7 @@
"- Created DataLoader with batching, shuffling, and memory-efficient iteration\n",
"- Analyzed data pipeline performance and discovered memory/speed trade-offs\n",
"- Learned how to apply DataLoader to real datasets (see examples/milestones)\n",
"- All tests pass ✅ (validated by `test_module()`)\n",
"\n",
"### Systems Insights Discovered\n",
"- **Batch size directly impacts memory usage and training throughput**\n",
@@ -1260,4 +1261,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}