Update generated notebooks and package exports

- Regenerate all .ipynb files from fixed .py modules
- Update tinytorch package exports with corrected implementations
- Sync package module index with current 16-module structure

These generated files reflect all the module fixes and ensure consistent
.py ↔ .ipynb conversion with the updated module implementations.
Author: Vijay Janapa Reddi
Date: 2025-09-18 16:42:57 -04:00
Parent: 39b52e077c
Commit: bfadc82ce6
29 changed files with 7176 additions and 3593 deletions
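Most of the hunks below are pure cell-id churn: each `.py` → `.ipynb` regeneration assigns fresh random ids, so every cell's `"id"` field changes even when its source is untouched. A minimal sketch of why, using only the standard library (the 8-hex-digit id format mirrors the ids in these diffs, but the exact id scheme of the course's converter is an assumption):

```python
import json
import uuid

def make_markdown_cell(source: str) -> dict:
    """Build a minimal notebook markdown cell with a fresh random id.

    Note: the 8-hex-digit id mimics the ids in the hunks below; the actual
    id scheme used by the py->ipynb converter is an assumption.
    """
    return {
        "cell_type": "markdown",
        "id": uuid.uuid4().hex[:8],  # regenerated on every conversion, hence the diff churn
        "metadata": {"cell_marker": '"""'},
        "source": source.splitlines(keepends=True),
    }

cell = make_markdown_cell("# Setup - TinyTorch Development Environment Configuration\n")
print(json.dumps(cell, indent=1)[:60])
```

Because the id is random per conversion, regenerating all notebooks rewrites every cell's `"id"` line, which is exactly the pattern visible throughout this commit.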

View File

@@ -2,48 +2,38 @@
"cells": [
{
"cell_type": "markdown",
"id": "39753d39",
"id": "7023e2cc",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Setup - TinyTorch System Configuration\n",
"# Setup - TinyTorch Development Environment Configuration\n",
"\n",
"Welcome to TinyTorch! This setup module configures your personal TinyTorch installation and teaches you the NBGrader workflow.\n",
"Welcome to the Setup module! You'll configure your development environment and master the foundation of professional ML systems development.\n",
"\n",
"## Learning Goals\n",
"- Configure your personal TinyTorch installation with custom information\n",
"- Learn to query system information using Python modules\n",
"- Master the NBGrader workflow: implement → test → export\n",
"- Create functions that become part of your tinytorch package\n",
"- Understand solution blocks, hidden tests, and automated grading\n",
"- Systems understanding: How environment configuration affects ML system reproducibility and performance\n",
"- Core implementation skill: Build system configuration and introspection capabilities\n",
"- Pattern recognition: Understand how professional ML teams manage development environments\n",
"- Framework connection: See how PyTorch handles environment detection and hardware optimization\n",
"- Performance insight: Learn why proper environment setup is critical for ML system performance\n",
"\n",
"## The Big Picture: Why Configuration Matters in ML Systems\n",
"Configuration is the foundation of any production ML system. In this module, you'll learn:\n",
"## Build → Use → Reflect\n",
"1. **Build**: System configuration and environment detection functions\n",
"2. **Use**: Configure your personal TinyTorch installation with environment-aware settings\n",
"3. **Reflect**: Why do ML systems fail when environments differ between development and production?\n",
"\n",
"### 1. **System Awareness**\n",
"Real ML systems need to understand their environment:\n",
"- **Hardware constraints**: Memory, CPU cores, GPU availability\n",
"- **Software dependencies**: Python version, library compatibility\n",
"- **Platform differences**: Linux servers, macOS development, Windows deployment\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- Deep technical understanding of how ML systems detect and adapt to their runtime environment\n",
"- Practical capability to build robust configuration systems that work across different platforms\n",
"- Systems insight into why environment reproducibility is critical for ML system reliability\n",
"- Performance consideration of how hardware detection enables automatic optimization choices\n",
"- Connection to production ML systems and how frameworks like PyTorch handle cross-platform deployment\n",
"\n",
"### 2. **Reproducibility**\n",
"Configuration enables reproducible ML:\n",
"- **Environment documentation**: Exactly what system was used\n",
"- **Dependency management**: Precise versions and requirements\n",
"- **Debugging support**: System info helps troubleshoot issues\n",
"\n",
"### 3. **Professional Development**\n",
"Proper configuration shows engineering maturity:\n",
"- **Attribution**: Your work is properly credited\n",
"- **Collaboration**: Others can understand and extend your setup\n",
"- **Maintenance**: Systems can be updated and maintained\n",
"\n",
"### 4. **ML Systems Context**\n",
"This connects to broader ML engineering:\n",
"- **Model deployment**: Different environments need different configs\n",
"- **Monitoring**: System metrics help track performance\n",
"- **Scaling**: Understanding hardware helps optimize training\n",
"## Systems Reality Check\n",
"💡 **Production Context**: PyTorch automatically detects CUDA availability and optimizes operations based on hardware - your configuration system enables similar adaptability\n",
"⚡ **Performance Note**: Environment detection happens once at startup, but configuration choices affect every operation - design for minimal runtime overhead\n",
"\n",
"Let's build the foundation of your ML systems engineering skills!"
]
@@ -51,7 +41,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0906c03e",
"id": "f5ff455b",
"metadata": {
"nbgrader": {
"grade": false,
@@ -76,7 +66,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b9770db7",
"id": "3b599384",
"metadata": {
"nbgrader": {
"grade": false,
@@ -97,7 +87,7 @@
},
{
"cell_type": "markdown",
"id": "9472c8b2",
"id": "69084c61",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -142,7 +132,7 @@
},
{
"cell_type": "markdown",
"id": "1b8c8a21",
"id": "18fa8bd3",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 2
@@ -201,7 +191,7 @@
},
{
"cell_type": "markdown",
"id": "265e9036",
"id": "a64cb15f",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -250,7 +240,7 @@
},
{
"cell_type": "markdown",
"id": "51a836ed",
"id": "5880f9a8",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -300,7 +290,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f3f05c27",
"id": "a1d3408b",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -375,7 +365,7 @@
},
{
"cell_type": "markdown",
"id": "e4610f1e",
"id": "91c0982a",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -389,7 +379,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5c88687f",
"id": "39228303",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -442,7 +432,7 @@
},
{
"cell_type": "markdown",
"id": "cc6d0512",
"id": "73694f24",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -526,7 +516,7 @@
},
{
"cell_type": "markdown",
"id": "2f2b6b6c",
"id": "9915b650",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -576,7 +566,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "fa8505d6",
"id": "1fb29328",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -676,7 +666,7 @@
},
{
"cell_type": "markdown",
"id": "4488b12e",
"id": "4f3a0594",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -690,7 +680,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "75338982",
"id": "f4f54c21",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -742,7 +732,7 @@
},
{
"cell_type": "markdown",
"id": "bee27f01",
"id": "730beb32",
"metadata": {},
"source": [
"\"\"\"\n",
@@ -756,7 +746,7 @@
},
{
"cell_type": "markdown",
"id": "9c8b49ed",
"id": "d12fa04e",
"metadata": {
"lines_to_next_cell": 2
},
@@ -808,7 +798,7 @@
},
{
"cell_type": "markdown",
"id": "157477b2",
"id": "995aef75",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -822,7 +812,7 @@
},
{
"cell_type": "markdown",
"id": "35e51ddf",
"id": "54236e9a",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -841,7 +831,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "679242e0",
"id": "2c99286f",
"metadata": {
"nbgrader": {
"grade": true,
@@ -886,7 +876,7 @@
},
{
"cell_type": "markdown",
"id": "5d25c2cc",
"id": "0ff6205a",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -905,7 +895,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b7fd46f2",
"id": "2e2caf2d",
"metadata": {
"nbgrader": {
"grade": true,
@@ -950,7 +940,7 @@
},
{
"cell_type": "markdown",
"id": "c9b2aadb",
"id": "d09dd37a",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -969,7 +959,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "377789ce",
"id": "0e37d1f9",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1014,7 +1004,7 @@
},
{
"cell_type": "markdown",
"id": "fb77549b",
"id": "8894d9eb",
"metadata": {
"cell_marker": "\"\"\""
},
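The Setup module's new framing centers on environment detection (hardware constraints, platform differences, reproducibility). A minimal standard-library sketch of that idea; the function name and returned keys are illustrative, not the module's actual API:

```python
import os
import platform
import sys

def detect_environment() -> dict:
    """Collect basic runtime-environment facts, in the spirit of the Setup
    module's configuration goals (illustrative sketch, not the course API)."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.system(),    # e.g. 'Linux', 'Darwin', 'Windows'
        "machine": platform.machine(),    # e.g. 'x86_64', 'arm64'
        "cpu_count": os.cpu_count(),      # may be None if undeterminable
        "executable": sys.executable,     # documents exactly which interpreter ran
    }

info = detect_environment()
print(info["platform"], info["python_version"])
```

Capturing this once at startup and logging it alongside experiment results is what makes "it worked on my machine" failures diagnosable.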

View File

@@ -2,33 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "27831359",
"id": "0cdfb87f",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Tensor - Core Data Structure and Memory Management\n",
"\n",
"# Tensor - Core Data Structure\n",
"\n",
"Welcome to the Tensor module! This is where TinyTorch really begins. You'll implement the fundamental data structure that powers all ML systems.\n",
"Welcome to the Tensor module! You'll implement the fundamental data structure that powers all neural networks and understand why memory layout determines performance.\n",
"\n",
"## Learning Goals\n",
"- Understand tensors as N-dimensional arrays with ML-specific operations\n",
"- Implement a complete Tensor class with arithmetic operations\n",
"- Handle shape management, data types, and memory layout\n",
"- Build the foundation for neural networks and automatic differentiation\n",
"- Master the NBGrader workflow with comprehensive testing\n",
"- Systems understanding: How tensor memory layout affects cache performance and computational efficiency\n",
"- Core implementation skill: Build a complete Tensor class with shape management and arithmetic operations\n",
"- Pattern recognition: Understand how tensors abstract N-dimensional data for ML algorithms\n",
"- Framework connection: See how your implementation mirrors PyTorch's tensor design and memory model\n",
"- Performance insight: Learn why contiguous memory layout and vectorized operations are critical for ML performance\n",
"\n",
"## Build → Use → Understand\n",
"1. **Build**: Create the Tensor class with core operations\n",
"2. **Use**: Perform tensor arithmetic and transformations\n",
"3. **Understand**: How tensors form the foundation of ML systems"
"## Build → Use → Reflect\n",
"1. **Build**: Complete Tensor class with shape management, broadcasting, and vectorized operations\n",
"2. **Use**: Perform tensor arithmetic and transformations on real multi-dimensional data\n",
"3. **Reflect**: Why does tensor memory layout become the performance bottleneck in large neural networks?\n",
"\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- Deep technical understanding of how N-dimensional arrays are stored and manipulated in memory\n",
"- Practical capability to build efficient tensor operations that form the foundation of neural networks\n",
"- Systems insight into why memory access patterns determine whether ML operations run fast or slow\n",
"- Performance consideration of when tensor operations trigger expensive memory copies vs efficient in-place updates\n",
"- Connection to production ML systems and how PyTorch optimizes tensor storage for GPU acceleration\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: PyTorch tensors automatically choose optimal memory layouts and can seamlessly move between CPU and GPU - your implementation reveals these design decisions\n",
"⚡ **Performance Note**: Non-contiguous tensors can be 10-100x slower than contiguous ones - memory layout is often more important than algorithm choice in ML systems"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "34143596",
"id": "4507c195",
"metadata": {
"nbgrader": {
"grade": false,
@@ -52,7 +63,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "751a45cf",
"id": "265d1e48",
"metadata": {
"nbgrader": {
"grade": false,
@@ -73,7 +84,7 @@
},
{
"cell_type": "markdown",
"id": "92f6dc9a",
"id": "88c697e6",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -100,7 +111,7 @@
},
{
"cell_type": "markdown",
"id": "58d9f74f",
"id": "aec03f10",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -140,7 +151,7 @@
},
{
"cell_type": "markdown",
"id": "56bebf87",
"id": "10618511",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -189,7 +200,7 @@
},
{
"cell_type": "markdown",
"id": "5b0db6bd",
"id": "0a094a5e",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -221,7 +232,7 @@
},
{
"cell_type": "markdown",
"id": "efeb244f",
"id": "c3782c84",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -254,7 +265,7 @@
},
{
"cell_type": "markdown",
"id": "169712c3",
"id": "70dc2c2f",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -280,7 +291,7 @@
},
{
"cell_type": "markdown",
"id": "7cfaac85",
"id": "1f14b908",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -293,7 +304,7 @@
},
{
"cell_type": "markdown",
"id": "47d5ae1c",
"id": "3e7fb347",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -311,7 +322,7 @@
},
{
"cell_type": "markdown",
"id": "16827f1d",
"id": "4634bb2b",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -325,7 +336,7 @@
},
{
"cell_type": "markdown",
"id": "8e4317fe",
"id": "9ed928d8",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -340,7 +351,7 @@
},
{
"cell_type": "markdown",
"id": "39f51b9b",
"id": "63bb1768",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -359,7 +370,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f93ba04c",
"id": "9c33d225",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -844,7 +855,7 @@
},
{
"cell_type": "markdown",
"id": "61051594",
"id": "ebe488e4",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -857,7 +868,7 @@
},
{
"cell_type": "markdown",
"id": "36dd2b60",
"id": "99688473",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -873,7 +884,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "7d3d0ce0",
"id": "5a5a242e",
"metadata": {
"nbgrader": {
"grade": true,
@@ -922,7 +933,7 @@
},
{
"cell_type": "markdown",
"id": "d9ccfef9",
"id": "7f95e2b9",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -938,7 +949,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "91b85f7d",
"id": "811220b2",
"metadata": {
"nbgrader": {
"grade": true,
@@ -991,7 +1002,7 @@
},
{
"cell_type": "markdown",
"id": "89b063fd",
"id": "a86dff46",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -1007,7 +1018,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8789b67a",
"id": "5e9ad983",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1066,7 +1077,7 @@
},
{
"cell_type": "markdown",
"id": "7e736370",
"id": "5ff78dd4",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 0
@@ -1082,7 +1093,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "fd08e8b8",
"id": "c9532a98",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1119,7 +1130,7 @@
},
{
"cell_type": "markdown",
"id": "6d688a30",
"id": "56b30364",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1133,7 +1144,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f90987e7",
"id": "b386ebba",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1172,7 +1183,7 @@
},
{
"cell_type": "markdown",
"id": "0044d4ae",
"id": "5954ec81",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1196,7 +1207,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "40356576",
"id": "32999348",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1244,7 +1255,7 @@
},
{
"cell_type": "markdown",
"id": "650812e1",
"id": "04b22905",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1275,7 +1286,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "52145f83",
"id": "8ababf32",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1344,7 +1355,7 @@
},
{
"cell_type": "markdown",
"id": "3a4c7598",
"id": "d5b97ce5",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1358,7 +1369,7 @@
},
{
"cell_type": "markdown",
"id": "32c67caa",
"id": "0078081f",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1377,7 +1388,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c742fbe3",
"id": "251eb1f6",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1422,7 +1433,7 @@
},
{
"cell_type": "markdown",
"id": "3125bada",
"id": "004f4f87",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1441,7 +1452,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5d760412",
"id": "979ad2d3",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1486,7 +1497,7 @@
},
{
"cell_type": "markdown",
"id": "ed7e5939",
"id": "2cc4eb82",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1505,7 +1516,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c9fd7404",
"id": "85d8da31",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1550,7 +1561,7 @@
},
{
"cell_type": "markdown",
"id": "566d317c",
"id": "34a788e9",
"metadata": {
"cell_marker": "\"\"\""
},
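The Tensor module's "Performance Note" claims non-contiguous tensors can be far slower than contiguous ones. The effect is easy to observe with NumPy (assuming NumPy backs the course's Tensor, as the `tensor_dev` imports suggest): a transpose is a strided view over the same buffer, not a row-major copy.

```python
import numpy as np

# A transpose is a view with permuted strides: same data, different layout.
a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = a.T

print(a.flags["C_CONTIGUOUS"])  # True: row-major layout
print(t.flags["C_CONTIGUOUS"])  # False: strided view over the same buffer

# ascontiguousarray performs an explicit copy into row-major order,
# trading one memory copy for cache-friendly sequential access afterwards.
t_fast = np.ascontiguousarray(t)
print(t_fast.flags["C_CONTIGUOUS"])  # True
```

Iterating a non-contiguous array touches memory out of order, defeating CPU prefetching; this is the mechanism behind the module's claim that layout can matter more than algorithm choice.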

View File

@@ -2,32 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "ef7cb275",
"id": "e771015b",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Activations - Nonlinearity in Neural Networks\n",
"# Activations - Nonlinearity and Neural Network Intelligence\n",
"\n",
"Welcome to the Activations module! This is where neural networks get their power through nonlinearity.\n",
"Welcome to the Activations module! You'll implement the functions that give neural networks their power to learn complex patterns through nonlinearity.\n",
"\n",
"## Learning Goals\n",
"- Understand why activation functions are essential for neural networks\n",
"- Implement the four most important activation functions: ReLU, Sigmoid, Tanh, and Softmax\n",
"- Visualize how activations transform data and enable complex learning\n",
"- See how activations work with layers to build powerful networks\n",
"- Master the NBGrader workflow with comprehensive testing\n",
"- Systems understanding: Why linear operations alone cannot solve complex problems and how nonlinearity enables universal approximation\n",
"- Core implementation skill: Build the four essential activation functions that power modern neural networks\n",
"- Pattern recognition: Understand how different activations affect gradient flow and learning dynamics\n",
"- Framework connection: See how your implementations match PyTorch's optimized activation functions\n",
"- Performance insight: Learn why activation choice affects both forward pass speed and gradient computation efficiency\n",
"\n",
"## Build → Use → Understand\n",
"1. **Build**: Activation functions that add nonlinearity\n",
"2. **Use**: Transform tensors and see immediate results\n",
"3. **Understand**: How nonlinearity enables complex pattern learning"
"## Build → Use → Reflect\n",
"1. **Build**: ReLU, Sigmoid, Tanh, and Softmax activation functions with proper numerical stability\n",
"2. **Use**: Transform real tensor data and observe how different activations affect output distributions\n",
"3. **Reflect**: Why does activation function choice determine whether deep networks can train successfully?\n",
"\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- Deep technical understanding of how nonlinear functions enable neural networks to approximate any continuous function\n",
"- Practical capability to implement numerically stable activation functions that avoid overflow and underflow\n",
"- Systems insight into why activation choice affects gradient flow and determines trainable network depth\n",
"- Performance consideration of how activation complexity affects forward and backward pass computational cost\n",
"- Connection to production ML systems and why modern frameworks provide dozens of activation variants\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: PyTorch implements activations as both functions and modules, with CUDA kernels for GPU acceleration - your implementation reveals the mathematical foundations\n",
"⚡ **Performance Note**: ReLU is popular partly because it's computationally cheap (just max(0,x)), while Softmax requires expensive exponentials - activation choice affects training speed"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a66b7e45",
"id": "1c09fe15",
"metadata": {
"nbgrader": {
"grade": false,
@@ -61,7 +73,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "70c5eb16",
"id": "07467904",
"metadata": {
"nbgrader": {
"grade": false,
@@ -82,7 +94,7 @@
},
{
"cell_type": "markdown",
"id": "f120c05f",
"id": "f31b39f6",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -108,7 +120,7 @@
},
{
"cell_type": "markdown",
"id": "c9e23af1",
"id": "c08c49f0",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -152,7 +164,7 @@
},
{
"cell_type": "markdown",
"id": "8141c337",
"id": "3c601ff7",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -162,7 +174,7 @@
},
{
"cell_type": "markdown",
"id": "32f3818a",
"id": "8c14f143",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -208,7 +220,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d84590de",
"id": "e7e0dcc0",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -275,7 +287,7 @@
},
{
"cell_type": "markdown",
"id": "67b4d900",
"id": "d9c3ec30",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -289,7 +301,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "14cecf13",
"id": "0a866f19",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -346,7 +358,7 @@
},
{
"cell_type": "markdown",
"id": "edfce79d",
"id": "deafa56c",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -389,7 +401,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e96fbb19",
"id": "28734d10",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -459,7 +471,7 @@
},
{
"cell_type": "markdown",
"id": "3b1411e8",
"id": "d6ea2494",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -473,7 +485,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0a6fc43b",
"id": "9b3a9a9a",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -539,7 +551,7 @@
},
{
"cell_type": "markdown",
"id": "c5456e34",
"id": "131f6f66",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -582,7 +594,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "2b4d3d36",
"id": "fb9a410a",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -651,7 +663,7 @@
},
{
"cell_type": "markdown",
"id": "59097c9c",
"id": "7cf4c98c",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -665,7 +677,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c7bb65b9",
"id": "b6d016e9",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -735,7 +747,7 @@
},
{
"cell_type": "markdown",
"id": "166814f3",
"id": "3a0e5d56",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -778,7 +790,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "2ea27bab",
"id": "76c5adc6",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -865,7 +877,7 @@
},
{
"cell_type": "markdown",
"id": "877d56e6",
"id": "a98dacd4",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -879,7 +891,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "3c5c426d",
"id": "24e45268",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -955,7 +967,7 @@
},
{
"cell_type": "markdown",
"id": "b4cbae56",
"id": "58f5814b",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -975,7 +987,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e972ac59",
"id": "28124e65",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1078,7 +1090,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "9aa04f2e",
"id": "b0fb7bdf",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1157,7 +1169,7 @@
},
{
"cell_type": "markdown",
"id": "1f8c25e6",
"id": "d0682464",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1178,7 +1190,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c38ee3ef",
"id": "499438f3",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1394,7 +1406,7 @@
},
{
"cell_type": "markdown",
"id": "c17904e8",
"id": "ea1b444d",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1408,7 +1420,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "25650903",
"id": "325c9567",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1477,7 +1489,7 @@
},
{
"cell_type": "markdown",
"id": "cdf9ac1c",
"id": "3f52be87",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1490,7 +1502,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "244a1548",
"id": "4efc68a2",
"metadata": {
"nbgrader": {
"grade": false,
@@ -1569,7 +1581,7 @@
},
{
"cell_type": "markdown",
"id": "c725a597",
"id": "da80af75",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1583,7 +1595,7 @@
},
{
"cell_type": "markdown",
"id": "2eab90f8",
"id": "f7964faf",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1602,7 +1614,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "677cb4c6",
"id": "b2232e0a",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1647,7 +1659,7 @@
},
{
"cell_type": "markdown",
"id": "d3f1657b",
"id": "e4cdf2fc",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1666,7 +1678,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ba879308",
"id": "10519d7b",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1711,7 +1723,7 @@
},
{
"cell_type": "markdown",
"id": "e7b77fa4",
"id": "c6762585",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1730,7 +1742,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c159289a",
"id": "85ebb6d0",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1775,7 +1787,7 @@
},
{
"cell_type": "markdown",
"id": "92ba9067",
"id": "b20e53f6",
"metadata": {
"cell_marker": "\"\"\""
},
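The Activations module's goals mention "proper numerical stability" and note that ReLU is cheap while Softmax needs exponentials. A short sketch of both points (a standard max-subtraction softmax, assuming NumPy; this is the textbook stabilization trick, not necessarily the module's exact implementation):

```python
import numpy as np

def relu(x):
    # max(0, x): a cheap elementwise clamp, no exponentials involved
    return np.maximum(0.0, x)

def softmax(x):
    # Subtracting the max keeps exp() arguments <= 0, so exp never overflows;
    # the shift cancels in the ratio, leaving the result mathematically unchanged.
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([1000.0, 1000.0, -1000.0])
probs = softmax(logits)
print(probs)  # finite and sums to 1 even for extreme inputs
print(relu(np.array([-2.0, 3.0])))
```

A naive `np.exp(logits)` on these inputs would overflow to `inf` and yield NaNs; the shifted version degrades gracefully.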

File diff suppressed because it is too large

View File

@@ -2,40 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "7d793e6b",
"id": "a3857f56",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Networks - Neural Network Architectures\n",
"# Networks - Complete Multi-Layer Neural Network Architectures\n",
"\n",
"Welcome to the Networks module! This is where we compose layers into complete neural network architectures.\n",
"Welcome to the Networks module! You'll compose individual layers into complete neural network architectures that can solve real-world problems.\n",
"\n",
"## Learning Goals\n",
"- Understand networks as function composition: `f(x) = layer_n(...layer_2(layer_1(x)))`\n",
"- Build the Sequential network architecture for composing layers\n",
"- Create common network patterns like MLPs (Multi-Layer Perceptrons)\n",
"- Visualize network architectures and understand their capabilities\n",
"- Master forward pass inference through complete networks\n",
"- Systems understanding: How function composition creates complex behaviors from simple layer operations\n",
"- Core implementation skill: Build Sequential networks and Multi-Layer Perceptrons (MLPs) with flexible architectures\n",
"- Pattern recognition: Understand how network depth, width, and activation patterns affect learning capability\n",
"- Framework connection: See how your Sequential implementation mirrors PyTorch's nn.Sequential design pattern\n",
"- Performance insight: Learn why network architecture choices dramatically affect training time and memory usage\n",
"\n",
"## Build → Use → Reflect\n",
"1. **Build**: Sequential networks that compose layers into complete architectures\n",
"2. **Use**: Create different network patterns and run inference\n",
"3. **Reflect**: How architecture design affects network behavior and capability\n",
"1. **Build**: Sequential network container that composes layers into complete architectures\n",
"2. **Use**: Create MLPs with different depth/width configurations and test on real classification problems\n",
"3. **Reflect**: Why do deeper networks learn more complex functions, but also become harder to train?\n",
"\n",
"## What You'll Learn\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- How simple layers combine to create complex behaviors\n",
"- The fundamental Sequential architecture pattern\n",
"- How to build MLPs with any number of layers\n",
"- Different network architectures (shallow, deep, wide)\n",
"- How neural networks approximate complex functions"
"- Deep technical understanding of how layer composition enables universal function approximation\n",
"- Practical capability to design and implement neural network architectures for different problem types\n",
"- Systems insight into why network architecture is often more important than algorithm choice for ML performance\n",
"- Performance consideration of how network size affects training speed, memory usage, and convergence behavior\n",
"- Connection to production ML systems and how architectural innovations drive ML breakthroughs\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: PyTorch's nn.Sequential is used throughout production systems because it provides a clean abstraction for complex architectures while maintaining automatic differentiation\n",
"⚡ **Performance Note**: Network depth affects memory linearly but can affect training time exponentially due to gradient flow problems - architecture design is a systems engineering problem"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8843d18f",
"id": "5033c53d",
"metadata": {
"nbgrader": {
"grade": false,
@@ -64,9 +68,9 @@
" from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax\n",
"except ImportError:\n",
" # For development, import from local modules\n",
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))\n",
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_layers'))\n",
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_tensor'))\n",
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_activations'))\n",
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '04_layers'))\n",
" from tensor_dev import Tensor\n",
" from activations_dev import ReLU, Sigmoid, Tanh, Softmax\n",
" from layers_dev import Dense"
@@ -75,7 +79,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "56f6a322",
"id": "4e9d8c1b",
"metadata": {
"nbgrader": {
"grade": false,
@@ -96,7 +100,7 @@
},
{
"cell_type": "markdown",
"id": "70d722ef",
"id": "14e22f57",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -123,7 +127,7 @@
},
{
"cell_type": "markdown",
"id": "b864ed9e",
"id": "772c17c6",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -133,7 +137,7 @@
},
{
"cell_type": "markdown",
"id": "e1552c44",
"id": "beba5509",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -183,7 +187,7 @@
},
{
"cell_type": "markdown",
"id": "8bc0faf1",
"id": "6aef7fd5",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -228,7 +232,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ccb18863",
"id": "dfa7c740",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -338,7 +342,7 @@
},
{
"cell_type": "markdown",
"id": "2c8aa05e",
"id": "e18b84ed",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -353,7 +357,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c609fbd8",
"id": "0a1a1e60",
"metadata": {
"nbgrader": {
"grade": true,
@@ -423,7 +427,7 @@
},
{
"cell_type": "markdown",
"id": "1351cd39",
"id": "40b5a362",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -463,7 +467,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f6268ed1",
"id": "91f5ce0f",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -542,7 +546,7 @@
},
{
"cell_type": "markdown",
"id": "f604420c",
"id": "b5e81e02",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -557,7 +561,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "24a3f199",
"id": "b0efa060",
"metadata": {
"nbgrader": {
"grade": true,
@@ -641,7 +645,7 @@
},
{
"cell_type": "markdown",
"id": "7b30af1e",
"id": "4a3984d0",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -692,7 +696,7 @@
},
{
"cell_type": "markdown",
"id": "9d4d9e74",
"id": "d2b26fd1",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -706,7 +710,7 @@
},
{
"cell_type": "markdown",
"id": "ecf706b8",
"id": "9f2037c6",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -720,7 +724,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1e65c159",
"id": "10390c57",
"metadata": {
"nbgrader": {
"grade": true,
@@ -770,7 +774,7 @@
},
{
"cell_type": "markdown",
"id": "266af4b8",
"id": "90c30819",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -784,7 +788,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c89369f1",
"id": "bf46af38",
"metadata": {
"lines_to_next_cell": 1
},
@@ -852,7 +856,7 @@
},
{
"cell_type": "markdown",
"id": "5945253d",
"id": "a388e844",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -864,7 +868,7 @@
},
{
"cell_type": "markdown",
"id": "f0ec0880",
"id": "8ca7f31b",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -898,7 +902,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "7a180d93",
"id": "23e52abc",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1009,7 +1013,7 @@
},
{
"cell_type": "markdown",
"id": "0b3b4ad9",
"id": "91f89a62",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1023,7 +1027,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b1dd5233",
"id": "119b8197",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1084,7 +1088,7 @@
},
{
"cell_type": "markdown",
"id": "0ac076d3",
"id": "c2930f31",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1098,7 +1102,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "937de7bc",
"id": "5f2444d3",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1130,7 +1134,7 @@
},
{
"cell_type": "markdown",
"id": "bc421e40",
"id": "fe90b72a",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1144,7 +1148,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "12ecacca",
"id": "8be6678f",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1175,7 +1179,7 @@
},
{
"cell_type": "markdown",
"id": "e66ebaac",
"id": "cd7149bf",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1189,7 +1193,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "134b1e04",
"id": "21132a27",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1213,7 +1217,7 @@
},
{
"cell_type": "markdown",
"id": "889a49c9",
"id": "cbfcdea0",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1228,7 +1232,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "198c56e8",
"id": "657273bd",
"metadata": {
"nbgrader": {
"grade": false,
@@ -1249,7 +1253,7 @@
},
{
"cell_type": "markdown",
"id": "dcce377d",
"id": "24ae677f",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1261,7 +1265,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f676c7e6",
"id": "40039daa",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1301,7 +1305,7 @@
},
{
"cell_type": "markdown",
"id": "48d6d2b6",
"id": "bf401c68",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1322,7 +1326,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b43e9850",
"id": "11e7c8da",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1659,7 +1663,7 @@
},
{
"cell_type": "markdown",
"id": "7e1f9865",
"id": "0227c73e",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1674,7 +1678,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "3affc38c",
"id": "5ad2f520",
"metadata": {},
"outputs": [],
"source": [
@@ -1718,7 +1722,7 @@
},
{
"cell_type": "markdown",
"id": "2d151113",
"id": "f0c79c34",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1731,7 +1735,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "eeed4631",
"id": "34c6b651",
"metadata": {},
"outputs": [],
"source": [
@@ -1829,7 +1833,7 @@
},
{
"cell_type": "markdown",
"id": "b185eb71",
"id": "95d4b89e",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1843,7 +1847,7 @@
},
{
"cell_type": "markdown",
"id": "2e07096d",
"id": "b9169010",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1862,7 +1866,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "888d27f4",
"id": "b8965ca2",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1907,7 +1911,7 @@
},
{
"cell_type": "markdown",
"id": "9de602d0",
"id": "941485eb",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1926,7 +1930,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "bf9ef3ba",
"id": "f3e72cf1",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1971,7 +1975,7 @@
},
{
"cell_type": "markdown",
"id": "b2d28ded",
"id": "17cbf89c",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1990,7 +1994,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "7aebdd8c",
"id": "e9aa1668",
"metadata": {
"nbgrader": {
"grade": true,
@@ -2035,7 +2039,7 @@
},
{
"cell_type": "markdown",
"id": "a3199d6e",
"id": "91d5f1a1",
"metadata": {
"cell_marker": "\"\"\""
},


@@ -2,40 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "c503d2cd",
"id": "fa7c1ea0",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# CNN - Convolutional Neural Networks\n",
"# Spatial - Convolutional Networks and Spatial Pattern Recognition\n",
"\n",
"Welcome to the CNN module! Here you'll implement the core building block of modern computer vision: the convolutional layer.\n",
"Welcome to the Spatial module! You'll implement convolutional operations that enable neural networks to understand spatial relationships in images and other grid-structured data.\n",
"\n",
"## Learning Goals\n",
"- Understand the convolution operation and its importance in computer vision\n",
"- Implement Conv2D with explicit for-loops to understand the sliding window mechanism\n",
"- Build convolutional layers that can detect spatial patterns in images\n",
"- Compose Conv2D with other layers to build complete convolutional networks\n",
"- See how convolution enables parameter sharing and translation invariance\n",
"- Systems understanding: How convolution operations achieve spatial pattern recognition through parameter sharing and translation invariance\n",
"- Core implementation skill: Build Conv2D layers using explicit sliding window operations to understand the computational mechanics\n",
"- Pattern recognition: Understand how convolutional layers detect hierarchical features from edges to complex objects\n",
"- Framework connection: See how your implementation reveals the design decisions in PyTorch's nn.Conv2d optimizations\n",
"- Performance insight: Learn why convolution is computationally expensive but highly parallelizable, driving modern GPU architecture\n",
"\n",
"## Build → Use → Reflect\n",
"1. **Build**: Conv2D layer using sliding window convolution from scratch\n",
"2. **Use**: Transform images and see feature maps emerge\n",
"3. **Reflect**: How CNNs learn hierarchical spatial patterns\n",
"1. **Build**: Conv2D layer with sliding window convolution, understanding every memory access and computation\n",
"2. **Use**: Transform real image data and visualize how feature maps capture spatial patterns\n",
"3. **Reflect**: Why does convolution enable parameter sharing, and how does this affect model capacity vs efficiency?\n",
"\n",
"## What You'll Learn\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- How convolution works as a sliding window operation\n",
"- Why convolution is perfect for spatial data like images\n",
"- How to build learnable convolutional layers\n",
"- The CNN pipeline: Conv2D → Activation → Flatten → Dense\n",
"- How parameter sharing makes CNNs efficient"
"- Deep technical understanding of how sliding window operations enable spatial pattern detection\n",
"- Practical capability to implement convolutional layers that form the backbone of computer vision systems\n",
"- Systems insight into why convolution is the dominant operation for spatial data and how it affects memory access patterns\n",
"- Performance consideration of how kernel size, stride, and padding choices affect computational cost and memory usage\n",
"- Connection to production ML systems and how frameworks optimize convolution for different hardware architectures\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: PyTorch's Conv2d uses highly optimized implementations like cuDNN that can be 100x faster than naive implementations through algorithm choice and memory layout optimization\n",
"⚡ **Performance Note**: Convolution costs O(C×K²) operations per output pixel, O(H×W×C×K²) overall - modern CNNs perform billions of these operations, making optimization critical for real-time applications"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1c4a516",
"id": "22139121",
"metadata": {
"nbgrader": {
"grade": false,
@@ -74,7 +78,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "53b75cab",
"id": "d6e50408",
"metadata": {
"nbgrader": {
"grade": false,
@@ -95,7 +99,7 @@
},
{
"cell_type": "markdown",
"id": "9276d461",
"id": "c871945c",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -122,7 +126,7 @@
},
{
"cell_type": "markdown",
"id": "185e3b05",
"id": "4b3aa53a",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -132,7 +136,7 @@
},
{
"cell_type": "markdown",
"id": "3d58c1db",
"id": "ce77347b",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -178,7 +182,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "6366950c",
"id": "d8b77214",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -262,7 +266,7 @@
},
{
"cell_type": "markdown",
"id": "53162414",
"id": "5959b844",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -277,7 +281,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "75c216b7",
"id": "c254e83d",
"metadata": {
"nbgrader": {
"grade": true,
@@ -354,7 +358,7 @@
},
{
"cell_type": "markdown",
"id": "f4ba3786",
"id": "18423fba",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -389,7 +393,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "da405334",
"id": "d6570110",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -482,7 +486,7 @@
},
{
"cell_type": "markdown",
"id": "f82ae9a1",
"id": "9e3697e2",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -497,7 +501,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "553ebf00",
"id": "848f47d9",
"metadata": {
"nbgrader": {
"grade": true,
@@ -565,7 +569,7 @@
},
{
"cell_type": "markdown",
"id": "dfed0a44",
"id": "704e80fb",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -596,7 +600,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d0ff2c1a",
"id": "c5b749de",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -654,7 +658,7 @@
},
{
"cell_type": "markdown",
"id": "cbc93e4f",
"id": "d6707b0b",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -669,7 +673,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "70d1463d",
"id": "3059d1cf",
"metadata": {
"nbgrader": {
"grade": true,
@@ -743,7 +747,7 @@
},
{
"cell_type": "markdown",
"id": "9173d402",
"id": "f14b5f55",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -777,7 +781,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b72f2e3a",
"id": "c2da08e1",
"metadata": {
"nbgrader": {
"grade": true,
@@ -920,7 +924,7 @@
},
{
"cell_type": "markdown",
"id": "d6584cb0",
"id": "6f5d8a23",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -934,7 +938,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0036e921",
"id": "a18cf95b",
"metadata": {
"lines_to_next_cell": 1
},
@@ -960,7 +964,7 @@
},
{
"cell_type": "markdown",
"id": "f41146d6",
"id": "196dabcb",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -974,7 +978,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "799f9922",
"id": "3be056b9",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1000,7 +1004,7 @@
},
{
"cell_type": "markdown",
"id": "aa6fb973",
"id": "bc2cb7fa",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1014,7 +1018,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "84e96bc5",
"id": "ce0f34f7",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1041,7 +1045,7 @@
},
{
"cell_type": "markdown",
"id": "5ac434fa",
"id": "741a52cc",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1056,7 +1060,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "62492857",
"id": "a6ea94b5",
"metadata": {
"nbgrader": {
"grade": false,
@@ -1077,7 +1081,7 @@
},
{
"cell_type": "markdown",
"id": "8d68b2c9",
"id": "8dd23f39",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1089,7 +1093,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ae11a844",
"id": "89cc1415",
"metadata": {},
"outputs": [],
"source": [
@@ -1124,7 +1128,7 @@
},
{
"cell_type": "markdown",
"id": "7f6f5475",
"id": "753d1ee1",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1155,7 +1159,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "fcb9da38",
"id": "6e909dc7",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1440,7 +1444,7 @@
},
{
"cell_type": "markdown",
"id": "98593473",
"id": "c8d6a997",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1454,7 +1458,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "21839d7e",
"id": "f9f5fe7f",
"metadata": {
"nbgrader": {
"grade": false,
@@ -1534,7 +1538,7 @@
},
{
"cell_type": "markdown",
"id": "ae35e45d",
"id": "90b67437",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1548,7 +1552,7 @@
},
{
"cell_type": "markdown",
"id": "80c93858",
"id": "f3e65ed5",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1567,7 +1571,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "9d06eb38",
"id": "55655394",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1612,7 +1616,7 @@
},
{
"cell_type": "markdown",
"id": "c270f95f",
"id": "9b34b312",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1631,7 +1635,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "85024508",
"id": "2a5d8d3e",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1676,7 +1680,7 @@
},
{
"cell_type": "markdown",
"id": "3a6bc925",
"id": "80fe0ae3",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1695,7 +1699,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "045e7eb6",
"id": "4c392248",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1740,7 +1744,7 @@
},
{
"cell_type": "markdown",
"id": "4bfd09bd",
"id": "71cb4f6c",
"metadata": {
"cell_marker": "\"\"\""
},


@@ -2,40 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "d0c4657c",
"id": "7182e0d7",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Attention - The Foundation of Modern AI\n",
"# Attention - Sequence Understanding and Dynamic Focus Mechanisms\n",
"\n",
"Welcome to the Attention module! This is where you'll implement the revolutionary mechanism that powers ChatGPT, BERT, GPT-4, and virtually all state-of-the-art AI systems.\n",
"Welcome to the Attention module! You'll implement the mechanism that revolutionized AI by enabling neural networks to dynamically focus on relevant information, powering transformers and modern language models.\n",
"\n",
"## Learning Goals\n",
"- Understand attention as dynamic pattern matching with Query, Key, Value projections\n",
"- Implement scaled dot-product attention from mathematical foundations\n",
"- Master the attention formula that powers all transformer models\n",
"- Create masking utilities for different attention patterns\n",
"- Build the foundation for understanding modern AI architectures\n",
"- Systems understanding: How attention mechanisms solve the sequence modeling bottleneck through O(n²) parallel computation vs O(n) sequential processing\n",
"- Core implementation skill: Build scaled dot-product attention with Query, Key, Value projections and proper masking strategies\n",
"- Pattern recognition: Understand how attention enables global context modeling and why it replaced RNNs in most sequence tasks\n",
"- Framework connection: See how your implementation matches the attention mechanisms in PyTorch's nn.MultiheadAttention\n",
"- Performance insight: Learn why attention's O(n²) memory complexity becomes the bottleneck for long sequences and drives architectural innovations\n",
"\n",
"## Build → Use → Understand\n",
"1. **Build**: Implement the core attention mechanism from scratch using mathematical principles\n",
"2. **Use**: Apply attention to sequence tasks and visualize attention patterns\n",
"3. **Understand**: How attention revolutionized AI by enabling global context modeling\n",
"## Build → Use → Reflect\n",
"1. **Build**: Complete attention mechanism with QKV projections, scaling, masking, and softmax normalization\n",
"2. **Use**: Apply attention to real sequence data and visualize attention patterns to understand what the model focuses on\n",
"3. **Reflect**: Why does attention's parallel computation enable better performance despite higher memory complexity?\n",
"\n",
"## What You'll Learn\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- How attention enables dynamic focus on relevant input parts\n",
"- The mathematical foundation behind all transformer models\n",
"- Why attention is more powerful than fixed convolution kernels\n",
"- How masking enables different attention patterns (causal, padding)\n",
"- The building block that powers ChatGPT, BERT, and modern AI"
"- Deep technical understanding of how attention mechanisms enable dynamic information flow in neural networks\n",
"- Practical capability to implement the core building block of transformer architectures\n",
"- Systems insight into why attention's parallel computation model revolutionized sequence processing\n",
"- Performance consideration of the O(n²) memory scaling that limits transformer context length\n",
"- Connection to production ML systems and how modern frameworks optimize attention computation\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: PyTorch's MultiheadAttention uses optimized CUDA kernels and can apply techniques like Flash Attention to reduce memory usage from O(n²) to O(n)\n",
"⚡ **Performance Note**: Attention computation is memory-bound, not compute-bound - the bottleneck is moving data, not matrix multiplication, which drives modern attention optimizations"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a8f1f31",
"id": "4aeef42c",
"metadata": {
"nbgrader": {
"grade": false,
@@ -70,7 +74,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d4f32b5d",
"id": "074f718d",
"metadata": {
"nbgrader": {
"grade": false,
@@ -91,7 +95,7 @@
},
{
"cell_type": "markdown",
"id": "4b94ac71",
"id": "916ef772",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -121,7 +125,7 @@
},
{
"cell_type": "markdown",
"id": "93f58ed5",
"id": "cffb9ea5",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -131,7 +135,7 @@
},
{
"cell_type": "markdown",
"id": "c41f274a",
"id": "dede851e",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -188,7 +192,7 @@
},
{
"cell_type": "markdown",
"id": "3b4234f2",
"id": "62d34174",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -217,7 +221,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "6b637f0f",
"id": "e4c653a5",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -310,7 +314,7 @@
},
{
"cell_type": "markdown",
"id": "cdf2f09a",
"id": "9d034bc0",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -324,7 +328,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "a7b86326",
"id": "74bd3d91",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -383,7 +387,7 @@
},
{
"cell_type": "markdown",
"id": "3a2e9d5b",
"id": "dae197a5",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -406,7 +410,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c9f99c45",
"id": "334bfa3c",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -510,7 +514,7 @@
},
{
"cell_type": "markdown",
"id": "fb43ba1b",
"id": "566c9e65",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -524,7 +528,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b02d3d88",
"id": "d986875f",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -578,7 +582,7 @@
},
{
"cell_type": "markdown",
"id": "6ac9fd19",
"id": "8e82457c",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -604,7 +608,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f4713d1d",
"id": "ffaa4174",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -755,7 +759,7 @@
},
{
"cell_type": "markdown",
"id": "98e47c6c",
"id": "d34f60fa",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -769,7 +773,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f9a307ba",
"id": "aa48be00",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -836,7 +840,7 @@
},
{
"cell_type": "markdown",
"id": "75028946",
"id": "efde9e75",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -851,7 +855,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "54a532ee",
"id": "13545030",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -925,7 +929,7 @@
},
{
"cell_type": "markdown",
"id": "b8a9e7f9",
"id": "615e962c",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -938,7 +942,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "a5b03749",
"id": "2567f9ca",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1090,7 +1094,7 @@
},
{
"cell_type": "markdown",
"id": "53b2ecda",
"id": "a560a3fb",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1105,7 +1109,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "997f75bf",
"id": "dd8c3ac1",
"metadata": {
"nbgrader": {
"grade": false,
@@ -1126,7 +1130,7 @@
},
{
"cell_type": "markdown",
"id": "31eb9e6d",
"id": "88efe19f",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1138,7 +1142,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ceadb85f",
"id": "b6a5d75e",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1174,7 +1178,7 @@
},
{
"cell_type": "markdown",
"id": "2e945318",
"id": "cb7e0dde",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1187,7 +1191,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1105393d",
"id": "5f15980d",
"metadata": {},
"outputs": [],
"source": [
@@ -1218,7 +1222,7 @@
},
{
"cell_type": "markdown",
"id": "33da1fad",
"id": "ea83c51b",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1249,7 +1253,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "42784bee",
"id": "3197755e",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1551,7 +1555,7 @@
},
{
"cell_type": "markdown",
"id": "1414fa9c",
"id": "4bcdc5f9",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1565,7 +1569,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0c71d011",
"id": "a5ac19a4",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1632,7 +1636,7 @@
},
{
"cell_type": "markdown",
"id": "ead5bf66",
"id": "6d5795e0",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1646,7 +1650,7 @@
},
{
"cell_type": "markdown",
"id": "5bc3403e",
"id": "a65f0b71",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1665,7 +1669,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d3c8c0f8",
"id": "d8ccc2fa",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1710,7 +1714,7 @@
},
{
"cell_type": "markdown",
"id": "676c9af9",
"id": "78e8f253",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1729,7 +1733,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d518dca1",
"id": "964d9b62",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1774,7 +1778,7 @@
},
{
"cell_type": "markdown",
"id": "0e374c11",
"id": "3cf2085b",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1793,7 +1797,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "3cd25110",
"id": "3e12fb1b",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1838,7 +1842,7 @@
},
{
"cell_type": "markdown",
"id": "75156c5b",
"id": "a339e55a",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1897,7 +1901,7 @@
},
{
"cell_type": "markdown",
"id": "fafa1886",
"id": "02cc85f5",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1910,7 +1914,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "4e393dbc",
"id": "2c0b5e8b",
"metadata": {},
"outputs": [],
"source": [


@@ -2,40 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "7ed0eda4",
"id": "4c9bc6eb",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# DataLoader - Data Loading and Preprocessing\n",
"# DataLoader - Efficient Data Pipeline and Batch Processing Systems\n",
"\n",
"Welcome to the DataLoader module! This is where you'll learn how to efficiently load, process, and manage data for machine learning systems.\n",
"Welcome to the DataLoader module! You'll build the data infrastructure that feeds neural networks, understanding how I/O optimization and memory management determine training speed.\n",
"\n",
"## Learning Goals\n",
"- Understand data pipelines as the foundation of ML systems\n",
"- Implement efficient data loading with memory management and batching\n",
"- Build reusable dataset abstractions for different data types\n",
"- Master the Dataset and DataLoader pattern used in all ML frameworks\n",
"- Learn systems thinking for data engineering and I/O optimization\n",
"- Systems understanding: How data I/O becomes the bottleneck in ML training and why efficient data pipelines are critical for system performance\n",
"- Core implementation skill: Build Dataset and DataLoader classes with batching, shuffling, and memory-efficient iteration patterns\n",
"- Pattern recognition: Understand the universal Dataset/DataLoader abstraction used across all ML frameworks\n",
"- Framework connection: See how your implementation mirrors PyTorch's data loading infrastructure and optimization strategies\n",
"- Performance insight: Learn why data loading parallelization and prefetching are essential for GPU utilization in production systems\n",
"\n",
"## Build → Use → Reflect\n",
"1. **Build**: Create dataset classes and data loaders from scratch\n",
"2. **Use**: Load real datasets and feed them to neural networks\n",
"3. **Reflect**: How data engineering affects system performance and scalability\n",
"1. **Build**: Complete Dataset and DataLoader classes with efficient batching, shuffling, and real dataset support (CIFAR-10)\n",
"2. **Use**: Load large-scale image datasets and feed them to neural networks with proper batch processing\n",
"3. **Reflect**: Why does data loading speed often determine training speed more than model computation?\n",
"\n",
"## What You'll Learn\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- The Dataset pattern for consistent data access\n",
"- How DataLoaders enable efficient batch processing\n",
"- Why batching and shuffling are crucial for ML\n",
"- How to handle datasets larger than memory\n",
"- The connection between data engineering and model performance"
"- Deep technical understanding of how efficient data pipelines enable scalable ML training\n",
"- Practical capability to build data loading systems that handle datasets larger than memory\n",
"- Systems insight into why data engineering is often the limiting factor in ML system performance\n",
"- Performance consideration of how batch size, shuffling, and prefetching affect training throughput and convergence\n",
"- Connection to production ML systems and how frameworks optimize data loading for different storage systems\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: PyTorch's DataLoader uses multiprocessing and memory pinning to overlap data loading with GPU computation, which can reduce data loading overhead to near zero\n",
"⚡ **Performance Note**: Modern GPUs can process data faster than storage systems can provide it - data loading optimization is critical for hardware utilization in production training"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba43d64a",
"id": "92c9d8b6",
"metadata": {
"nbgrader": {
"grade": false,
@@ -72,7 +76,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "323f370e",
"id": "2959209b",
"metadata": {
"nbgrader": {
"grade": false,
@@ -93,7 +97,7 @@
},
{
"cell_type": "markdown",
"id": "090e541c",
"id": "8f2d9467",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -119,7 +123,7 @@
},
{
"cell_type": "markdown",
"id": "35f1ab6f",
"id": "8b07e46b",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -129,7 +133,7 @@
},
{
"cell_type": "markdown",
"id": "4b0b67da",
"id": "52c9b734",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -175,7 +179,7 @@
},
{
"cell_type": "markdown",
"id": "7a5a1317",
"id": "d07094e6",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -214,7 +218,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "bca6f4f4",
"id": "275c4926",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -361,7 +365,7 @@
},
{
"cell_type": "markdown",
"id": "6abf5f08",
"id": "06c34e75",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -376,7 +380,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5d838618",
"id": "7e349589",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -416,7 +420,7 @@
},
{
"cell_type": "markdown",
"id": "bc230f18",
"id": "261ad6cc",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -458,7 +462,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "a1fe82b1",
"id": "a7607154",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -598,7 +602,7 @@
},
{
"cell_type": "markdown",
"id": "3cb05b62",
"id": "ec802471",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -613,7 +617,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "a7b71862",
"id": "cb2f9065",
"metadata": {
"nbgrader": {
"grade": true,
@@ -722,7 +726,7 @@
},
{
"cell_type": "markdown",
"id": "604935d1",
"id": "a834dfd9",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -750,7 +754,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "af0e1781",
"id": "39e77a02",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -874,7 +878,7 @@
},
{
"cell_type": "markdown",
"id": "53f95576",
"id": "b88878e6",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -889,7 +893,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f2b118cc",
"id": "417df9df",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -982,7 +986,7 @@
},
{
"cell_type": "markdown",
"id": "7abe6b56",
"id": "480db551",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -997,7 +1001,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "2c449e9f",
"id": "2e73cdb0",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1066,7 +1070,7 @@
},
{
"cell_type": "markdown",
"id": "e9337ab3",
"id": "243297c6",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1112,7 +1116,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "af170e61",
"id": "c994c580",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1260,7 +1264,7 @@
},
{
"cell_type": "markdown",
"id": "e9f202bd",
"id": "54d090c1",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1274,7 +1278,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8d2260f6",
"id": "62c32031",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1300,7 +1304,7 @@
},
{
"cell_type": "markdown",
"id": "b346d8cc",
"id": "cbbce516",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1314,7 +1318,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "33733014",
"id": "a0025080",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1342,7 +1346,7 @@
},
{
"cell_type": "markdown",
"id": "2eeffd77",
"id": "dfc685e4",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1356,7 +1360,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "7986ce72",
"id": "0cc885b1",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1383,7 +1387,7 @@
},
{
"cell_type": "markdown",
"id": "1932c912",
"id": "4bd59540",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1397,7 +1401,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5ed441eb",
"id": "9c63e6cd",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1425,7 +1429,7 @@
},
{
"cell_type": "markdown",
"id": "f04f6f6d",
"id": "63acc83f",
"metadata": {
"lines_to_next_cell": 0
},
@@ -1433,7 +1437,7 @@
},
{
"cell_type": "markdown",
"id": "197854fa",
"id": "307992df",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1448,7 +1452,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "3f7e529a",
"id": "cd73bc81",
"metadata": {
"nbgrader": {
"grade": false,
@@ -1469,7 +1473,7 @@
},
{
"cell_type": "markdown",
"id": "07f35fa5",
"id": "3171e7ee",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1481,7 +1485,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "aa410f1b",
"id": "924540fd",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1518,7 +1522,7 @@
},
{
"cell_type": "markdown",
"id": "e100b490",
"id": "b8b23ef0",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1539,7 +1543,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "63d0c188",
"id": "3ac8f7b9",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1829,7 +1833,7 @@
},
{
"cell_type": "markdown",
"id": "c9f01dfc",
"id": "ad2c8bd8",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1844,7 +1848,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0a5b53c7",
"id": "9b50e007",
"metadata": {},
"outputs": [],
"source": [
@@ -1906,7 +1910,7 @@
},
{
"cell_type": "markdown",
"id": "e914873d",
"id": "92ef4498",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1919,7 +1923,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5b11fc21",
"id": "74695654",
"metadata": {},
"outputs": [],
"source": [
@@ -2026,7 +2030,7 @@
},
{
"cell_type": "markdown",
"id": "4fd5f108",
"id": "27bce6e8",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -2056,7 +2060,7 @@
},
{
"cell_type": "markdown",
"id": "5e1049ff",
"id": "0abe9e82",
"metadata": {
"cell_marker": "\"\"\""
},

@@ -2,32 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "92b3ca4e",
"id": "6adb07a3",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Autograd - Automatic Differentiation Engine\n",
"# Autograd - Automatic Differentiation and Computational Graph Engine\n",
"\n",
"Welcome to the Autograd module! This is where TinyTorch becomes truly powerful. You'll implement the automatic differentiation engine that makes neural network training possible.\n",
"Welcome to the Autograd module! You'll implement the automatic differentiation engine that makes neural network training possible by automatically computing gradients through complex computational graphs.\n",
"\n",
"## Learning Goals\n",
"- Understand how automatic differentiation works through computational graphs\n",
"- Implement the Variable class that tracks gradients and operations\n",
"- Build backward propagation for gradient computation\n",
"- Create the foundation for neural network training\n",
"- Master the mathematical concepts behind backpropagation\n",
"- Systems understanding: How computational graphs enable automatic differentiation and why this approach scales to arbitrary network architectures\n",
"- Core implementation skill: Build the Variable class with gradient tracking and implement backward propagation through dynamic computation graphs\n",
"- Pattern recognition: Understand how chain rule application through computational graphs generalizes to any differentiable function\n",
"- Framework connection: See how your implementation mirrors PyTorch's autograd engine and tensor gradient tracking\n",
"- Performance insight: Learn why computational graph memory management and gradient accumulation strategies determine training scalability\n",
"\n",
"## Build → Use → Analyze\n",
"1. **Build**: Create the Variable class and gradient computation system\n",
"2. **Use**: Perform automatic differentiation on complex expressions\n",
"3. **Analyze**: Understand how gradients flow through computational graphs"
"## Build → Use → Reflect\n",
"1. **Build**: Complete automatic differentiation system with Variable class, gradient tracking, and backward propagation\n",
"2. **Use**: Apply autograd to complex mathematical expressions and neural network operations\n",
"3. **Reflect**: Why does automatic differentiation enable ML at scale, and how does graph memory management affect training?\n",
"\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- Deep technical understanding of how computational graphs enable automatic gradient computation for arbitrary functions\n",
"- Practical capability to build the gradient computation engine that powers all modern neural network training\n",
"- Systems insight into why automatic differentiation was the breakthrough that enabled deep learning at scale\n",
"- Performance consideration of how computational graph size and memory management affect training efficiency\n",
"- Connection to production ML systems and how frameworks optimize gradient computation and memory usage\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: PyTorch's autograd can handle graphs with millions of nodes and uses sophisticated memory optimization like gradient checkpointing to train models larger than GPU memory\n",
"⚡ **Performance Note**: Gradient computation often requires storing forward activations, leading to memory usage that scales with network depth - this drives innovations like gradient checkpointing"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86981ac1",
"id": "94d3e84e",
"metadata": {
"nbgrader": {
"grade": false,
@@ -61,7 +73,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ff88a869",
"id": "04eab79c",
"metadata": {
"nbgrader": {
"grade": false,
@@ -82,7 +94,7 @@
},
{
"cell_type": "markdown",
"id": "7222907d",
"id": "be5faabe",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -108,7 +120,7 @@
},
{
"cell_type": "markdown",
"id": "8a02ef23",
"id": "d3a86486",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -157,7 +169,7 @@
},
{
"cell_type": "markdown",
"id": "5e823746",
"id": "53e62fad",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -167,7 +179,7 @@
},
{
"cell_type": "markdown",
"id": "6e303a4f",
"id": "1ecd12c0",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -211,7 +223,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "13109dc3",
"id": "ee3ffee5",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -377,7 +389,7 @@
},
{
"cell_type": "markdown",
"id": "77f953de",
"id": "5724a34e",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -391,7 +403,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "6ccd8cc3",
"id": "d5796fe9",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -445,7 +457,7 @@
},
{
"cell_type": "markdown",
"id": "c3646dad",
"id": "947ad0da",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -483,7 +495,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "60787376",
"id": "f20b97a8",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -564,7 +576,7 @@
},
{
"cell_type": "markdown",
"id": "c5619cde",
"id": "808eb9e6",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -578,7 +590,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "01efaebf",
"id": "9f1227f9",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -633,7 +645,7 @@
},
{
"cell_type": "markdown",
"id": "bf0f8e22",
"id": "96edb2cf",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -663,7 +675,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "a6b63801",
"id": "6802a5f1",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -744,7 +756,7 @@
},
{
"cell_type": "markdown",
"id": "e20a55ac",
"id": "640d880d",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -758,7 +770,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "cb0b0164",
"id": "0a50cac8",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -813,7 +825,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "2d84707a",
"id": "6a002dd6",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -889,7 +901,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1dbcfbf9",
"id": "a46a2b31",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -942,7 +954,7 @@
},
{
"cell_type": "markdown",
"id": "14f27426",
"id": "1308bf8a",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -977,7 +989,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c3e9e1c6",
"id": "f0ee8610",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1049,7 +1061,7 @@
},
{
"cell_type": "markdown",
"id": "b6fdb8ff",
"id": "cb9c3cb0",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1095,7 +1107,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "cebf4490",
"id": "0079d05b",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1187,7 +1199,7 @@
},
{
"cell_type": "markdown",
"id": "a7db2d43",
"id": "fcf76e2a",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1219,7 +1231,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ad0ba9dc",
"id": "5778982d",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1584,7 +1596,7 @@
},
{
"cell_type": "markdown",
"id": "565b6497",
"id": "bd66154e",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1598,7 +1610,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e8097dec",
"id": "33f08490",
"metadata": {
"nbgrader": {
"grade": false,
@@ -1679,7 +1691,7 @@
},
{
"cell_type": "markdown",
"id": "3e46cb4a",
"id": "008207b4",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1693,7 +1705,7 @@
},
{
"cell_type": "markdown",
"id": "452c9ca2",
"id": "f644dbd6",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1712,7 +1724,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "4d334e50",
"id": "1e132f6b",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1757,7 +1769,7 @@
},
{
"cell_type": "markdown",
"id": "a0444f55",
"id": "e2926afd",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1776,7 +1788,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1c0212c6",
"id": "1673160b",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1821,7 +1833,7 @@
},
{
"cell_type": "markdown",
"id": "7a0b2840",
"id": "6c3978f0",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1840,7 +1852,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "173b5452",
"id": "9a402475",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1885,7 +1897,7 @@
},
{
"cell_type": "markdown",
"id": "1def65ac",
"id": "c4162dc5",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1942,14 +1954,6 @@
"\n",
"**Ready for optimizers?** Your autograd system is now ready for real training!"
]
},
{
"cell_type": "markdown",
"id": "152797f1",
"metadata": {},
"source": [
"\"\"\""
]
}
],
"metadata": {

@@ -2,32 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "d6dc5af9",
"id": "f547fe8d",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Optimizers - Gradient-Based Parameter Updates\n",
"# Optimizers - Gradient-Based Parameter Updates and Training Dynamics\n",
"\n",
"Welcome to the Optimizers module! This is where neural networks learn to improve through intelligent parameter updates.\n",
"Welcome to the Optimizers module! You'll implement the algorithms that use gradients to update neural network parameters, determining how effectively networks learn from data.\n",
"\n",
"## Learning Goals\n",
"- Understand gradient descent and how optimizers use gradients to update parameters\n",
"- Implement SGD with momentum for accelerated convergence\n",
"- Build Adam optimizer with adaptive learning rates\n",
"- Master learning rate scheduling strategies\n",
"- See how optimizers enable effective neural network training\n",
"- Systems understanding: How different optimization algorithms affect convergence speed, memory usage, and training stability\n",
"- Core implementation skill: Build SGD with momentum and Adam optimizer, understanding their mathematical foundations and implementation trade-offs\n",
"- Pattern recognition: Understand how adaptive learning rates and momentum help navigate complex loss landscapes\n",
"- Framework connection: See how your optimizer implementations match PyTorch's optim module design and state management\n",
"- Performance insight: Learn why optimizer choice affects training speed and why Adam uses 3x more memory than SGD\n",
"\n",
"## Build → Use → Analyze\n",
"1. **Build**: Core optimization algorithms (SGD, Adam)\n",
"2. **Use**: Apply optimizers to train neural networks\n",
"3. **Analyze**: Compare optimizer behavior and convergence patterns"
"## Build → Use → Reflect\n",
"1. **Build**: Complete SGD and Adam optimizers with proper state management and learning rate scheduling\n",
"2. **Use**: Train neural networks with different optimizers and compare convergence behavior on real datasets\n",
"3. **Reflect**: Why do some optimizers work better for certain problems, and how does memory usage scale with model size?\n",
"\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- Deep technical understanding of how optimization algorithms navigate high-dimensional loss landscapes to find good solutions\n",
"- Practical capability to implement and tune optimizers that determine training success or failure\n",
"- Systems insight into why optimizer choice often matters more than architecture choice for training success\n",
"- Performance consideration of how optimizer memory requirements and computational overhead affect scalable training\n",
"- Connection to production ML systems and why new optimizers continue to be an active area of research\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: PyTorch's Adam implementation includes numerically stable variants and can automatically scale learning rates based on gradient norms to prevent training instability\n",
"⚡ **Performance Note**: Adam stores running averages for every parameter, using 3x the memory of SGD - this memory overhead becomes critical when training large models near GPU memory limits"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ecf7767d",
"id": "385d3f5e",
"metadata": {
"nbgrader": {
"grade": false,
@@ -106,7 +118,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e6056954",
"id": "8a74cb0f",
"metadata": {
"nbgrader": {
"grade": false,
@@ -127,7 +139,7 @@
},
{
"cell_type": "markdown",
"id": "4e30a1f6",
"id": "b7ca005d",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -153,7 +165,7 @@
},
{
"cell_type": "markdown",
"id": "c533761b",
"id": "dedac464",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -191,7 +203,7 @@
},
{
"cell_type": "markdown",
"id": "e465bda2",
"id": "b525d215",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -201,7 +213,7 @@
},
{
"cell_type": "markdown",
"id": "b6095d52",
"id": "5ef63732",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -251,7 +263,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "da9a25b0",
"id": "c45766f9",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -321,7 +333,7 @@
},
{
"cell_type": "markdown",
"id": "8ac173b8",
"id": "0fa5386e",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -337,7 +349,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1fa8c42f",
"id": "a5a3820c",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -414,7 +426,7 @@
},
{
"cell_type": "markdown",
"id": "0e54f9ad",
"id": "b4a6ef30",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -471,7 +483,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "643bbae5",
"id": "d80288ca",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -622,7 +634,7 @@
},
{
"cell_type": "markdown",
"id": "31ac7614",
"id": "1b978961",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -638,7 +650,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "271e3872",
"id": "209054a3",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -745,7 +757,7 @@
},
{
"cell_type": "markdown",
"id": "3f913a57",
"id": "3dcc0613",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -794,7 +806,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e795ba61",
"id": "8b2cf8a0",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -967,7 +979,7 @@
},
{
"cell_type": "markdown",
"id": "e62105b5",
"id": "e7add4a0",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -979,7 +991,7 @@
},
{
"cell_type": "markdown",
"id": "7faee55a",
"id": "fbb25460",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -995,7 +1007,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "77da7374",
"id": "d3c1d4b0",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1112,7 +1124,7 @@
},
{
"cell_type": "markdown",
"id": "dd778078",
"id": "525718d0",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1160,7 +1172,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "7c23ae10",
"id": "e02928ee",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1271,7 +1283,7 @@
},
{
"cell_type": "markdown",
"id": "8e286f41",
"id": "7081b052",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1287,7 +1299,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8b09c5aa",
"id": "6f15603f",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1395,7 +1407,7 @@
},
{
"cell_type": "markdown",
"id": "465a5a61",
"id": "b63857c4",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1440,7 +1452,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c10135b0",
"id": "edeaace7",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1567,7 +1579,7 @@
},
{
"cell_type": "markdown",
"id": "8a703160",
"id": "adf293b8",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1583,7 +1595,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d5b706ab",
"id": "fc3b285b",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1673,7 +1685,7 @@
},
{
"cell_type": "markdown",
"id": "c8314e14",
"id": "d11f9f47",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1708,7 +1720,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "017016a6",
"id": "ac0e2b84",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2405,7 +2417,7 @@
},
{
"cell_type": "markdown",
"id": "f21b66f1",
"id": "3ea0950d",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -2421,7 +2433,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "400c914c",
"id": "495e67e6",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2573,7 +2585,7 @@
},
{
"cell_type": "markdown",
"id": "2fee550e",
"id": "5dc43b14",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -2597,7 +2609,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "4cae95e1",
"id": "9a594463",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -3016,7 +3028,7 @@
},
{
"cell_type": "markdown",
"id": "4ed2e46b",
"id": "edc91910",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -3032,7 +3044,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "32abb52a",
"id": "989b7aba",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -3179,7 +3191,7 @@
},
{
"cell_type": "markdown",
"id": "28d611a2",
"id": "08d52289",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -3202,7 +3214,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "6f9d2f21",
"id": "8f9d10cd",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -3416,7 +3428,7 @@
},
{
"cell_type": "markdown",
"id": "405ec7e0",
"id": "8fd73dda",
"metadata": {},
"source": [
"\"\"\"\n",
@@ -3481,7 +3493,7 @@
},
{
"cell_type": "markdown",
"id": "2c9b54d4",
"id": "7f771cb5",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -3495,7 +3507,7 @@
},
{
"cell_type": "markdown",
"id": "6f9dc525",
"id": "becee27d",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -3514,7 +3526,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d12f61d6",
"id": "0b76c034",
"metadata": {
"nbgrader": {
"grade": true,
@@ -3559,7 +3571,7 @@
},
{
"cell_type": "markdown",
"id": "91f51d2b",
"id": "2f8edd2d",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -3578,7 +3590,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "9e0e7396",
"id": "510b4873",
"metadata": {
"nbgrader": {
"grade": true,
@@ -3623,7 +3635,7 @@
},
{
"cell_type": "markdown",
"id": "6c089957",
"id": "9382e755",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -3642,7 +3654,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5994d78f",
"id": "cf6c2762",
"metadata": {
"nbgrader": {
"grade": true,
@@ -3687,7 +3699,7 @@
},
{
"cell_type": "markdown",
"id": "c8a744ba",
"id": "5a4865e1",
"metadata": {
"cell_marker": "\"\"\""
},

@@ -2,32 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "40c65585",
"id": "9722eef4",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Training - Complete Neural Network Training Pipeline\n",
"# Training - Complete End-to-End ML Training Infrastructure\n",
"\n",
"Welcome to the Training module! This is where we bring everything together to train neural networks on real data.\n",
"Welcome to the Training module! You'll build the complete training infrastructure that orchestrates data loading, forward passes, loss computation, backpropagation, and optimization into a unified system.\n",
"\n",
"## Learning Goals\n",
"- Understand loss functions and how they measure model performance\n",
"- Implement essential loss functions: MSE, CrossEntropy, and BinaryCrossEntropy\n",
"- Build evaluation metrics for classification and regression\n",
"- Create a complete training loop that orchestrates the entire process\n",
"- Master checkpointing and model persistence for real-world deployment\n",
"- Systems understanding: How training loops coordinate all ML system components and why training orchestration determines system reliability\n",
"- Core implementation skill: Build loss functions, evaluation metrics, and complete training loops with checkpointing and monitoring\n",
"- Pattern recognition: Understand how different loss functions affect learning dynamics and model behavior\n",
"- Framework connection: See how your training loop mirrors PyTorch's training patterns and state management\n",
"- Performance insight: Learn why training loop design affects convergence speed, memory usage, and debugging capability\n",
"\n",
"## Build → Use → Optimize\n",
"1. **Build**: Loss functions, metrics, and training orchestration\n",
"2. **Use**: Train complete models on real datasets\n",
"3. **Optimize**: Analyze training dynamics and improve performance"
"## Build → Use → Reflect\n",
"1. **Build**: Complete training infrastructure with loss functions, metrics, checkpointing, and progress monitoring\n",
"2. **Use**: Train real neural networks on CIFAR-10 and achieve meaningful accuracy on complex visual tasks\n",
"3. **Reflect**: Why does training loop design often determine the success or failure of ML projects?\n",
"\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- Deep technical understanding of how training loops orchestrate complex ML systems into reliable, monitorable processes\n",
"- Practical capability to build production-ready training infrastructure with proper error handling and state management\n",
"- Systems insight into why training stability and reproducibility are critical for reliable ML systems\n",
"- Performance consideration of how training loop efficiency affects iteration speed and resource utilization\n",
"- Connection to production ML systems and how modern MLOps platforms build on these training patterns\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: Modern ML training platforms like PyTorch Lightning and Hugging Face Transformers build sophisticated abstractions on top of basic training loops to handle distributed training, mixed precision, and fault tolerance\n",
"⚡ **Performance Note**: Training loop efficiency often matters more than model efficiency for development speed - good training infrastructure accelerates the entire ML development cycle"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "31270c93",
"id": "d79e429d",
"metadata": {
"nbgrader": {
"grade": false,
@@ -79,7 +91,7 @@
},
{
"cell_type": "markdown",
"id": "6e8e28fe",
"id": "2f3fe102",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -89,7 +101,7 @@
},
{
"cell_type": "markdown",
"id": "c9131846",
"id": "d29c83bd",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -150,7 +162,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c5d8a8d0",
"id": "8efa2e22",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -228,7 +240,7 @@
},
{
"cell_type": "markdown",
"id": "b92c8aec",
"id": "0a9c2f6b",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -242,7 +254,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0c53032b",
"id": "531d56c7",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -301,7 +313,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "2aee7ae1",
"id": "14074504",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -405,7 +417,7 @@
},
{
"cell_type": "markdown",
"id": "7f25886d",
"id": "42426295",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -419,7 +431,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c3507f41",
"id": "31e5f16a",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -476,7 +488,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1fc153fc",
"id": "8b182b10",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -585,7 +597,7 @@
},
{
"cell_type": "markdown",
"id": "67acce3f",
"id": "64b9a59a",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -599,7 +611,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d0b63b9d",
"id": "9d3ddb43",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -655,7 +667,7 @@
},
{
"cell_type": "markdown",
"id": "1861dbea",
"id": "40ce7b15",
"metadata": {},
"source": [
"\"\"\"\n",
@@ -709,7 +721,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "90622dc6",
"id": "ff9b65b9",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -806,7 +818,7 @@
},
{
"cell_type": "markdown",
"id": "8a896363",
"id": "11d7f7a9",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -820,7 +832,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "da3306d7",
"id": "0fbb7dea",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -876,7 +888,7 @@
},
{
"cell_type": "markdown",
"id": "18f049c2",
"id": "89535c73",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -923,7 +935,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "560a2d73",
"id": "c8e5c58f",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1283,7 +1295,7 @@
},
{
"cell_type": "markdown",
"id": "d5af843c",
"id": "c3c15b00",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1297,7 +1309,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "432f193e",
"id": "ba33e0d4",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1350,7 +1362,7 @@
},
{
"cell_type": "markdown",
"id": "79f7e6b4",
"id": "d3b578a7",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1366,7 +1378,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8e40d9cf",
"id": "f9db1638",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1453,7 +1465,7 @@
},
{
"cell_type": "markdown",
"id": "554a2102",
"id": "456150ec",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1486,7 +1498,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "66602940",
"id": "604fbb39",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1689,7 +1701,7 @@
},
{
"cell_type": "markdown",
"id": "f136e224",
"id": "8eb31853",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1703,7 +1715,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "232e80c5",
"id": "ec159c89",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1774,7 +1786,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "16381eea",
"id": "bba90077",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1950,7 +1962,7 @@
},
{
"cell_type": "markdown",
"id": "cc2f03ca",
"id": "1281999e",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1964,7 +1976,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "81d4dc22",
"id": "f82a0ee2",
"metadata": {
"nbgrader": {
"grade": false,
@@ -2049,7 +2061,7 @@
},
{
"cell_type": "markdown",
"id": "29e6d19b",
"id": "b29aedd0",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -2091,7 +2103,7 @@
},
{
"cell_type": "markdown",
"id": "ec2e9914",
"id": "a24eed33",
"metadata": {
"cell_marker": "\"\"\""
},

@@ -2,33 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "5b866082",
"id": "f571c637",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Compression & Optimization - Making AI Models Efficient\n",
"# Compression - Model Optimization and Efficient Deployment Strategies\n",
"\n",
"Welcome to the Compression module! This is where you'll learn to make neural networks smaller, faster, and more efficient for real-world deployment.\n",
"Welcome to the Compression module! You'll implement techniques that make neural networks smaller, faster, and more efficient for deployment in resource-constrained environments.\n",
"\n",
"## Learning Goals\n",
"- Understand how model size affects deployment and why compression matters\n",
"- Implement magnitude-based pruning to remove unimportant weights\n",
"- Master quantization to reduce memory usage by 75%\n",
"- Build knowledge distillation for training compact models\n",
"- Create structured pruning to optimize network architectures\n",
"- Compare compression techniques and their trade-offs\n",
"- Systems understanding: How model size and computational requirements affect deployment costs, latency, and energy consumption in production systems\n",
"- Core implementation skill: Build pruning, quantization, and knowledge distillation techniques that reduce model footprint while preserving performance\n",
"- Pattern recognition: Understand the accuracy vs efficiency trade-offs that drive deployment decisions in real ML systems\n",
"- Framework connection: See how your compression implementations relate to PyTorch's optimization tools and mobile deployment strategies\n",
"- Performance insight: Learn why compression techniques can improve both inference speed and training efficiency\n",
"\n",
"## Build → Use → Optimize\n",
"1. **Build**: Four compression techniques from scratch\n",
"2. **Use**: Apply compression to real neural networks\n",
"3. **Optimize**: Combine techniques for maximum efficiency gains"
"## Build → Use → Reflect\n",
"1. **Build**: Complete compression toolkit with magnitude pruning, quantization, and knowledge distillation\n",
"2. **Use**: Apply compression to trained neural networks and measure the accuracy vs efficiency trade-offs\n",
"3. **Reflect**: Why do modern ML systems require compression, and how do compression choices affect system design?\n",
"\n",
"## What You'll Achieve\n",
"By the end of this module, you'll understand:\n",
"- Deep technical understanding of how compression techniques reduce computational and memory requirements without destroying learned representations\n",
"- Practical capability to optimize neural networks for deployment in mobile devices, edge systems, and cost-sensitive environments\n",
"- Systems insight into why compression is essential for practical ML deployment and how it affects system architecture decisions\n",
"- Performance consideration of how different compression techniques affect inference speed, memory usage, and accuracy\n",
"- Connection to production ML systems and how compression enables ML deployment at scale\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: Modern mobile AI relies heavily on compression - techniques like quantization can reduce model size by 4x while maintaining accuracy, enabling on-device inference\n",
"⚡ **Performance Note**: Compression often speeds up inference by reducing memory bandwidth requirements, even when computational complexity remains the same - memory is often the bottleneck"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03c71581",
"id": "9a4356b8",
"metadata": {
"nbgrader": {
"grade": false,
@@ -119,7 +130,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "be19795d",
"id": "dc9ecd88",
"metadata": {
"nbgrader": {
"grade": false,
@@ -140,7 +151,7 @@
},
{
"cell_type": "markdown",
"id": "fc9c58ef",
"id": "b083c6be",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -172,7 +183,7 @@
},
{
"cell_type": "markdown",
"id": "c98dab3b",
"id": "942db810",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -215,7 +226,7 @@
},
{
"cell_type": "markdown",
"id": "0a001725",
"id": "6f290fdb",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -225,7 +236,7 @@
},
{
"cell_type": "markdown",
"id": "1f7ff735",
"id": "4f7f7e3c",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -283,7 +294,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1628b3d0",
"id": "a2e4583a",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -432,7 +443,7 @@
},
{
"cell_type": "markdown",
"id": "7d981751",
"id": "4da32b00",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -446,7 +457,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "cd44c559",
"id": "c809bfa4",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -494,7 +505,7 @@
},
{
"cell_type": "markdown",
"id": "a811d217",
"id": "a6ab5d0f",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -541,7 +552,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "af44646b",
"id": "781ec53e",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -639,7 +650,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d5171a69",
"id": "d5b5b2d2",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -708,7 +719,7 @@
},
{
"cell_type": "markdown",
"id": "ba8ce687",
"id": "67eeac1a",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -722,7 +733,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "70a3f403",
"id": "ac3403ca",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -789,7 +800,7 @@
},
{
"cell_type": "markdown",
"id": "675224ed",
"id": "4b221d5e",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -838,7 +849,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "94b8f9df",
"id": "d9b403eb",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -947,7 +958,7 @@
},
{
"cell_type": "markdown",
"id": "1dcc14be",
"id": "aa5fb04f",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -961,7 +972,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f5952b2c",
"id": "6d574271",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1029,7 +1040,7 @@
},
{
"cell_type": "markdown",
"id": "88fe95b6",
"id": "8f39cb2a",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1090,7 +1101,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "90c97797",
"id": "85a15c4f",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1216,7 +1227,7 @@
},
{
"cell_type": "markdown",
"id": "8cdadebd",
"id": "146dd625",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1230,7 +1241,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "889e0573",
"id": "bedc67dc",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1303,7 +1314,7 @@
},
{
"cell_type": "markdown",
"id": "231eeb7e",
"id": "fe8e4551",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1372,7 +1383,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "be540371",
"id": "42116bb5",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1450,7 +1461,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1794ad9d",
"id": "28e78697",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1569,7 +1580,7 @@
},
{
"cell_type": "markdown",
"id": "a6863b72",
"id": "c220e739",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1583,7 +1594,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "103c486f",
"id": "ae8b114a",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1655,7 +1666,7 @@
},
{
"cell_type": "markdown",
"id": "8b0b95bb",
"id": "9acd56e7",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1693,7 +1704,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ac52198d",
"id": "cbc8f024",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2011,7 +2022,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "00d6eba6",
"id": "4744531a",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2195,7 +2206,7 @@
},
{
"cell_type": "markdown",
"id": "f7273ce6",
"id": "23ee0c71",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -2229,7 +2240,7 @@
},
{
"cell_type": "markdown",
"id": "3ef9b566",
"id": "a0634e78",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -2243,7 +2254,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "a690a9bc",
"id": "ccae2f66",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2362,7 +2373,7 @@
},
{
"cell_type": "markdown",
"id": "0b33b929",
"id": "5cbb9ac0",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -2376,7 +2387,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "977f9ac8",
"id": "73a257b4",
"metadata": {
"lines_to_next_cell": 0,
"nbgrader": {
@@ -2462,7 +2473,7 @@
},
{
"cell_type": "markdown",
"id": "6584e376",
"id": "5bcb656a",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -2476,7 +2487,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c18e338a",
"id": "e125e2d5",
"metadata": {
"lines_to_next_cell": 1
},
@@ -2516,7 +2527,7 @@
},
{
"cell_type": "markdown",
"id": "11bf3a74",
"id": "dabe9a89",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -2530,7 +2541,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "26b95e4d",
"id": "08d17ed6",
"metadata": {
"lines_to_next_cell": 1
},
@@ -2585,7 +2596,7 @@
},
{
"cell_type": "markdown",
"id": "f9171952",
"id": "1fdbedcf",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -2599,7 +2610,7 @@
},
{
"cell_type": "markdown",
"id": "c278155e",
"id": "43341467",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -2610,7 +2621,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "bd64cc9c",
"id": "842c2ce0",
"metadata": {
"nbgrader": {
"grade": false,
@@ -2644,7 +2655,7 @@
},
{
"cell_type": "markdown",
"id": "05674b7c",
"id": "a1395c5b",
"metadata": {
"cell_marker": "\"\"\""
},

View File

@@ -2,33 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "227839f9",
"id": "8cd904bf",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Kernels - Hardware-Optimized ML Operations\n",
"# Kernels - High-Performance Computing and Hardware Optimization\n",
"\n",
"Welcome to the Kernels module! This is where we move beyond NumPy to understand how ML operations are optimized for modern hardware. You'll implement custom kernels that run faster than standard library functions.\n",
"Welcome to the Kernels module! You'll implement high-performance computational kernels that understand how modern hardware works, moving beyond generic libraries to achieve optimal performance.\n",
"\n",
"## Learning Goals\n",
"- Understand why custom kernels matter for ML performance\n",
"- Implement vectorized operations using SIMD principles\n",
"- Master memory-efficient algorithms for better cache utilization\n",
"- Build parallel processing patterns for CPU and GPU-style computing\n",
"- Create performance profiling tools to measure and optimize code\n",
"- Apply kernel optimizations to compressed model operations\n",
"- Systems understanding: How CPU cache hierarchies, SIMD instructions, and memory bandwidth determine ML operation performance\n",
"- Core implementation skill: Build vectorized operations and memory-efficient algorithms that outperform standard library implementations\n",
"- Pattern recognition: Understand how algorithmic choices interact with hardware characteristics to determine real-world performance\n",
"- Framework connection: See how your optimizations relate to the low-level kernels used in PyTorch, cuDNN, and BLAS libraries\n",
"- Performance insight: Learn why kernel optimization often provides larger speedups than algorithmic improvements\n",
"\n",
"## Build → Use → Optimize\n",
"1. **Build**: Custom operations, vectorization, and memory optimization\n",
"2. **Use**: Apply optimized kernels to real ML workloads\n",
"3. **Optimize**: Profile, measure, and improve performance systematically"
"## Build → Use → Reflect\n",
"1. **Build**: Custom vectorized operations, cache-friendly algorithms, and parallel computation patterns\n",
"2. **Use**: Apply optimized kernels to real ML workloads and measure performance improvements\n",
"3. **Reflect**: Why do hardware characteristics often matter more than algorithm choice for ML performance?\n",
"\n",
"## What You'll Achieve\n",
"By the end of this module, you'll have gained:\n",
"- Deep technical understanding of how modern hardware executes ML operations and why optimization requires hardware awareness\n",
"- Practical capability to write high-performance code that achieves near-optimal hardware utilization\n",
"- Systems insight into why kernel optimization is critical for production ML systems and how it affects system design\n",
"- Performance consideration of how memory access patterns, vectorization, and parallelization strategies affect computational efficiency\n",
"- Connection to production ML systems and how frameworks achieve performance through hardware-optimized kernel libraries\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: PyTorch's performance comes from libraries like MKL-DNN and cuDNN that implement thousands of hand-optimized kernels for different hardware configurations\n",
"⚡ **Performance Note**: Well-optimized kernels can be 10-100x faster than naive implementations - kernel optimization is often the difference between research code and production systems"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7579c531",
"id": "a167e482",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -101,7 +112,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "6b9c38f6",
"id": "65ef6738",
"metadata": {
"nbgrader": {
"grade": false,
@@ -123,7 +134,7 @@
},
{
"cell_type": "markdown",
"id": "50e842ed",
"id": "bf06e66e",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -149,7 +160,7 @@
},
{
"cell_type": "markdown",
"id": "01f5cfdc",
"id": "c5390635",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -214,7 +225,7 @@
},
{
"cell_type": "markdown",
"id": "997065a7",
"id": "c118bae0",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -224,7 +235,7 @@
},
{
"cell_type": "markdown",
"id": "1660946c",
"id": "e8554383",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -272,7 +283,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "321cb698",
"id": "350f872d",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -345,7 +356,7 @@
},
{
"cell_type": "markdown",
"id": "4127de7d",
"id": "cb2ef920",
"metadata": {
"lines_to_next_cell": 0
},
@@ -354,7 +365,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e3fe4259",
"id": "063bb604",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -416,7 +427,7 @@
},
{
"cell_type": "markdown",
"id": "6a44a2f4",
"id": "6ce0e667",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -466,7 +477,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "314997fb",
"id": "07816f91",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -539,7 +550,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "4893591a",
"id": "976c3c51",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -613,7 +624,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "2c05c2a8",
"id": "5fadf04a",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -675,7 +686,7 @@
},
{
"cell_type": "markdown",
"id": "a09efa4b",
"id": "ea8c4b4e",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -732,7 +743,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f24e0d10",
"id": "e7b3fa5a",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -828,7 +839,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1d9b95f2",
"id": "e3187a08",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -885,7 +896,7 @@
},
{
"cell_type": "markdown",
"id": "9ec746e4",
"id": "ed07feef",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -928,7 +939,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "970e7e00",
"id": "6edf6993",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1028,7 +1039,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1f273f9c",
"id": "342ea26d",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1107,7 +1118,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "414273de",
"id": "4c5426df",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1166,7 +1177,7 @@
},
{
"cell_type": "markdown",
"id": "448d2e32",
"id": "00cbae2e",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1208,7 +1219,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "a16284ad",
"id": "0afb507b",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1274,7 +1285,7 @@
},
{
"cell_type": "markdown",
"id": "8474a534",
"id": "e287b111",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1324,7 +1335,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "51cc1d21",
"id": "6dbfdf67",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1402,7 +1413,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "eac2577e",
"id": "27b3d44d",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1477,7 +1488,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1ca1198a",
"id": "0529f1fc",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1553,7 +1564,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d5192a40",
"id": "d93b7992",
"metadata": {
"nbgrader": {
"grade": false,
@@ -1660,7 +1671,7 @@
},
{
"cell_type": "markdown",
"id": "2b7d841e",
"id": "5960991f",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1700,7 +1711,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "32a3eda8",
"id": "3c791504",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2063,7 +2074,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c61eebb3",
"id": "ee8f530f",
"metadata": {
"nbgrader": {
"grade": false,
@@ -2151,7 +2162,7 @@
},
{
"cell_type": "markdown",
"id": "c26f3510",
"id": "5abe03c8",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -2191,7 +2202,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "da84b09f",
"id": "f2564cc6",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2554,7 +2565,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "3e395528",
"id": "ebde88eb",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2644,7 +2655,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "27a3ad56",
"id": "82e1a372",
"metadata": {
"lines_to_next_cell": 1
},
@@ -2712,7 +2723,7 @@
},
{
"cell_type": "markdown",
"id": "ccbd2c98",
"id": "a1961c94",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -2726,7 +2737,7 @@
},
{
"cell_type": "markdown",
"id": "18ca145a",
"id": "4aaba367",
"metadata": {
"cell_marker": "\"\"\""
},

View File

@@ -2,32 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "327cb098",
"id": "451ae6b3",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Benchmarking - Systematic ML Performance Evaluation\n",
"# Benchmarking - Systematic Performance Analysis and Bottleneck Identification\n",
"\n",
"Welcome to the Benchmarking module! This is where we learn to systematically evaluate ML systems using industry-standard methodology inspired by MLPerf.\n",
"Welcome to the Benchmarking module! You'll build professional benchmarking tools that identify performance bottlenecks and enable data-driven optimization decisions in ML systems.\n",
"\n",
"## Learning Goals\n",
"- Understand the four-component MLPerf benchmarking architecture\n",
"- Implement different benchmark scenarios (latency, throughput, offline)\n",
"- Apply statistical validation for meaningful results\n",
"- Create professional performance reports for ML projects\n",
"- Learn to avoid common benchmarking pitfalls\n",
"- Systems understanding: How systematic performance measurement reveals bottlenecks and guides optimization priorities in complex ML systems\n",
"- Core implementation skill: Build comprehensive benchmarking frameworks with statistical validation and professional reporting\n",
"- Pattern recognition: Understand how different workload patterns (latency vs throughput) require different measurement strategies\n",
"- Framework connection: See how your benchmarking approach mirrors industry standards like MLPerf and production monitoring systems\n",
"- Performance insight: Learn why measurement methodology often matters more than absolute numbers for optimization decisions\n",
"\n",
"## Build → Use → Analyze\n",
"1. **Build**: Benchmarking framework with proper statistical validation\n",
"2. **Use**: Apply systematic evaluation to your TinyTorch models\n",
"3. **Analyze**: Generate professional reports with statistical confidence"
"## Build → Use → Reflect\n",
"1. **Build**: Complete benchmarking suite with MLPerf-inspired scenarios, statistical validation, and professional reporting\n",
"2. **Use**: Apply systematic evaluation to TinyTorch models and identify performance bottlenecks across the entire system\n",
"3. **Reflect**: Why do measurement artifacts often mislead optimization efforts, and how does proper benchmarking guide development?\n",
"\n",
"## What You'll Achieve\n",
"By the end of this module, you'll have gained:\n",
"- Deep technical understanding of how to design benchmarks that reveal actionable insights about system performance\n",
"- Practical capability to build measurement infrastructure that guides optimization decisions and tracks system improvements\n",
"- Systems insight into why benchmarking methodology determines the reliability and usefulness of performance data\n",
"- Performance consideration of how measurement overhead and statistical variance affect benchmark validity\n",
"- Connection to production ML systems and how companies use systematic benchmarking to optimize deployment and hardware decisions\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: Companies like Google and Facebook run continuous benchmarking across thousands of models to guide infrastructure investments and optimization priorities\n",
"⚡ **Performance Note**: Poor benchmarking methodology can lead to optimizing the wrong bottlenecks - measurement artifacts often overwhelm real performance differences"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a953c2ab",
"id": "e392090d",
"metadata": {
"nbgrader": {
"grade": false,
@@ -88,7 +100,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0b0d3740",
"id": "9b0e028d",
"metadata": {
"nbgrader": {
"grade": false,
@@ -109,7 +121,7 @@
},
{
"cell_type": "markdown",
"id": "0d97f2f3",
"id": "272f30c5",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -134,7 +146,7 @@
},
{
"cell_type": "markdown",
"id": "22490ba6",
"id": "e8b5bb39",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -180,7 +192,7 @@
},
{
"cell_type": "markdown",
"id": "1db41dfc",
"id": "5ab97147",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -190,7 +202,7 @@
},
{
"cell_type": "markdown",
"id": "2c90652c",
"id": "8fbf6189",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -233,7 +245,7 @@
},
{
"cell_type": "markdown",
"id": "11a9b1bb",
"id": "1c52fdee",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -282,7 +294,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "9019c933",
"id": "3f3c2a5f",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -543,7 +555,7 @@
},
{
"cell_type": "markdown",
"id": "40ecaab0",
"id": "09ef7933",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -557,7 +569,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "20a00f47",
"id": "cda6af90",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -612,7 +624,7 @@
},
{
"cell_type": "markdown",
"id": "a9380c14",
"id": "92e57b90",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -630,7 +642,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e39fbae6",
"id": "7c718ded",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -822,7 +834,7 @@
},
{
"cell_type": "markdown",
"id": "e195fb04",
"id": "de9f9b7c",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -836,7 +848,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "9dc6bf1f",
"id": "ad767dfb",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -889,7 +901,7 @@
},
{
"cell_type": "markdown",
"id": "e211f4ed",
"id": "8d9302a8",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -904,7 +916,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "75ceed1b",
"id": "13039465",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1115,7 +1127,7 @@
},
{
"cell_type": "markdown",
"id": "821135c9",
"id": "683e02c6",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1129,7 +1141,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "a55298f2",
"id": "bfdcde9d",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1198,7 +1210,7 @@
},
{
"cell_type": "markdown",
"id": "aee9ca6c",
"id": "f5facb21",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1217,7 +1229,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "32d1c7a6",
"id": "6be85bd0",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1384,7 +1396,7 @@
},
{
"cell_type": "markdown",
"id": "296ab186",
"id": "2e7dbf81",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1398,7 +1410,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b797dfb8",
"id": "d6621e0d",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1463,7 +1475,7 @@
},
{
"cell_type": "markdown",
"id": "0ef632da",
"id": "ffda8fdb",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1476,7 +1488,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8ca5ff00",
"id": "96b443c5",
"metadata": {},
"outputs": [],
"source": [
@@ -1511,7 +1523,7 @@
},
{
"cell_type": "markdown",
"id": "34482cdd",
"id": "3e9e3be0",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1525,7 +1537,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "a4d13edf",
"id": "6af71a8b",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1635,7 +1647,7 @@
},
{
"cell_type": "markdown",
"id": "b15d6f2c",
"id": "81e24467",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1645,7 +1657,7 @@
},
{
"cell_type": "markdown",
"id": "6e8b654c",
"id": "450e7bcb",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1684,7 +1696,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "2ebcffa8",
"id": "c0eda8aa",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2082,7 +2094,7 @@
},
{
"cell_type": "markdown",
"id": "0a086fed",
"id": "6cb65a66",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -2096,7 +2108,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0dd27273",
"id": "f0155f16",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2186,7 +2198,7 @@
},
{
"cell_type": "markdown",
"id": "0e44eaac",
"id": "e93080d4",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -2238,7 +2250,7 @@
},
{
"cell_type": "markdown",
"id": "cb0d05e9",
"id": "8dc2a661",
"metadata": {
"cell_marker": "\"\"\""
},

View File

@@ -2,32 +2,44 @@
"cells": [
{
"cell_type": "markdown",
"id": "4a75900d",
"id": "cc284b69",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# MLOps - Production ML Systems\n",
"# MLOps - Production Deployment and Lifecycle Management\n",
"\n",
"Welcome to the MLOps module! This is where we close the loop on the complete ML system lifecycle.\n",
"Welcome to the MLOps module! You'll build the production infrastructure that deploys, monitors, and maintains ML systems over time, completing the full ML systems engineering lifecycle.\n",
"\n",
"## Learning Goals\n",
"- Understand why ML models degrade over time without maintenance\n",
"- Implement performance monitoring and drift detection systems\n",
"- Build automated retraining triggers that use your training pipeline\n",
"- Create model comparison and deployment workflows\n",
"- See how all TinyTorch components work together in production\n",
"- Systems understanding: How ML models degrade in production and why continuous monitoring and maintenance are critical for system reliability\n",
"- Core implementation skill: Build deployment, monitoring, and automated retraining systems that maintain model performance over time\n",
"- Pattern recognition: Understand how data drift, model decay, and system failures affect production ML systems\n",
"- Framework connection: See how your MLOps implementation connects to modern platforms like MLflow, Kubeflow, and cloud ML services\n",
"- Performance insight: Learn why operational concerns often dominate technical concerns in production ML systems\n",
"\n",
"## Build → Use → Deploy\n",
"1. **Build**: Complete MLOps infrastructure for model lifecycle management\n",
"2. **Use**: Deploy and monitor ML systems that automatically respond to issues\n",
"3. **Deploy**: Create production-ready systems that maintain themselves over time"
"## Build → Use → Reflect\n",
"1. **Build**: Complete MLOps infrastructure with deployment, monitoring, drift detection, and automated retraining capabilities\n",
"2. **Use**: Deploy TinyTorch models to production-like environments and observe how they behave over time\n",
"3. **Reflect**: Why do most ML projects fail in production, and how does proper MLOps infrastructure prevent system failures?\n",
"\n",
"## What You'll Achieve\n",
"By the end of this module, you'll have gained:\n",
"- Deep technical understanding of how production ML systems fail and what infrastructure prevents these failures\n",
"- Practical capability to build MLOps systems that automatically detect and respond to model degradation\n",
"- Systems insight into why operational complexity often exceeds algorithmic complexity in production ML systems\n",
"- Performance consideration of how monitoring overhead and deployment latency affect user experience\n",
"- Connection to production ML systems and how companies manage thousands of models across different environments\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: Companies like Netflix and Uber run thousands of ML models in production, requiring sophisticated MLOps platforms to manage deployment, monitoring, and retraining at scale\n",
"⚡ **Performance Note**: Production ML systems spend more computational resources on monitoring, logging, and infrastructure than on actual model inference - operational overhead dominates"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "301b7b86",
"id": "517f30eb",
"metadata": {
"nbgrader": {
"grade": false,
@@ -86,7 +98,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "fa5af886",
"id": "0c0721c6",
"metadata": {
"nbgrader": {
"grade": false,
@@ -107,7 +119,7 @@
},
{
"cell_type": "markdown",
"id": "9f2bc130",
"id": "af24c1f9",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -134,7 +146,7 @@
},
{
"cell_type": "markdown",
"id": "b80d8ac3",
"id": "6f8eecea",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -180,7 +192,7 @@
},
{
"cell_type": "markdown",
"id": "100d03ff",
"id": "bd9c565d",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -190,7 +202,7 @@
},
{
"cell_type": "markdown",
"id": "c2295e26",
"id": "cf33b17f",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -228,7 +240,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b63a1631",
"id": "64d044a8",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -496,7 +508,7 @@
},
{
"cell_type": "markdown",
"id": "8ffc14ba",
"id": "18418556",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -510,7 +522,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "6b0fb6aa",
"id": "b65f5550",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -587,7 +599,7 @@
},
{
"cell_type": "markdown",
"id": "dc49df4b",
"id": "172ba7f0",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -626,7 +638,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "6c4f2ad3",
"id": "b1ecdd62",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -812,7 +824,7 @@
},
{
"cell_type": "markdown",
"id": "e13528a0",
"id": "0164fd3d",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -826,7 +838,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "96803933",
"id": "b49b125a",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -900,7 +912,7 @@
},
{
"cell_type": "markdown",
"id": "370131ff",
"id": "46a7a098",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -939,7 +951,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d179c01a",
"id": "ae47ae89",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1193,7 +1205,7 @@
},
{
"cell_type": "markdown",
"id": "cbe7091b",
"id": "fa03db7e",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1207,7 +1219,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "7a47b5bd",
"id": "438735c2",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1295,7 +1307,7 @@
},
{
"cell_type": "markdown",
"id": "d002c57f",
"id": "582fd415",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1334,7 +1346,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "16c0ae3c",
"id": "cf5cf724",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1603,7 +1615,7 @@
},
{
"cell_type": "markdown",
"id": "ac4a80b4",
"id": "8f2e9d91",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1617,7 +1629,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "94f083a8",
"id": "a2ef7147",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -1701,7 +1713,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d695e5a0",
"id": "b8603916",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1813,7 +1825,7 @@
},
{
"cell_type": "markdown",
"id": "7871390c",
"id": "310290e8",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1858,7 +1870,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c7717339",
"id": "4ec9e97a",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -2810,7 +2822,7 @@
},
{
"cell_type": "markdown",
"id": "0f7693d3",
"id": "0efdff22",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -2824,7 +2836,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f099a028",
"id": "4633543f",
"metadata": {
"nbgrader": {
"grade": true,
@@ -2958,7 +2970,7 @@
},
{
"cell_type": "markdown",
"id": "ebe24911",
"id": "67316213",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -3057,7 +3069,7 @@
},
{
"cell_type": "markdown",
"id": "46b6c967",
"id": "fb34dcde",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -3102,7 +3114,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "26273047",
"id": "01ad3257",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -4054,7 +4066,7 @@
},
{
"cell_type": "markdown",
"id": "89984238",
"id": "d60f354c",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -4068,7 +4080,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "9ad4f35a",
"id": "e54ce678",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
@@ -4186,7 +4198,7 @@
},
{
"cell_type": "markdown",
"id": "27d703ac",
"id": "fe1a5e7a",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -4285,7 +4297,7 @@
},
{
"cell_type": "markdown",
"id": "3f58ba00",
"id": "a7590b95",
"metadata": {
"cell_marker": "\"\"\""
},

View File

@@ -3,7 +3,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e2068d3a",
"id": "1a145aa0",
"metadata": {},
"outputs": [],
"source": [
@@ -12,29 +12,44 @@
},
{
"cell_type": "markdown",
"id": "d1a52eb4",
"id": "97a7ba09",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Module 16: TinyGPT - From Vision to Language\n",
"# TinyGPT - Complete Transformer Architecture and Generative AI Capstone\n",
"\n",
"## Learning Objectives\n",
"By the end of this module, you will:\n",
"1. Build GPT-style transformer models using TinyTorch components\n",
"2. Understand character-level tokenization for language models\n",
"3. Implement multi-head attention mechanisms that enable sequence understanding\n",
"4. Create complete transformer blocks with layer normalization and residual connections\n",
"5. Train autoregressive language models that generate coherent text\n",
"6. Apply ML Systems thinking to understand framework reusability across modalities\n",
"Welcome to the TinyGPT module! You'll build a complete transformer language model from your TinyTorch components, demonstrating how the same ML systems infrastructure enables both computer vision and natural language processing.\n",
"\n",
"Welcome to the culmination of TinyTorch - where we discover that **vision and language models share the same mathematical foundation!**"
"## Learning Goals\n",
"- Systems understanding: How transformer architectures unify different AI modalities and why attention mechanisms scale across problem domains\n",
"- Core implementation skill: Build complete GPT-style models with multi-head attention, positional encoding, and autoregressive generation\n",
"- Pattern recognition: Understand how the same mathematical primitives (attention, normalization, optimization) enable both vision and language AI\n",
"- Framework connection: See how your transformer implementation reveals the design principles behind modern LLMs like GPT and BERT\n",
"- Performance insight: Learn why transformer scaling laws drive modern AI development and hardware design\n",
"\n",
"## Build → Use → Reflect\n",
"1. **Build**: Complete transformer architecture with multi-head attention, positional encoding, and autoregressive training\n",
"2. **Use**: Train TinyGPT on text data and generate coherent language using your fully self-built ML framework\n",
"3. **Reflect**: How do the same mathematical foundations enable both computer vision and language understanding?\n",
"\n",
"## What You'll Achieve\n",
"By the end of this module, you'll gain:\n",
"- Deep technical understanding of how transformer architectures enable general-purpose AI across different modalities\n",
"- Practical capability to build and train complete language models using your own ML framework implementation\n",
"- Systems insight into how framework design enables rapid experimentation and model development across different domains\n",
"- Performance insight into how attention's O(n²) scaling drives modern architectural innovations and hardware requirements\n",
"- Connection to production ML systems and how transformer architectures became the foundation of modern AI\n",
"\n",
"## Systems Reality Check\n",
"💡 **Production Context**: Modern LLMs like GPT-4 use the same transformer architecture you're building, scaled to billions of parameters with sophisticated distributed training\n",
"⚡ **Performance Note**: Your TinyGPT demonstrates that the same mathematical operations power both computer vision and language AI - unified frameworks enable rapid innovation across domains"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c73dfac",
"id": "86dc83e5",
"metadata": {},
"outputs": [],
"source": [
@@ -60,7 +75,7 @@
},
{
"cell_type": "markdown",
"id": "8343a440",
"id": "06247a84",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -98,7 +113,7 @@
},
{
"cell_type": "markdown",
"id": "6abf9ee0",
"id": "b3f1c993",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -128,7 +143,7 @@
},
{
"cell_type": "markdown",
"id": "008cb8ae",
"id": "27795628",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -141,7 +156,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c420e767",
"id": "80d29be1",
"metadata": {
"lines_to_next_cell": 1
},
@@ -177,7 +192,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "056b03d5",
"id": "015a7c65",
"metadata": {
"lines_to_next_cell": 1
},
@@ -338,7 +353,7 @@
},
{
"cell_type": "markdown",
"id": "419400af",
"id": "eb855a14",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -352,7 +367,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "11fbf68c",
"id": "9f245a52",
"metadata": {},
"outputs": [],
"source": [
@@ -402,7 +417,7 @@
},
{
"cell_type": "markdown",
"id": "ba0e8f70",
"id": "b9188182",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -418,7 +433,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "47630b29",
"id": "1c97022e",
"metadata": {
"lines_to_next_cell": 1
},
@@ -556,7 +571,7 @@
},
{
"cell_type": "markdown",
"id": "c9cbebf1",
"id": "59b29f29",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -570,7 +585,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1aba02c7",
"id": "801929af",
"metadata": {},
"outputs": [],
"source": [
@@ -618,7 +633,7 @@
},
{
"cell_type": "markdown",
"id": "cb03bea4",
"id": "ea90f6c9",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -632,7 +647,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "88e73345",
"id": "a310c22c",
"metadata": {
"lines_to_next_cell": 1
},
@@ -749,7 +764,7 @@
},
{
"cell_type": "markdown",
"id": "356095dd",
"id": "b4f483c6",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -763,7 +778,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "44eabf28",
"id": "ab71917e",
"metadata": {},
"outputs": [],
"source": [
@@ -820,7 +835,7 @@
},
{
"cell_type": "markdown",
"id": "92ef2c14",
"id": "ab461bcf",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -834,7 +849,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "98d2c7e1",
"id": "ca415a76",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1009,7 +1024,7 @@
},
{
"cell_type": "markdown",
"id": "6d751711",
"id": "d8f239f8",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1023,7 +1038,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0918755a",
"id": "5601578a",
"metadata": {},
"outputs": [],
"source": [
@@ -1082,7 +1097,7 @@
},
{
"cell_type": "markdown",
"id": "75ce6511",
"id": "e11b852b",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1096,7 +1111,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "7c86559e",
"id": "1dd75bcf",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1355,7 +1370,7 @@
},
{
"cell_type": "markdown",
"id": "0f169ac6",
"id": "f8fec6c6",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1369,7 +1384,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "05dcd296",
"id": "187aeee0",
"metadata": {},
"outputs": [],
"source": [
@@ -1450,7 +1465,7 @@
},
{
"cell_type": "markdown",
"id": "39d5e30f",
"id": "02143ead",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1464,7 +1479,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e82dcc11",
"id": "e5de6c75",
"metadata": {},
"outputs": [],
"source": [
@@ -1624,7 +1639,7 @@
},
{
"cell_type": "markdown",
"id": "995d39a3",
"id": "02c652ed",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
@@ -1638,7 +1653,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "446ad49f",
"id": "2c79c9b2",
"metadata": {},
"outputs": [],
"source": [
@@ -1718,7 +1733,7 @@
},
{
"cell_type": "markdown",
"id": "9772a381",
"id": "e9228286",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1735,7 +1750,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "545d66a7",
"id": "3b72ad23",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1761,7 +1776,7 @@
},
{
"cell_type": "markdown",
"id": "e69435bb",
"id": "ff979829",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1776,7 +1791,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "82cc7c45",
"id": "0cf9891e",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1802,7 +1817,7 @@
},
{
"cell_type": "markdown",
"id": "ce27eedf",
"id": "deab7f0c",
"metadata": {
"cell_marker": "\"\"\""
},
@@ -1817,7 +1832,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "3f72639e",
"id": "ff315f7a",
"metadata": {
"nbgrader": {
"grade": true,
@@ -1844,7 +1859,7 @@
},
{
"cell_type": "markdown",
"id": "85645302",
"id": "17a01f54",
"metadata": {},
"source": [
"\"\"\"\n",