mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-27 16:15:51 -05:00
- Fixed test functions to only run when modules executed directly - Added proper __name__ == '__main__' guards to all test calls - Fixed syntax errors from incorrect replacements in Module 13 and 15 - Modules now import properly without executing tests - ProductionBenchmarkingProfiler (Module 14) and ProductionMLSystemProfiler (Module 16) fully working - Other profiler classes present but require full numpy environment to test completely
548 lines
34 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "23615e70",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"# Setup - TinyTorch System Configuration\n",
|
|
"\n",
|
|
"Welcome to TinyTorch! This module configures your development environment and establishes professional ML engineering practices.\n",
|
|
"\n",
|
|
"## Learning Goals\n",
|
|
"- Configure personal developer identification for your TinyTorch installation\n",
|
|
"- Query system information for hardware-aware ML development\n",
|
|
"- Master the NBGrader workflow: implement → test → export\n",
|
|
"- Build functions that integrate into your tinytorch package\n",
|
|
"\n",
|
|
"## Why Configuration Matters in ML Systems\n",
|
|
"Every production ML system needs proper configuration:\n",
|
|
"- **Developer attribution**: Professional identification and contact info\n",
|
|
"- **System awareness**: Understanding hardware limitations and capabilities\n",
|
|
"- **Reproducibility**: Documenting exact environment for experiment tracking\n",
|
|
"- **Debugging support**: System specs help troubleshoot performance issues\n",
|
|
"\n",
|
|
"You'll learn to build ML systems that understand their environment and identify their creators."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "0ccdc6fe",
|
|
"metadata": {
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "setup-imports",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| default_exp core.setup\n",
|
|
"\n",
|
|
"#| export\n",
|
|
"import sys\n",
|
|
"import platform\n",
|
|
"import psutil\n",
|
|
"import os\n",
|
|
"from typing import Dict, Any"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "fc3cbf79",
|
|
"metadata": {
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "setup-verification",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"print(\"🔥 TinyTorch Setup Module\")\n",
|
|
"print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
|
|
"print(f\"Platform: {platform.system()}\")\n",
|
|
"print(\"Ready to configure your TinyTorch installation!\\n\")\n",
|
|
"\n",
|
|
"# Display configuration workflow\n",
|
|
"print(\"Configuration Workflow:\")\n",
|
|
"print(\"Personal Information → System Information → Complete\")\n",
|
|
"print(\"\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "442a82b2",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\"",
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"source": [
|
|
"## Personal Information Configuration\n",
|
|
"\n",
|
|
"### The 5 C's Framework\n",
|
|
"Before we implement, let's understand what we're building through our 5 C's approach:\n",
|
|
"\n",
|
|
"#### Concept\n",
|
|
"\n",
|
|
"What is Personal Information Configuration?\n",
|
|
"Personal information identifies you as the creator of ML systems. Every professional system needs proper attribution - just like Git commits have author info, your TinyTorch installation needs your identity.\n",
|
|
"\n",
|
|
"#### Code Structure\n",
|
|
"\n",
|
|
"What We're Building:\n",
|
|
"```python\n",
"def personal_info() -> Dict[str, str]:      # Returns developer identity\n",
"    return {                                # Dictionary with required fields\n",
"        'developer': 'Your Name',           # Your actual name\n",
"        'email': 'your@domain.com',         # Contact information\n",
"        'institution': 'Your Place',        # Affiliation\n",
"        'system_name': 'YourName-Dev',      # Unique system identifier\n",
"        'version': '1.0.0'                  # Configuration version\n",
"    }\n",
|
|
"```\n",
|
|
"\n",
|
|
"#### Connections\n",
|
|
"\n",
|
|
"Real-World Equivalents:\n",
|
|
"- **Git commits**: Author name and email in every commit\n",
|
|
"- **Docker images**: Maintainer information in container metadata\n",
|
|
"- **Python packages**: Author info in setup.py and pyproject.toml\n",
|
|
"- **ML model cards**: Creator information for model attribution\n",
|
|
"\n",
|
|
"#### Constraints\n",
|
|
"\n",
|
|
"Key Implementation Requirements:\n",
|
|
"- Use your actual information (not placeholder text)\n",
|
|
"- Email must contain @ and domain\n",
|
|
"- System name should be unique and descriptive\n",
|
|
"- All values must be strings, keep version as '1.0.0'\n",
|
|
"\n",
|
|
"#### Context\n",
|
|
"\n",
|
|
"**You're establishing your professional identity in the ML systems world.**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c3350854",
|
|
"metadata": {
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "personal-info",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| export\n",
"def personal_info() -> Dict[str, str]:\n",
"    \"\"\"\n",
"    Return personal information for this TinyTorch installation.\n",
"    \n",
"    This function configures your personal TinyTorch installation with your identity.\n",
"    It's the foundation of proper ML engineering practices - every system needs\n",
"    to know who built it and how to contact them.\n",
"    \n",
"    TODO: Implement personal information configuration.\n",
"    \n",
"    STEP-BY-STEP IMPLEMENTATION:\n",
"    1. Create a dictionary with your personal details\n",
"    2. Include all required keys: developer, email, institution, system_name, version\n",
"    3. Use your actual information (not placeholder text)\n",
"    4. Make system_name unique and descriptive\n",
"    5. Keep version as '1.0.0' for now\n",
"    \n",
"    Returns:\n",
"        Dict[str, str]: Personal configuration with developer identity\n",
"    \"\"\"\n",
"    ### BEGIN SOLUTION\n",
"    return {\n",
"        'developer': 'Student Name',\n",
"        'email': 'student@university.edu',\n",
"        'institution': 'University Name',\n",
"        'system_name': 'StudentName-TinyTorch-Dev',\n",
"        'version': '1.0.0'\n",
"    }\n",
"    ### END SOLUTION\n",
"\n",
"# Test and validate the personal_info function\n",
"def test_personal_info_comprehensive():\n",
"    \"\"\"Comprehensive test for personal_info function.\"\"\"\n",
"    print(\"🔬 Testing Personal Information Configuration...\")\n",
"    \n",
"    # Test personal_info function\n",
"    personal = personal_info()\n",
"    \n",
"    # Test return type\n",
"    assert isinstance(personal, dict), \"personal_info should return a dictionary\"\n",
"    \n",
"    # Test required keys\n",
"    required_keys = ['developer', 'email', 'institution', 'system_name', 'version']\n",
"    for key in required_keys:\n",
"        assert key in personal, f\"Dictionary should have '{key}' key\"\n",
"    \n",
"    # Test non-empty string values\n",
"    for key, value in personal.items():\n",
"        assert isinstance(value, str), f\"Value for '{key}' should be a string\"\n",
"        assert len(value) > 0, f\"Value for '{key}' cannot be empty\"\n",
"    \n",
"    # Test email format: an @ symbol followed by a domain that contains a dot\n",
"    assert '@' in personal['email'], \"Email should contain @ symbol\"\n",
"    assert '.' in personal['email'].split('@')[-1], \"Email should contain a domain after the @ symbol\"\n",
"    \n",
"    # Test version format\n",
"    assert personal['version'] == '1.0.0', \"Version should be '1.0.0'\"\n",
"    \n",
"    # Test system name (should be unique/personalized)\n",
"    assert len(personal['system_name']) > 5, \"System name should be descriptive\"\n",
"    \n",
"    print(\"✅ All personal info tests passed!\")\n",
"    print(f\"✅ TinyTorch configured for: {personal['developer']}\")\n",
"    print(f\"✅ Contact: {personal['email']}\")\n",
"    print(f\"✅ System: {personal['system_name']}\")\n",
"    return personal\n",
"\n",
"# Run comprehensive test and display results\n",
"personal_config = test_personal_info_comprehensive()\n",
"print(\"\\n\" + \"=\"*50)\n",
"print(\"✅ Personal Information Configuration COMPLETE\")\n",
"print(\"=\"*50)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "2b9a18f2",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\"",
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"source": [
|
|
"## System Information Collection\n",
|
|
"\n",
|
|
"### The 5 C's Framework\n",
|
|
"Before we implement, let's understand what we're building through our 5 C's approach:\n",
|
|
"\n",
|
|
"#### Concept\n",
|
|
"\n",
|
|
"What is System Information Collection?\n",
"System information detection provides the hardware and software specs that ML systems need for performance optimization. Think of it like checking a computer's specifications before installing a game: ML code needs to know what resources are available.\n",
|
|
"\n",
|
|
"#### Code Structure\n",
|
|
"\n",
|
|
"What We're Building:\n",
|
|
"```python\n",
"def system_info() -> Dict[str, Any]:        # Queries system specs\n",
"    return {                                # Hardware/software details\n",
"        'python_version': '3.9.7',          # Python compatibility\n",
"        'platform': 'Darwin',               # Operating system\n",
"        'architecture': 'arm64',            # CPU architecture\n",
"        'cpu_count': 8,                     # Logical CPU cores for parallel work\n",
"        'memory_gb': 16.0                   # Total RAM in GB\n",
"    }\n",
|
|
"```\n",
|
|
"\n",
|
|
"#### Connections\n",
|
|
"\n",
|
|
"Real-World Equivalents:\n",
"- **PyTorch**: `torch.get_num_threads()` reports how many CPU threads are used for intra-op parallelism\n",
|
|
"- **TensorFlow**: `tf.config.list_physical_devices()` queries hardware\n",
|
|
"- **Scikit-learn**: `n_jobs=-1` uses all available CPU cores\n",
|
|
"- **MLflow**: Documents system environment for experiment reproducibility\n",
|
|
"\n",
|
|
"#### Constraints\n",
|
|
"\n",
|
|
"Key Implementation Requirements:\n",
|
|
"- Use actual system queries (not hardcoded values)\n",
|
|
"- Convert memory from bytes to GB for readability\n",
|
|
"- Round memory to 1 decimal place for clean output\n",
|
|
"- Return proper data types (strings, int, float)\n",
|
|
"\n",
|
|
"#### Context\n",
|
|
"\n",
|
|
"**You're building ML systems that adapt intelligently to their hardware environment.**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "579ab563",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1,
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "system-info",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| export\n",
"def system_info() -> Dict[str, Any]:\n",
"    \"\"\"\n",
"    Query and return system information for this TinyTorch installation.\n",
"    \n",
"    This function gathers crucial hardware and software information that affects\n",
"    ML performance, compatibility, and debugging. It's the foundation of\n",
"    hardware-aware ML systems.\n",
"    \n",
"    TODO: Implement system information queries.\n",
"    \n",
"    STEP-BY-STEP IMPLEMENTATION:\n",
"    1. Get Python version using sys.version_info\n",
"    2. Get platform using platform.system()\n",
"    3. Get architecture using platform.machine()\n",
"    4. Get CPU count using psutil.cpu_count()\n",
"    5. Get memory using psutil.virtual_memory().total\n",
"    6. Convert memory from bytes to GB (divide by 1024**3)\n",
"    7. Return all information in a dictionary\n",
"    \n",
"    EXAMPLE OUTPUT:\n",
"    {\n",
"        'python_version': '3.9.7',\n",
"        'platform': 'Darwin',\n",
"        'architecture': 'arm64',\n",
"        'cpu_count': 8,\n",
"        'memory_gb': 16.0\n",
"    }\n",
"    \n",
"    IMPLEMENTATION HINTS:\n",
"    - Use f-string formatting for Python version: f\"{major}.{minor}.{micro}\"\n",
"    - Memory conversion: bytes / (1024**3) = GB (in Python, ^ is XOR, not exponentiation)\n",
"    - Round memory to 1 decimal place for readability\n",
"    - Make sure data types are correct (strings for text, int for cpu_count, float for memory_gb)\n",
"    \n",
"    LEARNING CONNECTIONS:\n",
"    - This is like `torch.cuda.is_available()` in PyTorch\n",
"    - Similar to system info in MLflow experiment tracking\n",
"    - Parallels hardware detection in TensorFlow\n",
"    - Foundation for performance optimization in ML systems\n",
"    \n",
"    PERFORMANCE IMPLICATIONS:\n",
"    - cpu_count affects parallel processing capabilities\n",
"    - memory_gb determines maximum model and batch sizes\n",
"    - platform affects file system and process management\n",
"    - architecture influences numerical precision and optimization\n",
"    \"\"\"\n",
"    ### BEGIN SOLUTION\n",
"    # Get Python version\n",
"    version_info = sys.version_info\n",
"    python_version = f\"{version_info.major}.{version_info.minor}.{version_info.micro}\"\n",
"    \n",
"    # Get platform information\n",
"    platform_name = platform.system()\n",
"    architecture = platform.machine()\n",
"    \n",
"    # Get CPU information (psutil.cpu_count() returns None if undetermined, so fall back to 1)\n",
"    cpu_count = psutil.cpu_count() or 1\n",
"    \n",
"    # Get memory information (convert bytes to GB)\n",
"    memory_bytes = psutil.virtual_memory().total\n",
"    memory_gb = round(memory_bytes / (1024**3), 1)\n",
"    \n",
"    return {\n",
"        'python_version': python_version,\n",
"        'platform': platform_name,\n",
"        'architecture': architecture,\n",
"        'cpu_count': cpu_count,\n",
"        'memory_gb': memory_gb\n",
"    }\n",
"    ### END SOLUTION"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "4584d0e9",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\"",
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"source": [
|
|
"### 🧪 Unit Test: System Information Query\n",
|
|
"\n",
|
|
"This test validates your `system_info()` function implementation, ensuring it accurately detects and reports hardware and software specifications for performance optimization and debugging."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c94825aa",
|
|
"metadata": {
|
|
"lines_to_next_cell": 2,
|
|
"nbgrader": {
|
|
"grade": true,
|
|
"grade_id": "test-system-info-immediate",
|
|
"locked": true,
|
|
"points": 5,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
"def test_unit_system_info_basic():\n",
"    \"\"\"Test system_info function implementation.\"\"\"\n",
"    print(\"🔬 Unit Test: System Information...\")\n",
"    \n",
"    # Test system_info function\n",
"    sys_info = system_info()\n",
"    \n",
"    # Test return type\n",
"    assert isinstance(sys_info, dict), \"system_info should return a dictionary\"\n",
"    \n",
"    # Test required keys\n",
"    required_keys = ['python_version', 'platform', 'architecture', 'cpu_count', 'memory_gb']\n",
"    for key in required_keys:\n",
"        assert key in sys_info, f\"Dictionary should have '{key}' key\"\n",
"    \n",
"    # Test data types\n",
"    assert isinstance(sys_info['python_version'], str), \"python_version should be string\"\n",
"    assert isinstance(sys_info['platform'], str), \"platform should be string\"\n",
"    assert isinstance(sys_info['architecture'], str), \"architecture should be string\"\n",
"    assert isinstance(sys_info['cpu_count'], int), \"cpu_count should be integer\"\n",
"    assert isinstance(sys_info['memory_gb'], (int, float)), \"memory_gb should be number\"\n",
"    \n",
"    # Test reasonable values\n",
"    assert sys_info['cpu_count'] > 0, \"CPU count should be positive\"\n",
"    assert sys_info['memory_gb'] > 0, \"Memory should be positive\"\n",
"    assert len(sys_info['python_version']) > 0, \"Python version should not be empty\"\n",
"    \n",
"    # Test that values are actually queried (not hardcoded)\n",
"    actual_version = f\"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}\"\n",
"    assert sys_info['python_version'] == actual_version, \"Python version should match actual system\"\n",
"    \n",
"    print(\"✅ System info function tests passed!\")\n",
"    print(f\"✅ Python: {sys_info['python_version']} on {sys_info['platform']}\")\n",
"\n",
"# Run the test\n",
"test_unit_system_info_basic()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "6rkxeeig3li",
|
|
"source": "## 🖥️ ML Systems Foundation: Hardware Awareness & Resource Planning\n\nNow that you've implemented basic system detection, let's build **ML systems engineering intuition**. This section introduces you to thinking like an ML systems engineer - understanding how hardware affects ML capabilities and learning to plan resources for real-world deployments.\n\n### **Learning Outcome**: *\"I understand how my hardware affects ML capabilities\"*\n\n---\n\n## Systems Analysis Tools (Review & Understand)\n\nAs an ML systems engineer, you need tools to analyze hardware capabilities and estimate what models you can realistically run. Below are professional-grade analysis tools - **your job is to run them, understand the output, and develop intuition** about hardware-ML relationships.",
|
|
"metadata": {}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"id": "cpmzmjj317s",
"source": "#| export\nimport time\nimport numpy as np\n\nclass MLSystemAnalyzer:\n    \"\"\"\n    Professional ML systems analysis toolkit.\n\n    This class provides tools to analyze hardware capabilities and estimate\n    what ML workloads your system can handle. Used by ML engineers to plan\n    deployments and understand system limitations.\n    \"\"\"\n\n    def __init__(self):\n        self.sys_info = system_info()\n        self.analysis_cache = {}\n\n    def analyze_ml_capabilities(self):\n        \"\"\"\n        Analyze this system's ML capabilities and provide professional estimates.\n\n        Returns comprehensive analysis of what this hardware can handle for ML workloads.\n        Based on industry rules of thumb and production experience.\n        \"\"\"\n        memory_gb = self.sys_info['memory_gb']\n        cpu_cores = self.sys_info['cpu_count']\n\n        # Rule of thumb: 1M float32 parameters ≈ 4 MB (4 bytes per parameter).\n        # Budget 25% of memory for weights; gradients, optimizer state, and\n        # system overhead take the other three quarters (see memory_allocation).\n        max_model_params = int(memory_gb * 0.25 * 1024**3 / 4)\n\n        # Batch size recommendations based on memory\n        small_batch = max(1, int(memory_gb * 2))    # 2 samples per GB\n        medium_batch = max(1, int(memory_gb * 8))   # 8 samples per GB\n        large_batch = max(1, int(memory_gb * 32))   # 32 samples per GB\n\n        # Training time estimates (very rough)\n        epochs_per_hour_estimate = max(1, cpu_cores * 2)\n\n        analysis = {\n            'system_class': self._classify_system(memory_gb, cpu_cores),\n            'max_model_parameters': max_model_params,\n            'recommended_batch_sizes': {\n                'conservative': small_batch,\n                'balanced': medium_batch,\n                'aggressive': large_batch\n            },\n            'estimated_training_speed': f\"~{epochs_per_hour_estimate} epochs/hour\",\n            'memory_allocation': {\n                'model_weights': f\"{memory_gb * 0.25:.1f} GB\",\n                'gradients': f\"{memory_gb * 0.25:.1f} GB\",\n                'optimizer_state': f\"{memory_gb * 0.25:.1f} GB\",\n                'system_overhead': f\"{memory_gb * 0.25:.1f} GB\"\n            },\n            'production_readiness': self._assess_production_readiness(memory_gb, cpu_cores)\n        }\n\n        return analysis\n\n    def compare_with_famous_models(self):\n        \"\"\"\n        Compare your system with requirements for famous ML models.\n\n        Helps students understand what they can realistically run and what\n        requires cloud resources or specialized hardware.\n        \"\"\"\n        max_params = self.analyze_ml_capabilities()['max_model_parameters']\n\n        # memory_needed_gb counts float32 weights only: parameters * 4 bytes, in decimal GB\n        famous_models = {\n            'Tiny Model (Educational)': {\n                'parameters': 100_000,\n                'memory_needed_gb': 0.0004,\n                'example': 'Simple MNIST classifier'\n            },\n            'Small Model (Prototype)': {\n                'parameters': 1_000_000,\n                'memory_needed_gb': 0.004,\n                'example': 'Small ResNet, basic NLP model'\n            },\n            'Medium Model (Research)': {\n                'parameters': 10_000_000,\n                'memory_needed_gb': 0.04,\n                'example': 'ResNet-50, small transformer'\n            },\n            'Large Model (Production)': {\n                'parameters': 100_000_000,\n                'memory_needed_gb': 0.4,\n                'example': 'Large transformer, computer vision production'\n            },\n            'GPT-3 (175B)': {\n                'parameters': 175_000_000_000,\n                'memory_needed_gb': 700,  # 700 GB for the weights alone\n                'example': 'OpenAI GPT-3, requires massive clusters'\n            },\n            'GPT-4 (Estimated 1.8T)': {\n                'parameters': 1_800_000_000_000,\n                'memory_needed_gb': 7_200,  # 7.2 TB for the weights alone\n                'example': 'OpenAI GPT-4, cutting-edge research'\n            }\n        }\n\n        analysis = {}\n        for model_name, specs in famous_models.items():\n            can_run = specs['parameters'] <= max_params\n            analysis[model_name] = {\n                'can_run': can_run,\n                'parameters': f\"{specs['parameters']:,}\",\n                'memory_needed': f\"{specs['memory_needed_gb']:,.4g} GB\",\n                'example': specs['example'],\n                'verdict': '✅ Can run' if can_run else '❌ Need cloud/cluster'\n            }\n\n        return analysis\n\n    def estimate_cloud_costs(self, model_parameters, training_hours=24):\n        \"\"\"\n        Estimate cloud costs for training models that don't fit on local hardware.\n\n        Helps students understand the economics of ML systems and why optimization matters.\n        \"\"\"\n        memory_needed_gb = model_parameters * 4 / (1024**3)  # float32 bytes converted to GB\n\n        # Rough AWS pricing (changes frequently, this is educational)\n        if memory_needed_gb <= 32:\n            instance_type = \"m5.2xlarge\"\n            cost_per_hour = 0.384\n        elif memory_needed_gb <= 64:\n            instance_type = \"m5.4xlarge\"\n            cost_per_hour = 0.768\n        elif memory_needed_gb <= 128:\n            instance_type = \"m5.8xlarge\"\n            cost_per_hour = 1.536\n        else:\n            instance_type = \"p3.8xlarge (GPU cluster)\"\n            cost_per_hour = 12.24\n\n        total_cost = cost_per_hour * training_hours\n\n        return {\n            'recommended_instance': instance_type,\n            'cost_per_hour': f\"${cost_per_hour:.2f}\",\n            'total_cost_24h': f\"${total_cost:.2f}\",\n            'memory_needed': f\"{memory_needed_gb:.1f} GB\",\n            'cost_comparison': f\"{total_cost/100:.1f}x more than local development\"\n        }\n\n    def _classify_system(self, memory_gb, cpu_cores):\n        \"\"\"Classify the system type for ML workloads.\"\"\"\n        if memory_gb >= 64 and cpu_cores >= 16:\n            return \"High-end workstation\"\n        elif memory_gb >= 16 and cpu_cores >= 8:\n            return \"Development machine\"\n        elif memory_gb >= 8 and cpu_cores >= 4:\n            return \"Basic laptop\"\n        else:\n            return \"Limited system\"\n\n    def _assess_production_readiness(self, memory_gb, cpu_cores):\n        \"\"\"Assess if system is suitable for different types of ML work.\"\"\"\n        if memory_gb >= 32:\n            return \"Production prototyping capable\"\n        elif memory_gb >= 16:\n            return \"Research and development ready\"\n        elif memory_gb >= 8:\n            return \"Educational and small experiments\"\n        else:\n            return \"Very limited ML capabilities\"",
|
|
"metadata": {},
|
|
"execution_count": null,
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d1olb2n5jhg",
|
|
"source": "### 🎯 Learning Activity 1: Hardware Discovery (Review & Understand)\n\n**Goal**: Understand your system's ML capabilities and develop hardware intuition.\n\nRun the systems analysis tools below and **interpret the results**. Your job is to understand what the numbers mean for ML development.",
|
|
"metadata": {}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"id": "2fp2ar91bb1",
"source": "# Initialize the ML systems analyzer\nanalyzer = MLSystemAnalyzer()\n\n# Analyze your system's ML capabilities\nprint(\"🖥️ ML SYSTEMS ANALYSIS: Your Hardware Capabilities\")\nprint(\"=\" * 60)\n\ncapabilities = analyzer.analyze_ml_capabilities()\n\nprint(f\"🏷️ System Classification: {capabilities['system_class']}\")\nprint(f\"🧠 Max Model Parameters: {capabilities['max_model_parameters']:,}\")\nprint(f\"⚡ Production Readiness: {capabilities['production_readiness']}\")\nprint(f\"🏃 Estimated Training Speed: {capabilities['estimated_training_speed']}\")\n\nprint(f\"\\n📊 Recommended Batch Sizes:\")\nfor level, size in capabilities['recommended_batch_sizes'].items():\n    print(f\"   {level.capitalize()}: {size}\")\n\nprint(f\"\\n💾 Memory Allocation Breakdown:\")\nfor component, allocation in capabilities['memory_allocation'].items():\n    print(f\"   {component.replace('_', ' ').title()}: {allocation}\")\n\nprint(\"\\n\" + \"=\" * 60)\nprint(\"💡 SYSTEMS INSIGHT: These numbers determine what you can realistically build!\")\nprint(\"   - Model parameters directly affect memory usage\")\nprint(\"   - Batch size affects training speed and memory\")\nprint(\"   - Your hardware constrains your ML possibilities\")",
|
|
"metadata": {},
|
|
"execution_count": null,
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ykkoq6677wp",
|
|
"source": "### 🎯 Learning Activity 2: Compare with Famous Models (Review & Understand)\n\n**Goal**: Understand how your system compares to real-world ML models and why cloud computing matters.",
|
|
"metadata": {}
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"id": "4xye4btvym7",
"source": "# Compare your system with famous ML models\nprint(\"🌟 FAMOUS MODEL COMPARISON: What Can You Run?\")\nprint(\"=\" * 60)\n\nmodel_comparison = analyzer.compare_with_famous_models()\n\nfor model_name, specs in model_comparison.items():\n    print(f\"\\n📱 {model_name}\")\n    print(f\"   Parameters: {specs['parameters']}\")\n    print(f\"   Memory Needed: {specs['memory_needed']}\")\n    print(f\"   Example: {specs['example']}\")\n    print(f\"   {specs['verdict']}\")\n\nprint(\"\\n\" + \"=\" * 60)\nprint(\"💡 SYSTEMS INSIGHT: Notice the massive jump from research to production models!\")\nprint(\"   - Your laptop: Good for learning and small experiments\")\nprint(\"   - Production models: Require massive cloud infrastructure\")\nprint(\"   - GPT-3/GPT-4: Need entire data centers!\")\n\n# Show an example cloud cost estimate for a 100M parameter model\nprint(f\"\\n💰 CLOUD COST EXAMPLE: Training a 100M parameter model\")\ncost_analysis = analyzer.estimate_cloud_costs(100_000_000, 24)\nprint(f\"   Recommended Instance: {cost_analysis['recommended_instance']}\")\nprint(f\"   Cost per Hour: {cost_analysis['cost_per_hour']}\")\nprint(f\"   24-Hour Training Cost: {cost_analysis['total_cost_24h']}\")\nprint(f\"   Memory Required: {cost_analysis['memory_needed']}\")\n\nprint(f\"\\n🎯 KEY TAKEAWAY: This is why ML engineers optimize for efficiency!\")\nprint(f\"   Every parameter costs money in production 💸\")",
|
|
"metadata": {},
|
|
"execution_count": null,
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "8097da88",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"## Module Summary: TinyTorch Setup Complete\n",
|
|
"\n",
|
|
"Congratulations! You've successfully configured your TinyTorch development environment and established professional ML engineering practices.\n",
|
|
"\n",
|
|
"### What You've Accomplished\n",
|
|
"✅ **Personal Configuration**: Established developer identity and system attribution \n",
|
|
"✅ **System Information**: Built hardware-aware ML system foundation \n",
|
|
"✅ **Testing Integration**: Implemented comprehensive validation for both functions \n",
|
|
"✅ **Professional Workflow**: Mastered NBGrader solution blocks and testing \n",
|
|
"\n",
|
|
"Your TinyTorch installation is now properly configured with:\n",
|
|
"- **Developer attribution** for professional collaboration\n",
|
|
"- **Hardware detection** for performance optimization\n",
|
|
"- **Tested functions** ready for package integration\n",
|
|
"\n",
|
|
"### Key ML Systems Concepts Learned\n",
|
|
"- **Configuration management**: Professional setup and attribution standards\n",
|
|
"- **Hardware awareness**: System specs affect ML performance and capabilities\n",
|
|
"- **Testing practices**: Comprehensive validation ensures reliability\n",
|
|
"- **Package development**: Functions become part of production codebase\n",
|
|
"\n",
|
|
"### Next Steps\n",
|
|
"1. **Export your work**: Use `tito module export 01_setup` to integrate with TinyTorch\n",
|
|
"2. **Verify integration**: Test that your functions work in the tinytorch package\n",
|
|
"3. **Ready for tensors**: Move on to building the fundamental ML data structure\n",
|
|
"\n",
|
|
"**You've built the foundation - now let's construct the ML system on top of it!**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ukqppvlrr3h",
"source": "## 🤔 ML Systems Thinking: Reflection Questions\n\nNow that you've built system configuration tools, reflect on how this foundation connects to production ML systems:\n\n### System Design - How does this fit into larger systems?\n1. **Identity and Attribution**: Your `personal_info()` function establishes developer identity. How does proper attribution become crucial when ML teams collaborate on models that affect millions of users? What happens when models misbehave and you need to trace accountability?\n\n2. **Environment Reproducibility**: Your `system_info()` captures hardware specs automatically. When researchers publish papers claiming breakthrough results, why is documenting the exact environment (CPU, memory, Python version) essential for reproducibility? How does this connect to the \"replication crisis\" in AI research?\n\n3. **Hardware-Aware Development**: Your function detects CPU count and memory. How do modern ML frameworks like PyTorch use this information to automatically parallelize computations? Why might the same model code behave differently on a laptop vs. a cloud instance?\n\n### Production ML - How is this used in real ML workflows?\n4. **Configuration Management**: Your personal configuration mirrors how production systems identify model creators. How do companies like Netflix or Spotify track which data scientist trained which recommendation model when debugging performance issues?\n\n5. **Resource Planning**: Your memory detection helps understand system limits. When deploying large language models in production, how do resource constraints influence architectural decisions? Why might a 16GB system require different serving strategies than a 128GB system?\n\n6. **Version Control Integration**: Your system fingerprinting resembles Git's commit metadata. How does proper environment documentation help when a model trained 6 months ago suddenly needs retraining with updated data?\n\n### Framework Design - Why do frameworks make certain choices?\n7. **Abstraction Layers**: Your simple functions hide OS complexity. How do frameworks like PyTorch abstract hardware differences so the same neural network code runs on CPUs, GPUs, and TPUs without modification?\n\n8. **Metadata Standards**: Your configuration dictionary structure mirrors industry practices. Why do frameworks invest heavily in standardized metadata formats for model cards, experiment tracking, and deployment manifests?\n\n9. **Development Ergonomics**: Your functions provide clean APIs for system queries. How does good developer experience in configuration tools ripple through to faster experimentation and more reliable model development?\n\n### Performance & Scale - What happens when systems get large?\n10. **Distributed Configuration**: Your single-machine setup works locally. How does configuration management change when training large models across hundreds of machines in data centers? What new challenges emerge?\n\n11. **Resource Optimization**: Your memory detection helps with local planning. How do cloud providers like AWS optimize resource allocation when thousands of researchers are training models simultaneously? What role does configuration metadata play?\n\n12. **System Monitoring**: Your hardware queries provide snapshots. How do production ML systems use continuous monitoring of CPU, memory, and GPU utilization to automatically scale training jobs and serving infrastructure?\n\n**💡 Systems Insight**: The simple configuration functions you built are the DNA of ML systems: every production deployment, research experiment, and model serving instance needs to know \"who built this, where is it running, and what resources are available.\" This metadata becomes critical when things go wrong or when scaling to millions of users.",
|
|
"metadata": {}
|
|
}
|
|
],
|
|
"metadata": {
|
|
"jupytext": {
|
|
"main_language": "python"
|
|
},
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.13.3"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
} |