mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-03-12 10:34:34 -05:00
🔧 TESTING INFRASTRUCTURE FIXES: - Fixed pytest configuration (removed duplicate timeout) - Exported all modules to tinytorch package using nbdev - Converted .py files to .ipynb for proper NBDev processing - Fixed import issues in test files with fallback strategies 📊 TESTING RESULTS: - 145 tests passing, 15 failing, 16 skipped - Major improvement from previous import errors - All modules now properly exported and testable - Analysis tool working correctly on all modules 🎯 MODULE QUALITY STATUS: - Most modules: Grade C, Scaffolding 3/5 - 01_tensor: Grade C, Scaffolding 2/5 (needs improvement) - 07_autograd: Grade D, Scaffolding 2/5 (needs improvement) - Overall: Functional but needs educational enhancement ✅ RESOLVED ISSUES: - All import errors resolved - NBDev export process working - Test infrastructure functional - Analysis tools operational 🚀 READY FOR NEXT PHASE: Professional report cards and improvements
753 lines
30 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5ac421cb",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"# Module 0: Setup - TinyTorch System Configuration\n",
|
|
"\n",
|
|
"Welcome to TinyTorch! This setup module configures your personal TinyTorch installation and teaches you the NBGrader workflow.\n",
|
|
"\n",
|
|
"## Learning Goals\n",
|
|
"- Configure your personal TinyTorch installation with custom information\n",
|
|
"- Learn to query system information using Python modules\n",
|
|
"- Master the NBGrader workflow: implement → test → export\n",
|
|
"- Create functions that become part of your tinytorch package\n",
|
|
"- Understand solution blocks, hidden tests, and automated grading\n",
|
|
"\n",
|
|
"## The Big Picture: Why Configuration Matters in ML Systems\n",
|
|
"Configuration is the foundation of any production ML system. In this module, you'll learn:\n",
|
|
"\n",
|
|
"### 1. **System Awareness**\n",
|
|
"Real ML systems need to understand their environment:\n",
|
|
"- **Hardware constraints**: Memory, CPU cores, GPU availability\n",
|
|
"- **Software dependencies**: Python version, library compatibility\n",
|
|
"- **Platform differences**: Linux servers, macOS development, Windows deployment\n",
|
|
"\n",
|
|
"### 2. **Reproducibility**\n",
|
|
"Configuration enables reproducible ML:\n",
|
|
"- **Environment documentation**: Exactly what system was used\n",
|
|
"- **Dependency management**: Precise versions and requirements\n",
|
|
"- **Debugging support**: System info helps troubleshoot issues\n",
|
|
"\n",
|
|
"### 3. **Professional Development**\n",
|
|
"Proper configuration shows engineering maturity:\n",
|
|
"- **Attribution**: Your work is properly credited\n",
|
|
"- **Collaboration**: Others can understand and extend your setup\n",
|
|
"- **Maintenance**: Systems can be updated and maintained\n",
|
|
"\n",
|
|
"### 4. **ML Systems Context**\n",
|
|
"This connects to broader ML engineering:\n",
|
|
"- **Model deployment**: Different environments need different configs\n",
|
|
"- **Monitoring**: System metrics help track performance\n",
|
|
"- **Scaling**: Understanding hardware helps optimize training\n",
|
|
"\n",
|
|
"Let's build the foundation of your ML systems engineering skills!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "7f1744ef",
|
|
"metadata": {
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "setup-imports",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| default_exp core.setup\n",
|
|
"\n",
|
|
"#| export\n",
|
|
"import sys\n",
|
|
"import platform\n",
|
|
"import psutil\n",
|
|
"import os\n",
|
|
"from typing import Dict, Any"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "73a84b61",
|
|
"metadata": {
|
|
"nbgrader": {
|
|
"grade": false,
|
"grade_id": "setup-welcome",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"print(\"🔥 TinyTorch Setup Module\")\n",
|
|
"print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
|
|
"print(f\"Platform: {platform.system()}\")\n",
|
|
"print(\"Ready to configure your TinyTorch installation!\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "2a7a713c",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"## 🏗️ The Architecture of ML Systems Configuration\n",
|
|
"\n",
|
|
"### Configuration Layers in Production ML\n",
|
|
"Real ML systems have multiple configuration layers:\n",
|
|
"\n",
|
|
"```\n",
|
"┌─────────────────────────────────────┐\n",
"│         Application Config          │ ← Your personal info\n",
"├─────────────────────────────────────┤\n",
"│         System Environment          │ ← Hardware specs\n",
"├─────────────────────────────────────┤\n",
"│        Runtime Configuration        │ ← Python, libraries\n",
"├─────────────────────────────────────┤\n",
"│        Infrastructure Config        │ ← Cloud, containers\n",
"└─────────────────────────────────────┘\n",
|
|
"```\n",
|
|
"\n",
|
|
"### Why Each Layer Matters\n",
|
|
"- **Application**: Identifies who built what and when\n",
|
|
"- **System**: Determines performance characteristics and limitations\n",
|
|
"- **Runtime**: Affects compatibility and feature availability\n",
|
|
"- **Infrastructure**: Enables scaling and deployment strategies\n",
|
|
"\n",
|
|
"### Connection to Real ML Frameworks\n",
|
|
"Every major ML framework has configuration:\n",
|
|
"- **PyTorch**: `torch.cuda.is_available()`, `torch.get_num_threads()`\n",
|
|
"- **TensorFlow**: `tf.config.list_physical_devices()`, `tf.sysconfig.get_build_info()`\n",
|
|
"- **Hugging Face**: Model cards with system requirements and performance metrics\n",
|
|
"- **MLflow**: Experiment tracking with system context and reproducibility\n",
|
|
"\n",
|
|
"### TinyTorch's Approach\n",
|
|
"We'll build configuration that's:\n",
|
|
"- **Educational**: Teaches system awareness\n",
|
|
"- **Practical**: Actually useful for debugging\n",
|
|
"- **Professional**: Follows industry standards\n",
|
|
"- **Extensible**: Ready for future ML systems features"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "6a4d8aba",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"## Step 1: What is System Configuration?\n",
|
|
"\n",
|
|
"### Definition\n",
|
|
"**System configuration** is the process of setting up your development environment with personalized information and system diagnostics. In TinyTorch, this means:\n",
|
|
"\n",
|
|
"- **Personal Information**: Your name, email, institution for identification\n",
|
|
"- **System Information**: Hardware specs, Python version, platform details\n",
|
|
"- **Customization**: Making your TinyTorch installation uniquely yours\n",
|
|
"\n",
|
|
"### Why Configuration Matters in ML Systems\n",
|
|
"Proper system configuration is crucial because:\n",
|
|
"\n",
|
|
"#### 1. **Reproducibility** \n",
|
|
"Your setup can be documented and shared:\n",
|
|
"```python\n",
|
|
"# Someone else can recreate your environment\n",
|
"config = {\n",
"    'developer': 'Your Name',\n",
"    'python_version': '3.9.7',\n",
"    'platform': 'Darwin',\n",
"    'memory_gb': 16.0\n",
"}\n",
|
|
"```\n",
|
|
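"\n",
"A record like this does not have to be typed by hand. The sketch below shows one possible way to capture it from the running interpreter and round-trip it through JSON for sharing. `environment_snapshot` is a hypothetical helper, not part of TinyTorch, and the `developer` value is a placeholder:\n",
"\n",
"```python\n",
"import json\n",
"import platform\n",
"import sys\n",
"\n",
"def environment_snapshot(developer):\n",
"    # Hypothetical helper: collect a few facts about the running interpreter.\n",
"    v = sys.version_info\n",
"    return {\n",
"        'developer': developer,\n",
"        'python_version': f'{v.major}.{v.minor}.{v.micro}',\n",
"        'platform': platform.system(),\n",
"    }\n",
"\n",
"snapshot = environment_snapshot('Your Name')\n",
"# JSON text can be stored next to experiment results and reloaded later.\n",
"restored = json.loads(json.dumps(snapshot, indent=2))\n",
"assert restored == snapshot\n",
"```\n",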
"\n",
|
|
"#### 2. **Debugging**\n",
|
|
"System info helps troubleshoot ML performance issues:\n",
|
|
"- **Memory errors**: \"Do I have enough RAM for this model?\"\n",
|
|
"- **Performance issues**: \"How many CPU cores can I use?\"\n",
|
|
"- **Compatibility problems**: \"What Python version am I running?\"\n",
|
|
"\n",
|
|
"#### 3. **Professional Development**\n",
|
|
"Shows proper engineering practices:\n",
|
|
"- **Attribution**: Your work is properly credited\n",
|
|
"- **Collaboration**: Others can contact you about your code\n",
|
|
"- **Documentation**: System context is preserved\n",
|
|
"\n",
|
|
"#### 4. **ML Systems Integration**\n",
|
|
"Connects to broader ML engineering:\n",
|
|
"- **Model cards**: Document system requirements\n",
|
|
"- **Experiment tracking**: Record hardware context\n",
|
|
"- **Deployment**: Match development to production environments\n",
|
|
"\n",
|
|
"### Real-World Examples\n",
|
|
"- **Google Colab**: Shows GPU type, RAM, disk space\n",
|
|
"- **Kaggle**: Displays system specs for reproducibility\n",
|
|
"- **MLflow**: Tracks system context with experiments\n",
|
|
"- **Docker**: Containerizes entire system configuration\n",
|
|
"\n",
|
|
"Let's start configuring your TinyTorch system!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7e12b1a4",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\"",
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"source": [
|
|
"## Step 2: Personal Information Configuration\n",
|
|
"\n",
|
|
"### The Concept: Identity in ML Systems\n",
|
|
"Your **personal information** identifies you as the developer and configures your TinyTorch installation. This isn't just administrative - it's foundational to professional ML development.\n",
|
|
"\n",
|
|
"### Why Personal Info Matters in ML Engineering\n",
|
|
"\n",
|
|
"#### 1. **Attribution and Accountability**\n",
|
|
"- **Model ownership**: Who built this model?\n",
|
|
"- **Responsibility**: Who should be contacted about issues?\n",
|
|
"- **Credit**: Proper recognition for your work\n",
|
|
"\n",
|
|
"#### 2. **Collaboration and Communication**\n",
|
|
"- **Team coordination**: Multiple developers on ML projects\n",
|
|
"- **Knowledge sharing**: Others can learn from your work\n",
|
|
"- **Bug reports**: Contact info for issues and improvements\n",
|
|
"\n",
|
|
"#### 3. **Professional Standards**\n",
|
"- **Industry practice**: Most professional software carries author or maintainer attribution\n",
|
|
"- **Open source**: Proper credit in shared code\n",
|
|
"- **Academic integrity**: Clear authorship in research\n",
|
|
"\n",
|
|
"#### 4. **System Customization**\n",
|
|
"- **Personalized experience**: Your TinyTorch installation\n",
|
|
"- **Unique identification**: Distinguish your work from others\n",
|
|
"- **Development tracking**: Link code to developer\n",
|
|
"\n",
|
|
"### Real-World Parallels\n",
|
|
"- **Git commits**: Author name and email in every commit\n",
|
|
"- **Docker images**: Maintainer information in container metadata\n",
|
|
"- **Python packages**: Author info in `setup.py` and `pyproject.toml`\n",
|
|
"- **Model cards**: Creator information for ML models\n",
|
|
"\n",
|
|
"### Best Practices for Personal Configuration\n",
|
|
"- **Use real information**: Not placeholders or fake data\n",
|
|
"- **Professional email**: Accessible and appropriate\n",
|
|
"- **Descriptive system name**: Unique and meaningful\n",
|
|
"- **Consistent formatting**: Follow established conventions\n",
|
|
"\n",
|
|
"Now let's implement your personal configuration!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "28c6c733",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1,
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "personal-info",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| export\n",
|
"def personal_info() -> Dict[str, str]:\n",
"    \"\"\"\n",
"    Return personal information for this TinyTorch installation.\n",
"\n",
"    This function configures your personal TinyTorch installation with your identity.\n",
"    It's the foundation of proper ML engineering practices - every system needs\n",
"    to know who built it and how to contact them.\n",
"\n",
"    TODO: Implement personal information configuration.\n",
"\n",
"    STEP-BY-STEP IMPLEMENTATION:\n",
"    1. Create a dictionary with your personal details\n",
"    2. Include all required keys: developer, email, institution, system_name, version\n",
"    3. Use your actual information (not placeholder text)\n",
"    4. Make system_name unique and descriptive\n",
"    5. Keep version as '1.0.0' for now\n",
"\n",
"    EXAMPLE OUTPUT:\n",
"    {\n",
"        'developer': 'Vijay Janapa Reddi',\n",
"        'email': 'vj@eecs.harvard.edu',\n",
"        'institution': 'Harvard University',\n",
"        'system_name': 'VJ-TinyTorch-Dev',\n",
"        'version': '1.0.0'\n",
"    }\n",
"\n",
"    IMPLEMENTATION HINTS:\n",
"    - Replace the example with your real information\n",
"    - Use a descriptive system_name (e.g., 'YourName-TinyTorch-Dev')\n",
"    - Keep email format valid (contains @ and domain)\n",
"    - Make sure all values are strings\n",
"    - Consider how this info will be used in debugging and collaboration\n",
"\n",
"    LEARNING CONNECTIONS:\n",
"    - This is like the 'author' field in Git commits\n",
"    - Similar to maintainer info in Docker images\n",
"    - Parallels author info in Python packages\n",
"    - Foundation for professional ML development\n",
"    \"\"\"\n",
"    ### BEGIN SOLUTION\n",
"    return {\n",
"        'developer': 'Vijay Janapa Reddi',\n",
"        'email': 'vj@eecs.harvard.edu',\n",
"        'institution': 'Harvard University',\n",
"        'system_name': 'VJ-TinyTorch-Dev',\n",
"        'version': '1.0.0'\n",
"    }\n",
"    ### END SOLUTION"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7eab5a50",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\"",
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"source": [
|
|
"## Step 3: System Information Queries\n",
|
|
"\n",
|
|
"### The Concept: Hardware-Aware ML Systems\n",
|
|
"**System information** provides details about your hardware and software environment. This is crucial for ML development because machine learning is fundamentally about computation, and computation depends on hardware.\n",
|
|
"\n",
|
|
"### Why System Information Matters in ML Engineering\n",
|
|
"\n",
|
|
"#### 1. **Performance Optimization**\n",
|
|
"- **CPU cores**: Determines parallelization strategies\n",
|
|
"- **Memory**: Limits batch size and model size\n",
|
|
"- **Architecture**: Affects numerical precision and optimization\n",
|
|
"\n",
|
|
"#### 2. **Compatibility and Debugging**\n",
|
|
"- **Python version**: Determines available features and libraries\n",
|
|
"- **Platform**: Affects file paths, process management, and system calls\n",
|
|
"- **Architecture**: Influences numerical behavior and optimization\n",
|
|
"\n",
|
|
"#### 3. **Resource Planning**\n",
|
"- **Training time estimation**: More cores can shorten training, but only for work that parallelizes\n",
|
|
"- **Memory requirements**: Avoid out-of-memory errors\n",
|
|
"- **Deployment matching**: Development should match production\n",
|
|
"\n",
|
|
"#### 4. **Reproducibility**\n",
|
|
"- **Environment documentation**: Exact system specifications\n",
|
|
"- **Performance comparison**: Same code, different hardware\n",
|
|
"- **Bug reproduction**: System-specific issues\n",
|
|
"\n",
|
|
"### The Python System Query Toolkit\n",
|
|
"You'll learn to use these essential Python modules:\n",
|
|
"\n",
|
|
"#### `sys.version_info` - Python Version\n",
|
|
"```python\n",
|
|
"version_info = sys.version_info\n",
|
|
"python_version = f\"{version_info.major}.{version_info.minor}.{version_info.micro}\"\n",
|
|
"# Example: \"3.9.7\"\n",
|
|
"```\n",
|
|
"\n",
|
|
"#### `platform.system()` - Operating System\n",
|
|
"```python\n",
|
|
"platform_name = platform.system()\n",
|
|
"# Examples: \"Darwin\" (macOS), \"Linux\", \"Windows\"\n",
|
|
"```\n",
|
|
"\n",
|
|
"#### `platform.machine()` - CPU Architecture\n",
|
|
"```python\n",
|
|
"architecture = platform.machine()\n",
|
|
"# Examples: \"x86_64\", \"arm64\", \"aarch64\"\n",
|
|
"```\n",
|
|
"\n",
|
|
"#### `psutil.cpu_count()` - CPU Cores\n",
|
|
"```python\n",
|
|
"cpu_count = psutil.cpu_count()\n",
|
"# Example: 8 (logical cores, including hyper-threads; None if the count cannot be determined)\n",
|
|
"```\n",
|
|
"\n",
|
|
"#### `psutil.virtual_memory().total` - Total RAM\n",
|
|
"```python\n",
|
|
"memory_bytes = psutil.virtual_memory().total\n",
|
|
"memory_gb = round(memory_bytes / (1024**3), 1)\n",
|
"# Example: 16.0 (1 GB here means 1024**3 bytes)\n",
|
|
"```\n",
|
|
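"\n",
"`psutil.cpu_count()` counts logical cores by default and can return `None` when the count cannot be determined. The sketch below is one hedged way to handle both points. It falls back to the standard library's `os.cpu_count()` and also runs when psutil is not installed. `usable_cores` is a hypothetical name, not a TinyTorch function:\n",
"\n",
"```python\n",
"import os\n",
"\n",
"try:\n",
"    import psutil\n",
"except ImportError:  # psutil is a third-party package and may be missing\n",
"    psutil = None\n",
"\n",
"def usable_cores():\n",
"    # Hypothetical helper: prefer physical cores, then logical cores, then 1.\n",
"    counts = []\n",
"    if psutil is not None:\n",
"        counts.append(psutil.cpu_count(logical=False))  # physical cores, may be None\n",
"        counts.append(psutil.cpu_count(logical=True))   # logical cores, may be None\n",
"    counts.append(os.cpu_count())  # standard-library fallback, may be None\n",
"    for count in counts:\n",
"        if count:\n",
"            return count\n",
"    return 1\n",
"\n",
"print(usable_cores())\n",
"```\n",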
"\n",
|
|
"### Real-World Applications\n",
|
"- **PyTorch**: `torch.get_num_threads()` reports the CPU thread count used for parallel operations, which typically defaults from the core count\n",
|
|
"- **TensorFlow**: `tf.config.list_physical_devices()` queries hardware\n",
|
|
"- **Scikit-learn**: `n_jobs=-1` uses all available cores\n",
|
|
"- **Dask**: Automatically configures workers based on CPU count\n",
|
|
"\n",
|
|
"### ML Systems Performance Considerations\n",
|
"- **Memory-bound operations**: Element-wise operations on large tensors, large model loading\n",
|
|
"- **CPU-bound operations**: Data preprocessing, feature engineering\n",
|
|
"- **I/O-bound operations**: Data loading, model saving\n",
|
|
"- **Platform-specific optimizations**: SIMD instructions, memory management\n",
|
|
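"\n",
"As a rough illustration of the memory point, the storage needed for a model's parameters can be estimated before training starts. This is a sketch under simplifying assumptions: float32 parameters at 4 bytes each, with activations, gradients, and optimizer state ignored. `param_memory_gb`, `fits_in_memory`, and the `headroom` fraction are hypothetical:\n",
"\n",
"```python\n",
"def param_memory_gb(num_params, bytes_per_param=4):\n",
"    # Storage for the parameters alone, in units of 1024**3 bytes.\n",
"    return num_params * bytes_per_param / (1024 ** 3)\n",
"\n",
"def fits_in_memory(num_params, memory_gb, headroom=0.5):\n",
"    # Assumption: only a fraction of RAM (headroom) is budgeted for parameters.\n",
"    return param_memory_gb(num_params) <= memory_gb * headroom\n",
"\n",
"# 268,435,456 float32 parameters occupy exactly 1024**3 bytes.\n",
"print(param_memory_gb(268_435_456))\n",
"print(fits_in_memory(268_435_456, memory_gb=16.0))\n",
"```\n",
"\n",
"The `memory_gb` argument is the kind of value `system_info()` returns later in this module.\n",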
"\n",
|
|
"Now let's implement system information queries!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "fa8eb2a9",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1,
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "system-info",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| export\n",
|
"def system_info() -> Dict[str, Any]:\n",
"    \"\"\"\n",
"    Query and return system information for this TinyTorch installation.\n",
"\n",
"    This function gathers crucial hardware and software information that affects\n",
"    ML performance, compatibility, and debugging. It's the foundation of\n",
"    hardware-aware ML systems.\n",
"\n",
"    TODO: Implement system information queries.\n",
"\n",
"    STEP-BY-STEP IMPLEMENTATION:\n",
"    1. Get Python version using sys.version_info\n",
"    2. Get platform using platform.system()\n",
"    3. Get architecture using platform.machine()\n",
"    4. Get CPU count using psutil.cpu_count()\n",
"    5. Get memory using psutil.virtual_memory().total\n",
"    6. Convert memory from bytes to GB (divide by 1024**3)\n",
"    7. Return all information in a dictionary\n",
"\n",
"    EXAMPLE OUTPUT:\n",
"    {\n",
"        'python_version': '3.9.7',\n",
"        'platform': 'Darwin',\n",
"        'architecture': 'arm64',\n",
"        'cpu_count': 8,\n",
"        'memory_gb': 16.0\n",
"    }\n",
"\n",
"    IMPLEMENTATION HINTS:\n",
"    - Use f-string formatting for Python version: f\"{major}.{minor}.{micro}\"\n",
"    - Memory conversion: bytes / (1024**3) = GB (in Python, ^ is XOR, not a power)\n",
"    - Round memory to 1 decimal place for readability\n",
"    - Make sure data types are correct (strings for text, int for cpu_count, float for memory_gb)\n",
"\n",
"    LEARNING CONNECTIONS:\n",
"    - This is like `torch.cuda.is_available()` in PyTorch\n",
"    - Similar to system info in MLflow experiment tracking\n",
"    - Parallels hardware detection in TensorFlow\n",
"    - Foundation for performance optimization in ML systems\n",
"\n",
"    PERFORMANCE IMPLICATIONS:\n",
"    - cpu_count affects parallel processing capabilities\n",
"    - memory_gb determines maximum model and batch sizes\n",
"    - platform affects file system and process management\n",
"    - architecture influences numerical precision and optimization\n",
"    \"\"\"\n",
"    ### BEGIN SOLUTION\n",
"    # Get Python version\n",
"    version_info = sys.version_info\n",
"    python_version = f\"{version_info.major}.{version_info.minor}.{version_info.micro}\"\n",
"\n",
"    # Get platform information\n",
"    platform_name = platform.system()\n",
"    architecture = platform.machine()\n",
"\n",
"    # Get CPU information (logical cores)\n",
"    cpu_count = psutil.cpu_count()\n",
"\n",
"    # Get memory information (convert bytes to GB)\n",
"    memory_bytes = psutil.virtual_memory().total\n",
"    memory_gb = round(memory_bytes / (1024**3), 1)\n",
"\n",
"    return {\n",
"        'python_version': python_version,\n",
"        'platform': platform_name,\n",
"        'architecture': architecture,\n",
"        'cpu_count': cpu_count,\n",
"        'memory_gb': memory_gb\n",
"    }\n",
"    ### END SOLUTION"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "42812a3e",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"## 🧪 Testing Your Configuration Functions\n",
|
|
"\n",
|
|
"### The Importance of Testing in ML Systems\n",
|
|
"Before we test your implementation, let's understand why testing is crucial in ML systems:\n",
|
|
"\n",
|
|
"#### 1. **Reliability**\n",
|
|
"- **Function correctness**: Does your code do what it's supposed to?\n",
|
|
"- **Edge case handling**: What happens with unexpected inputs?\n",
|
|
"- **Error detection**: Catch bugs before they cause problems\n",
|
|
"\n",
|
|
"#### 2. **Reproducibility**\n",
|
|
"- **Consistent behavior**: Same inputs always produce same outputs\n",
|
|
"- **Environment validation**: Ensure setup works across different systems\n",
|
|
"- **Regression prevention**: New changes don't break existing functionality\n",
|
|
"\n",
|
|
"#### 3. **Professional Development**\n",
|
|
"- **Code quality**: Well-tested code is maintainable code\n",
|
|
"- **Collaboration**: Others can trust and extend your work\n",
|
|
"- **Documentation**: Tests serve as executable documentation\n",
|
|
"\n",
|
|
"#### 4. **ML-Specific Concerns**\n",
|
|
"- **Data validation**: Ensure data types and shapes are correct\n",
|
|
"- **Performance verification**: Check that optimizations work\n",
|
|
"- **System compatibility**: Verify cross-platform behavior\n",
|
|
"\n",
|
|
"### Testing Strategy\n",
|
|
"We'll use comprehensive testing that checks:\n",
|
|
"- **Return types**: Are outputs the correct data types?\n",
|
|
"- **Required fields**: Are all expected keys present?\n",
|
|
"- **Data validation**: Are values reasonable and properly formatted?\n",
|
|
"- **System accuracy**: Do queries match actual system state?\n",
|
|
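"\n",
"The first two checks in this list, return types and required fields, can be folded into a small reusable helper. This is only a sketch and is separate from the graded test cells in this notebook. `check_config` and its `schema` argument are hypothetical:\n",
"\n",
"```python\n",
"def check_config(config, schema):\n",
"    # Return a list of problems; an empty list means the config passed.\n",
"    if not isinstance(config, dict):\n",
"        return ['config is not a dictionary']\n",
"    problems = []\n",
"    for key, expected_type in schema.items():\n",
"        if key not in config:\n",
"            problems.append(f'missing key: {key}')\n",
"        elif not isinstance(config[key], expected_type):\n",
"            problems.append(f'wrong type for {key}')\n",
"    return problems\n",
"\n",
"schema = {'cpu_count': int, 'memory_gb': (int, float)}\n",
"print(check_config({'cpu_count': 8, 'memory_gb': 16.0}, schema))\n",
"print(check_config({'cpu_count': '8'}, schema))\n",
"```\n",
"\n",
"Collecting every problem in a list, rather than stopping at the first failed assertion, reports all issues in one run.\n",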
"\n",
|
|
"Now let's test your configuration functions!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "42114d4e",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"### 🧪 Test Your Configuration Functions\n",
|
|
"\n",
|
|
"Once you implement both functions above, run this cell to test them:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "d006704e",
|
|
"metadata": {
|
|
"nbgrader": {
|
|
"grade": true,
|
|
"grade_id": "test-personal-info",
|
|
"locked": true,
|
|
"points": 25,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Test personal information configuration\n",
|
|
"print(\"Testing personal information...\")\n",
|
|
"\n",
|
|
"# Test personal_info function\n",
|
|
"personal = personal_info()\n",
|
|
"\n",
|
|
"# Test return type\n",
|
|
"assert isinstance(personal, dict), \"personal_info should return a dictionary\"\n",
|
|
"\n",
|
|
"# Test required keys\n",
|
|
"required_keys = ['developer', 'email', 'institution', 'system_name', 'version']\n",
|
"for key in required_keys:\n",
"    assert key in personal, f\"Dictionary should have '{key}' key\"\n",
|
|
"\n",
|
"# Test that every value is a non-empty string\n",
"for key, value in personal.items():\n",
"    assert isinstance(value, str), f\"Value for '{key}' should be a string\"\n",
"    assert len(value) > 0, f\"Value for '{key}' cannot be empty\"\n",
|
|
"\n",
|
|
"# Test email format\n",
|
|
"assert '@' in personal['email'], \"Email should contain @ symbol\"\n",
|
|
"assert '.' in personal['email'], \"Email should contain domain\"\n",
|
|
"\n",
|
|
"# Test version format\n",
|
|
"assert personal['version'] == '1.0.0', \"Version should be '1.0.0'\"\n",
|
|
"\n",
|
|
"# Test system name (should be unique/personalized)\n",
|
|
"assert len(personal['system_name']) > 5, \"System name should be descriptive\"\n",
|
|
"\n",
|
|
"print(\"✅ Personal info function tests passed!\")\n",
|
|
"print(f\"✅ TinyTorch configured for: {personal['developer']}\")\n",
|
|
"print(f\"✅ System: {personal['system_name']}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "50045379",
|
|
"metadata": {
|
|
"nbgrader": {
|
|
"grade": true,
|
|
"grade_id": "test-system-info",
|
|
"locked": true,
|
|
"points": 25,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Test system information queries\n",
|
|
"print(\"Testing system information...\")\n",
|
|
"\n",
|
|
"# Test system_info function\n",
|
|
"sys_info = system_info()\n",
|
|
"\n",
|
|
"# Test return type\n",
|
|
"assert isinstance(sys_info, dict), \"system_info should return a dictionary\"\n",
|
|
"\n",
|
|
"# Test required keys\n",
|
|
"required_keys = ['python_version', 'platform', 'architecture', 'cpu_count', 'memory_gb']\n",
|
"for key in required_keys:\n",
"    assert key in sys_info, f\"Dictionary should have '{key}' key\"\n",
|
|
"\n",
|
|
"# Test data types\n",
|
|
"assert isinstance(sys_info['python_version'], str), \"python_version should be string\"\n",
|
|
"assert isinstance(sys_info['platform'], str), \"platform should be string\"\n",
|
|
"assert isinstance(sys_info['architecture'], str), \"architecture should be string\"\n",
|
|
"assert isinstance(sys_info['cpu_count'], int), \"cpu_count should be integer\"\n",
|
|
"assert isinstance(sys_info['memory_gb'], (int, float)), \"memory_gb should be number\"\n",
|
|
"\n",
|
|
"# Test reasonable values\n",
|
|
"assert sys_info['cpu_count'] > 0, \"CPU count should be positive\"\n",
|
|
"assert sys_info['memory_gb'] > 0, \"Memory should be positive\"\n",
|
|
"assert len(sys_info['python_version']) > 0, \"Python version should not be empty\"\n",
|
|
"\n",
|
|
"# Test that values are actually queried (not hardcoded)\n",
|
|
"actual_version = f\"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}\"\n",
|
|
"assert sys_info['python_version'] == actual_version, \"Python version should match actual system\"\n",
|
|
"\n",
|
|
"print(\"✅ System info function tests passed!\")\n",
|
|
"print(f\"✅ Python: {sys_info['python_version']} on {sys_info['platform']}\")\n",
|
|
"print(f\"✅ Hardware: {sys_info['cpu_count']} cores, {sys_info['memory_gb']} GB RAM\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "73826cf3",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"## 🎯 Module Summary: Foundation of ML Systems Engineering\n",
|
|
"\n",
|
|
"Congratulations! You've successfully configured your TinyTorch installation and learned the foundations of ML systems engineering:\n",
|
|
"\n",
|
|
"### What You've Accomplished\n",
|
|
"✅ **Personal Configuration**: Set up your identity and custom system name \n",
|
|
"✅ **System Queries**: Learned to gather hardware and software information \n",
|
|
"✅ **NBGrader Workflow**: Mastered solution blocks and automated testing \n",
|
|
"✅ **Code Export**: Created functions that become part of your tinytorch package \n",
|
|
"✅ **Professional Setup**: Established proper development practices \n",
|
|
"\n",
|
|
"### Key Concepts You've Learned\n",
|
|
"\n",
|
|
"#### 1. **System Awareness**\n",
|
|
"- **Hardware constraints**: Understanding CPU, memory, and architecture limitations\n",
|
|
"- **Software dependencies**: Python version and platform compatibility\n",
|
|
"- **Performance implications**: How system specs affect ML workloads\n",
|
|
"\n",
|
|
"#### 2. **Configuration Management**\n",
|
|
"- **Personal identification**: Professional attribution and contact information\n",
|
|
"- **Environment documentation**: Reproducible system specifications\n",
|
|
"- **Professional standards**: Industry-standard development practices\n",
|
|
"\n",
|
|
"#### 3. **ML Systems Foundations**\n",
|
|
"- **Reproducibility**: System context for experiment tracking\n",
|
|
"- **Debugging**: Hardware info for performance troubleshooting\n",
|
|
"- **Collaboration**: Proper attribution and contact information\n",
|
|
"\n",
|
|
"#### 4. **Development Workflow**\n",
|
|
"- **NBGrader integration**: Automated testing and grading\n",
|
"- **Code export**: Functions become part of the tinytorch package\n",
|
|
"- **Testing practices**: Comprehensive validation of functionality\n",
|
|
"\n",
|
|
"### Connections to Real ML Systems\n",
|
|
"\n",
|
|
"This module connects to broader ML engineering practices:\n",
|
|
"\n",
|
|
"#### **Industry Parallels**\n",
|
|
"- **Docker containers**: System configuration and reproducibility\n",
|
|
"- **MLflow tracking**: Experiment context and system metadata\n",
|
|
"- **Model cards**: Documentation of system requirements and performance\n",
|
|
"- **CI/CD pipelines**: Automated testing and environment validation\n",
|
|
"\n",
|
|
"#### **Production Considerations**\n",
|
|
"- **Deployment matching**: Development environment should match production\n",
|
|
"- **Resource planning**: Understanding hardware constraints for scaling\n",
|
|
"- **Monitoring**: System metrics for performance optimization\n",
|
|
"- **Debugging**: System context for troubleshooting issues\n",
|
|
"\n",
|
|
"### Next Steps in Your ML Systems Journey\n",
|
|
"\n",
|
|
"#### **Immediate Actions**\n",
|
|
"1. **Export your code**: `tito module export 00_setup`\n",
|
|
"2. **Test your installation**: \n",
|
"   ```python\n",
"   from tinytorch.core.setup import personal_info, system_info\n",
"   print(personal_info())  # Your personal details\n",
"   print(system_info())    # System information\n",
"   ```\n",
|
|
"3. **Verify package integration**: Ensure your functions work in the tinytorch package\n",
|
|
"\n",
|
|
"#### **Looking Ahead**\n",
|
|
"- **Module 1 (Tensor)**: Build the fundamental data structure for ML\n",
|
|
"- **Module 2 (Activations)**: Add nonlinearity for complex learning\n",
|
|
"- **Module 3 (Layers)**: Create the building blocks of neural networks\n",
|
|
"- **Module 4 (Networks)**: Compose layers into powerful architectures\n",
|
|
"\n",
|
|
"#### **Course Progression**\n",
|
|
"You're now ready to build a complete ML system from scratch:\n",
|
|
"```\n",
|
|
"Setup → Tensor → Activations → Layers → Networks → CNN → DataLoader → \n",
|
|
"Autograd → Optimizers → Training → Compression → Kernels → Benchmarking → MLOps\n",
|
|
"```\n",
|
|
"\n",
|
|
"### Professional Development Milestone\n",
|
|
"\n",
|
|
"You've taken your first step in ML systems engineering! This module taught you:\n",
|
|
"- **System thinking**: Understanding hardware and software constraints\n",
|
|
"- **Professional practices**: Proper attribution, testing, and documentation\n",
|
|
"- **Tool mastery**: NBGrader workflow and package development\n",
|
|
"- **Foundation building**: Creating reusable, tested, documented code\n",
|
|
"\n",
|
|
"**Ready for the next challenge?** Let's build the foundation of ML systems with tensors!"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"jupytext": {
|
|
"main_language": "python"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|