TinyTorch/modules/source/01_setup/setup_dev.ipynb
2025-07-15 23:51:56 -04:00

{
"cells": [
{
"cell_type": "markdown",
"id": "a84f5309",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Setup - TinyTorch System Configuration\n",
"\n",
"Welcome to TinyTorch! This setup module configures your personal TinyTorch installation and teaches you the NBGrader workflow.\n",
"\n",
"## Learning Goals\n",
"- Configure your personal TinyTorch installation with custom information\n",
"- Learn to query system information using Python modules\n",
"- Master the NBGrader workflow: implement → test → export\n",
"- Create functions that become part of your tinytorch package\n",
"- Understand solution blocks, hidden tests, and automated grading\n",
"\n",
"## The Big Picture: Why Configuration Matters in ML Systems\n",
"Configuration is the foundation of any production ML system. In this module, you'll learn:\n",
"\n",
"### 1. **System Awareness**\n",
"Real ML systems need to understand their environment:\n",
"- **Hardware constraints**: Memory, CPU cores, GPU availability\n",
"- **Software dependencies**: Python version, library compatibility\n",
"- **Platform differences**: Linux servers, macOS development, Windows deployment\n",
"\n",
"### 2. **Reproducibility**\n",
"Configuration enables reproducible ML:\n",
"- **Environment documentation**: Exactly what system was used\n",
"- **Dependency management**: Precise versions and requirements\n",
"- **Debugging support**: System info helps troubleshoot issues\n",
"\n",
"### 3. **Professional Development**\n",
"Proper configuration shows engineering maturity:\n",
"- **Attribution**: Your work is properly credited\n",
"- **Collaboration**: Others can understand and extend your setup\n",
"- **Maintenance**: Systems can be updated and maintained\n",
"\n",
"### 4. **ML Systems Context**\n",
"This connects to broader ML engineering:\n",
"- **Model deployment**: Different environments need different configs\n",
"- **Monitoring**: System metrics help track performance\n",
"- **Scaling**: Understanding hardware helps optimize training\n",
"\n",
"Let's build the foundation of your ML systems engineering skills!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b608e2e6",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "setup-imports",
"locked": false,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"#| default_exp core.setup\n",
"\n",
"#| export\n",
"import sys\n",
"import platform\n",
"import psutil\n",
"import os\n",
"from typing import Dict, Any"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "427aefa2",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "setup-imports",
"locked": false,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"print(\"🔥 TinyTorch Setup Module\")\n",
"print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
"print(f\"Platform: {platform.system()}\")\n",
"print(\"Ready to configure your TinyTorch installation!\")"
]
},
{
"cell_type": "markdown",
"id": "946074ef",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🏗️ The Architecture of ML Systems Configuration\n",
"\n",
"### Configuration Layers in Production ML\n",
"Real ML systems have multiple configuration layers:\n",
"\n",
"```\n",
"┌─────────────────────────────────────┐\n",
"│ Application Config │ ← Your personal info\n",
"├─────────────────────────────────────┤\n",
"│ System Environment │ ← Hardware specs\n",
"├─────────────────────────────────────┤\n",
"│ Runtime Configuration │ ← Python, libraries\n",
"├─────────────────────────────────────┤\n",
"│ Infrastructure Config │ ← Cloud, containers\n",
"└─────────────────────────────────────┘\n",
"```\n",
"\n",
"### Why Each Layer Matters\n",
"- **Application**: Identifies who built what and when\n",
"- **System**: Determines performance characteristics and limitations\n",
"- **Runtime**: Affects compatibility and feature availability\n",
"- **Infrastructure**: Enables scaling and deployment strategies\n",
"\n",
"### Connection to Real ML Frameworks\n",
"Every major ML framework has configuration:\n",
"- **PyTorch**: `torch.cuda.is_available()`, `torch.get_num_threads()`\n",
"- **TensorFlow**: `tf.config.list_physical_devices()`, `tf.sysconfig.get_build_info()`\n",
"- **Hugging Face**: Model cards with system requirements and performance metrics\n",
"- **MLflow**: Experiment tracking with system context and reproducibility\n",
"\n",
"### TinyTorch's Approach\n",
"We'll build configuration that's:\n",
"- **Educational**: Teaches system awareness\n",
"- **Practical**: Actually useful for debugging\n",
"- **Professional**: Follows industry standards\n",
"- **Extensible**: Ready for future ML systems features"
]
},
{
"cell_type": "markdown",
"id": "b2bb27d7",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 1: What is System Configuration?\n",
"\n",
"### Definition\n",
"**System configuration** is the process of setting up your development environment with personalized information and system diagnostics. In TinyTorch, this means:\n",
"\n",
"- **Personal Information**: Your name, email, institution for identification\n",
"- **System Information**: Hardware specs, Python version, platform details\n",
"- **Customization**: Making your TinyTorch installation uniquely yours\n",
"\n",
"### Why Configuration Matters in ML Systems\n",
"Proper system configuration is crucial because:\n",
"\n",
"#### 1. **Reproducibility** \n",
"Your setup can be documented and shared:\n",
"```python\n",
"# Someone else can recreate your environment\n",
"config = {\n",
" 'developer': 'Your Name',\n",
" 'python_version': '3.9.7',\n",
" 'platform': 'Darwin',\n",
" 'memory_gb': 16.0\n",
"}\n",
"```\n",
"\n",
"#### 2. **Debugging**\n",
"System info helps troubleshoot ML performance issues:\n",
"- **Memory errors**: \"Do I have enough RAM for this model?\"\n",
"- **Performance issues**: \"How many CPU cores can I use?\"\n",
"- **Compatibility problems**: \"What Python version am I running?\"\n",
"\n",
"#### 3. **Professional Development**\n",
"Shows proper engineering practices:\n",
"- **Attribution**: Your work is properly credited\n",
"- **Collaboration**: Others can contact you about your code\n",
"- **Documentation**: System context is preserved\n",
"\n",
"#### 4. **ML Systems Integration**\n",
"Connects to broader ML engineering:\n",
"- **Model cards**: Document system requirements\n",
"- **Experiment tracking**: Record hardware context\n",
"- **Deployment**: Match development to production environments\n",
"\n",
"### Real-World Examples\n",
"- **Google Colab**: Shows GPU type, RAM, disk space\n",
"- **Kaggle**: Displays system specs for reproducibility\n",
"- **MLflow**: Tracks system context with experiments\n",
"- **Docker**: Containerizes entire system configuration\n",
"\n",
"Let's start configuring your TinyTorch system!"
]
},
{
"cell_type": "markdown",
"id": "26b13500",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"## Step 2: Personal Information Configuration\n",
"\n",
"### The Concept: Identity in ML Systems\n",
"Your **personal information** identifies you as the developer and configures your TinyTorch installation. This isn't just administrative - it's foundational to professional ML development.\n",
"\n",
"### Why Personal Info Matters in ML Engineering\n",
"\n",
"#### 1. **Attribution and Accountability**\n",
"- **Model ownership**: Who built this model?\n",
"- **Responsibility**: Who should be contacted about issues?\n",
"- **Credit**: Proper recognition for your work\n",
"\n",
"#### 2. **Collaboration and Communication**\n",
"- **Team coordination**: Multiple developers on ML projects\n",
"- **Knowledge sharing**: Others can learn from your work\n",
"- **Bug reports**: Contact info for issues and improvements\n",
"\n",
"#### 3. **Professional Standards**\n",
"- **Industry practice**: All professional software has attribution\n",
"- **Open source**: Proper credit in shared code\n",
"- **Academic integrity**: Clear authorship in research\n",
"\n",
"#### 4. **System Customization**\n",
"- **Personalized experience**: Your TinyTorch installation\n",
"- **Unique identification**: Distinguish your work from others\n",
"- **Development tracking**: Link code to developer\n",
"\n",
"### Real-World Parallels\n",
"- **Git commits**: Author name and email in every commit\n",
"- **Docker images**: Maintainer information in container metadata\n",
"- **Python packages**: Author info in `setup.py` and `pyproject.toml`\n",
"- **Model cards**: Creator information for ML models\n",
"\n",
"### Best Practices for Personal Configuration\n",
"- **Use real information**: Not placeholders or fake data\n",
"- **Professional email**: Accessible and appropriate\n",
"- **Descriptive system name**: Unique and meaningful\n",
"- **Consistent formatting**: Follow established conventions\n",
"\n",
"Now let's implement your personal configuration!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae4d2930",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "personal-info",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"def personal_info() -> Dict[str, str]:\n",
" \"\"\"\n",
" Return personal information for this TinyTorch installation.\n",
" \n",
" This function configures your personal TinyTorch installation with your identity.\n",
" It's the foundation of proper ML engineering practices - every system needs\n",
" to know who built it and how to contact them.\n",
" \n",
" TODO: Implement personal information configuration.\n",
" \n",
" STEP-BY-STEP IMPLEMENTATION:\n",
" 1. Create a dictionary with your personal details\n",
" 2. Include all required keys: developer, email, institution, system_name, version\n",
" 3. Use your actual information (not placeholder text)\n",
" 4. Make system_name unique and descriptive\n",
" 5. Keep version as '1.0.0' for now\n",
" \n",
" EXAMPLE OUTPUT:\n",
" {\n",
" 'developer': 'Vijay Janapa Reddi',\n",
" 'email': 'vj@eecs.harvard.edu', \n",
" 'institution': 'Harvard University',\n",
" 'system_name': 'VJ-TinyTorch-Dev',\n",
" 'version': '1.0.0'\n",
" }\n",
" \n",
" IMPLEMENTATION HINTS:\n",
" - Replace the example with your real information\n",
" - Use a descriptive system_name (e.g., 'YourName-TinyTorch-Dev')\n",
" - Keep email format valid (contains @ and domain)\n",
" - Make sure all values are strings\n",
" - Consider how this info will be used in debugging and collaboration\n",
" \n",
" LEARNING CONNECTIONS:\n",
" - This is like the 'author' field in Git commits\n",
" - Similar to maintainer info in Docker images\n",
" - Parallels author info in Python packages\n",
" - Foundation for professional ML development\n",
" \"\"\"\n",
" ### BEGIN SOLUTION\n",
" return {\n",
" 'developer': 'Vijay Janapa Reddi',\n",
" 'email': 'vj@eecs.harvard.edu',\n",
" 'institution': 'Harvard University',\n",
" 'system_name': 'VJ-TinyTorch-Dev',\n",
" 'version': '1.0.0'\n",
" }\n",
" ### END SOLUTION"
]
},
{
"cell_type": "markdown",
"id": "3e8b5d05",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"## Step 3: System Information Queries\n",
"\n",
"### The Concept: Hardware-Aware ML Systems\n",
"**System information** provides details about your hardware and software environment. This is crucial for ML development because machine learning is fundamentally about computation, and computation depends on hardware.\n",
"\n",
"### Why System Information Matters in ML Engineering\n",
"\n",
"#### 1. **Performance Optimization**\n",
"- **CPU cores**: Determines parallelization strategies\n",
"- **Memory**: Limits batch size and model size\n",
"- **Architecture**: Affects numerical precision and optimization\n",
"\n",
"#### 2. **Compatibility and Debugging**\n",
"- **Python version**: Determines available features and libraries\n",
"- **Platform**: Affects file paths, process management, and system calls\n",
"- **Architecture**: Influences numerical behavior and optimization\n",
"\n",
"#### 3. **Resource Planning**\n",
"- **Training time estimation**: More cores = faster training\n",
"- **Memory requirements**: Avoid out-of-memory errors\n",
"- **Deployment matching**: Development should match production\n",
"\n",
"#### 4. **Reproducibility**\n",
"- **Environment documentation**: Exact system specifications\n",
"- **Performance comparison**: Same code, different hardware\n",
"- **Bug reproduction**: System-specific issues\n",
"\n",
"### The Python System Query Toolkit\n",
"You'll learn to use these essential Python modules:\n",
"\n",
"#### `sys.version_info` - Python Version\n",
"```python\n",
"version_info = sys.version_info\n",
"python_version = f\"{version_info.major}.{version_info.minor}.{version_info.micro}\"\n",
"# Example: \"3.9.7\"\n",
"```\n",
"\n",
"#### `platform.system()` - Operating System\n",
"```python\n",
"platform_name = platform.system()\n",
"# Examples: \"Darwin\" (macOS), \"Linux\", \"Windows\"\n",
"```\n",
"\n",
"#### `platform.machine()` - CPU Architecture\n",
"```python\n",
"architecture = platform.machine()\n",
"# Examples: \"x86_64\", \"arm64\", \"aarch64\"\n",
"```\n",
"\n",
"#### `psutil.cpu_count()` - CPU Cores\n",
"```python\n",
"cpu_count = psutil.cpu_count()\n",
"# Example: 8 (cores available for parallel processing)\n",
"```\n",
"\n",
"#### `psutil.virtual_memory().total` - Total RAM\n",
"```python\n",
"memory_bytes = psutil.virtual_memory().total\n",
"memory_gb = round(memory_bytes / (1024**3), 1)\n",
"# Example: 16.0 GB\n",
"```\n",
"\n",
"### Real-World Applications\n",
"- **PyTorch**: `torch.get_num_threads()` uses CPU count\n",
"- **TensorFlow**: `tf.config.list_physical_devices()` queries hardware\n",
"- **Scikit-learn**: `n_jobs=-1` uses all available cores\n",
"- **Dask**: Automatically configures workers based on CPU count\n",
"\n",
"### ML Systems Performance Considerations\n",
"- **Memory-bound operations**: Matrix multiplication, large model loading\n",
"- **CPU-bound operations**: Data preprocessing, feature engineering\n",
"- **I/O-bound operations**: Data loading, model saving\n",
"- **Platform-specific optimizations**: SIMD instructions, memory management\n",
"\n",
"Now let's implement system information queries!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1607388",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "system-info",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"def system_info() -> Dict[str, Any]:\n",
" \"\"\"\n",
" Query and return system information for this TinyTorch installation.\n",
" \n",
" This function gathers crucial hardware and software information that affects\n",
" ML performance, compatibility, and debugging. It's the foundation of \n",
" hardware-aware ML systems.\n",
" \n",
" TODO: Implement system information queries.\n",
" \n",
" STEP-BY-STEP IMPLEMENTATION:\n",
" 1. Get Python version using sys.version_info\n",
" 2. Get platform using platform.system()\n",
" 3. Get architecture using platform.machine()\n",
" 4. Get CPU count using psutil.cpu_count()\n",
" 5. Get memory using psutil.virtual_memory().total\n",
" 6. Convert memory from bytes to GB (divide by 1024^3)\n",
" 7. Return all information in a dictionary\n",
" \n",
" EXAMPLE OUTPUT:\n",
" {\n",
" 'python_version': '3.9.7',\n",
" 'platform': 'Darwin', \n",
" 'architecture': 'arm64',\n",
" 'cpu_count': 8,\n",
" 'memory_gb': 16.0\n",
" }\n",
" \n",
" IMPLEMENTATION HINTS:\n",
" - Use f-string formatting for Python version: f\"{major}.{minor}.{micro}\"\n",
" - Memory conversion: bytes / (1024^3) = GB\n",
" - Round memory to 1 decimal place for readability\n",
" - Make sure data types are correct (strings for text, int for cpu_count, float for memory_gb)\n",
" \n",
" LEARNING CONNECTIONS:\n",
" - This is like `torch.cuda.is_available()` in PyTorch\n",
" - Similar to system info in MLflow experiment tracking\n",
" - Parallels hardware detection in TensorFlow\n",
" - Foundation for performance optimization in ML systems\n",
" \n",
" PERFORMANCE IMPLICATIONS:\n",
" - cpu_count affects parallel processing capabilities\n",
" - memory_gb determines maximum model and batch sizes\n",
" - platform affects file system and process management\n",
" - architecture influences numerical precision and optimization\n",
" \"\"\"\n",
" ### BEGIN SOLUTION\n",
" # Get Python version\n",
" version_info = sys.version_info\n",
" python_version = f\"{version_info.major}.{version_info.minor}.{version_info.micro}\"\n",
" \n",
" # Get platform information\n",
" platform_name = platform.system()\n",
" architecture = platform.machine()\n",
" \n",
" # Get CPU information\n",
" cpu_count = psutil.cpu_count()\n",
" \n",
" # Get memory information (convert bytes to GB)\n",
" memory_bytes = psutil.virtual_memory().total\n",
" memory_gb = round(memory_bytes / (1024**3), 1)\n",
" \n",
" return {\n",
" 'python_version': python_version,\n",
" 'platform': platform_name,\n",
" 'architecture': architecture,\n",
" 'cpu_count': cpu_count,\n",
" 'memory_gb': memory_gb\n",
" }\n",
" ### END SOLUTION"
]
},
{
"cell_type": "markdown",
"id": "3671c633",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🧪 Testing Your Configuration Functions\n",
"\n",
"### The Importance of Testing in ML Systems\n",
"Before we test your implementation, let's understand why testing is crucial in ML systems:\n",
"\n",
"#### 1. **Reliability**\n",
"- **Function correctness**: Does your code do what it's supposed to?\n",
"- **Edge case handling**: What happens with unexpected inputs?\n",
"- **Error detection**: Catch bugs before they cause problems\n",
"\n",
"#### 2. **Reproducibility**\n",
"- **Consistent behavior**: Same inputs always produce same outputs\n",
"- **Environment validation**: Ensure setup works across different systems\n",
"- **Regression prevention**: New changes don't break existing functionality\n",
"\n",
"#### 3. **Professional Development**\n",
"- **Code quality**: Well-tested code is maintainable code\n",
"- **Collaboration**: Others can trust and extend your work\n",
"- **Documentation**: Tests serve as executable documentation\n",
"\n",
"#### 4. **ML-Specific Concerns**\n",
"- **Data validation**: Ensure data types and shapes are correct\n",
"- **Performance verification**: Check that optimizations work\n",
"- **System compatibility**: Verify cross-platform behavior\n",
"\n",
"### Testing Strategy\n",
"We'll use comprehensive testing that checks:\n",
"- **Return types**: Are outputs the correct data types?\n",
"- **Required fields**: Are all expected keys present?\n",
"- **Data validation**: Are values reasonable and properly formatted?\n",
"- **System accuracy**: Do queries match actual system state?\n",
"\n",
"Now let's test your configuration functions!"
]
},
{
"cell_type": "markdown",
"id": "fa14788c",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### 🧪 Test Your Configuration Functions\n",
"\n",
"Once you implement both functions above, run this cell to test them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c0c8c52",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-personal-info",
"locked": true,
"points": 25,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# Test personal information configuration\n",
"print(\"🔬 Unit Test: Personal Information...\")\n",
"\n",
"# Test personal_info function\n",
"personal = personal_info()\n",
"\n",
"# Test return type\n",
"assert isinstance(personal, dict), \"personal_info should return a dictionary\"\n",
"\n",
"# Test required keys\n",
"required_keys = ['developer', 'email', 'institution', 'system_name', 'version']\n",
"for key in required_keys:\n",
" assert key in personal, f\"Dictionary should have '{key}' key\"\n",
"\n",
"# Test non-empty values\n",
"for key, value in personal.items():\n",
" assert isinstance(value, str), f\"Value for '{key}' should be a string\"\n",
" assert len(value) > 0, f\"Value for '{key}' cannot be empty\"\n",
"\n",
"# Test email format\n",
"assert '@' in personal['email'], \"Email should contain @ symbol\"\n",
"assert '.' in personal['email'], \"Email should contain domain\"\n",
"\n",
"# Test version format\n",
"assert personal['version'] == '1.0.0', \"Version should be '1.0.0'\"\n",
"\n",
"# Test system name (should be unique/personalized)\n",
"assert len(personal['system_name']) > 5, \"System name should be descriptive\"\n",
"\n",
"print(\"✅ Personal info function tests passed!\")\n",
"print(f\"✅ TinyTorch configured for: {personal['developer']}\")\n",
"print(f\"✅ System: {personal['system_name']}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b30693d",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-system-info",
"locked": true,
"points": 25,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# Test system information queries\n",
"print(\"🔬 Unit Test: System Information...\")\n",
"\n",
"# Test system_info function\n",
"sys_info = system_info()\n",
"\n",
"# Test return type\n",
"assert isinstance(sys_info, dict), \"system_info should return a dictionary\"\n",
"\n",
"# Test required keys\n",
"required_keys = ['python_version', 'platform', 'architecture', 'cpu_count', 'memory_gb']\n",
"for key in required_keys:\n",
" assert key in sys_info, f\"Dictionary should have '{key}' key\"\n",
"\n",
"# Test data types\n",
"assert isinstance(sys_info['python_version'], str), \"python_version should be string\"\n",
"assert isinstance(sys_info['platform'], str), \"platform should be string\"\n",
"assert isinstance(sys_info['architecture'], str), \"architecture should be string\"\n",
"assert isinstance(sys_info['cpu_count'], int), \"cpu_count should be integer\"\n",
"assert isinstance(sys_info['memory_gb'], (int, float)), \"memory_gb should be number\"\n",
"\n",
"# Test reasonable values\n",
"assert sys_info['cpu_count'] > 0, \"CPU count should be positive\"\n",
"assert sys_info['memory_gb'] > 0, \"Memory should be positive\"\n",
"assert len(sys_info['python_version']) > 0, \"Python version should not be empty\"\n",
"\n",
"# Test that values are actually queried (not hardcoded)\n",
"actual_version = f\"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}\"\n",
"assert sys_info['python_version'] == actual_version, \"Python version should match actual system\"\n",
"\n",
"print(\"✅ System info function tests passed!\")\n",
"print(f\"✅ Python: {sys_info['python_version']} on {sys_info['platform']}\")\n",
"print(f\"✅ Memory: {sys_info['memory_gb']} GB, CPUs: {sys_info['cpu_count']}\")"
]
},
{
"cell_type": "markdown",
"id": "c44390b2",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### 🧪 Inline Test Functions\n",
"\n",
"These test functions provide immediate feedback when developing your solutions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "404c5605",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"def test_personal_info():\n",
" \"\"\"Test personal_info function implementation.\"\"\"\n",
" print(\"🔬 Unit Test: Personal Information...\")\n",
" \n",
" # Test personal_info function\n",
" personal = personal_info()\n",
" \n",
" # Test return type\n",
" assert isinstance(personal, dict), \"personal_info should return a dictionary\"\n",
" \n",
" # Test required keys\n",
" required_keys = ['developer', 'email', 'institution', 'system_name', 'version']\n",
" for key in required_keys:\n",
" assert key in personal, f\"Dictionary should have '{key}' key\"\n",
" \n",
" # Test non-empty values\n",
" for key, value in personal.items():\n",
" assert isinstance(value, str), f\"Value for '{key}' should be a string\"\n",
" assert len(value) > 0, f\"Value for '{key}' cannot be empty\"\n",
" \n",
" # Test email format\n",
" assert '@' in personal['email'], \"Email should contain @ symbol\"\n",
" assert '.' in personal['email'], \"Email should contain domain\"\n",
" \n",
" # Test version format\n",
" assert personal['version'] == '1.0.0', \"Version should be '1.0.0'\"\n",
" \n",
" # Test system name (should be unique/personalized)\n",
" assert len(personal['system_name']) > 5, \"System name should be descriptive\"\n",
" \n",
" print(\"✅ Personal info function tests passed!\")\n",
" print(f\"✅ TinyTorch configured for: {personal['developer']}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ab7c64b",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"def test_system_info():\n",
" \"\"\"Test system_info function implementation.\"\"\"\n",
" print(\"🔬 Unit Test: System Information...\")\n",
" \n",
" # Test system_info function\n",
" sys_info = system_info()\n",
" \n",
" # Test return type\n",
" assert isinstance(sys_info, dict), \"system_info should return a dictionary\"\n",
" \n",
" # Test required keys\n",
" required_keys = ['python_version', 'platform', 'architecture', 'cpu_count', 'memory_gb']\n",
" for key in required_keys:\n",
" assert key in sys_info, f\"Dictionary should have '{key}' key\"\n",
" \n",
" # Test data types\n",
" assert isinstance(sys_info['python_version'], str), \"python_version should be string\"\n",
" assert isinstance(sys_info['platform'], str), \"platform should be string\"\n",
" assert isinstance(sys_info['architecture'], str), \"architecture should be string\"\n",
" assert isinstance(sys_info['cpu_count'], int), \"cpu_count should be integer\"\n",
" assert isinstance(sys_info['memory_gb'], (int, float)), \"memory_gb should be number\"\n",
" \n",
" # Test reasonable values\n",
" assert sys_info['cpu_count'] > 0, \"CPU count should be positive\"\n",
" assert sys_info['memory_gb'] > 0, \"Memory should be positive\"\n",
" assert len(sys_info['python_version']) > 0, \"Python version should not be empty\"\n",
" \n",
" # Test that values are actually queried (not hardcoded)\n",
" actual_version = f\"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}\"\n",
" assert sys_info['python_version'] == actual_version, \"Python version should match actual system\"\n",
" \n",
" print(\"✅ System info function tests passed!\")\n",
" print(f\"✅ Python: {sys_info['python_version']} on {sys_info['platform']}\")"
]
},
{
"cell_type": "markdown",
"id": "54d58db1",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🎯 Professional ML Engineering Skills\n",
"\n",
"You've successfully configured your TinyTorch installation and learned the foundations of ML systems engineering:\n",
"\n",
"### What You've Accomplished\n",
"✅ **Personal Configuration**: Set up your identity and custom system name \n",
"✅ **System Queries**: Learned to gather hardware and software information \n",
"✅ **NBGrader Workflow**: Mastered solution blocks and automated testing \n",
"✅ **Code Export**: Created functions that become part of your tinytorch package \n",
"✅ **Professional Setup**: Established proper development practices \n",
"\n",
"### Key Concepts You've Learned\n",
"\n",
"#### 1. **System Awareness**\n",
"- **Hardware constraints**: Understanding CPU, memory, and architecture limitations\n",
"- **Software dependencies**: Python version and platform compatibility\n",
"- **Performance implications**: How system specs affect ML workloads\n",
"\n",
"#### 2. **Configuration Management**\n",
"- **Personal identification**: Professional attribution and contact information\n",
"- **Environment documentation**: Reproducible system specifications\n",
"- **Professional standards**: Industry-standard development practices\n",
"\n",
"#### 3. **ML Systems Foundations**\n",
"- **Reproducibility**: System context for experiment tracking\n",
"- **Debugging**: Hardware info for performance troubleshooting\n",
"- **Collaboration**: Proper attribution and contact information\n",
"\n",
"#### 4. **Development Workflow**\n",
"- **NBGrader integration**: Automated testing and grading\n",
"- **Code export**: Functions become part of production package\n",
"- **Testing practices**: Comprehensive validation of functionality\n",
"\n",
"### Connections to Real ML Systems\n",
"\n",
"This module connects to broader ML engineering practices:\n",
"\n",
"#### **Industry Parallels**\n",
"- **Docker containers**: System configuration and reproducibility\n",
"- **MLflow tracking**: Experiment context and system metadata\n",
"- **Model cards**: Documentation of system requirements and performance\n",
"- **CI/CD pipelines**: Automated testing and environment validation\n",
"\n",
"#### **Production Considerations**\n",
"- **Deployment matching**: Development environment should match production\n",
"- **Resource planning**: Understanding hardware constraints for scaling\n",
"- **Monitoring**: System metrics for performance optimization\n",
"- **Debugging**: System context for troubleshooting issues\n",
"\n",
"### Next Steps in Your ML Systems Journey\n",
"\n",
"#### **Immediate Actions**\n",
"1. **Export your code**: `tito module export 01_setup`\n",
"2. **Test your installation**: \n",
" ```python\n",
" from tinytorch.core.setup import personal_info, system_info\n",
" print(personal_info()) # Your personal details\n",
" print(system_info()) # System information\n",
" ```\n",
"3. **Verify package integration**: Ensure your functions work in the tinytorch package\n",
"\n",
"#### **Looking Ahead**\n",
"- **Module 1 (Tensor)**: Build the fundamental data structure for ML\n",
"- **Module 2 (Activations)**: Add nonlinearity for complex learning\n",
"- **Module 3 (Layers)**: Create the building blocks of neural networks\n",
"- **Module 4 (Networks)**: Compose layers into powerful architectures\n",
"\n",
"#### **Course Progression**\n",
"You're now ready to build a complete ML system from scratch:\n",
"```\n",
"Setup → Tensor → Activations → Layers → Networks → CNN → DataLoader → \n",
"Autograd → Optimizers → Training → Compression → Kernels → Benchmarking → MLOps\n",
"```\n",
"\n",
"### Professional Development Milestone\n",
"\n",
"You've taken your first step in ML systems engineering! This module taught you:\n",
"- **System thinking**: Understanding hardware and software constraints\n",
"- **Professional practices**: Proper attribution, testing, and documentation\n",
"- **Tool mastery**: NBGrader workflow and package development\n",
"- **Foundation building**: Creating reusable, tested, documented code\n",
"\n",
"**Ready for the next challenge?** Let's build the foundation of ML systems with tensors!"
]
},
{
"cell_type": "markdown",
"id": "fdb8068c",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 4: Environment Validation\n",
"\n",
"### The Concept: Dependency Management in ML Systems\n",
"**Environment validation** ensures your system has the necessary packages and versions for ML development. This is crucial because ML systems have complex dependency chains that can break in subtle ways.\n",
"\n",
"### Why Environment Validation Matters\n",
"\n",
"#### 1. **Compatibility Assurance**\n",
"- **Version conflicts**: Different packages may require incompatible versions\n",
"- **API changes**: New versions might break existing code\n",
"- **Feature availability**: Some features require specific versions\n",
"\n",
"#### 2. **Reproducibility**\n",
"- **Environment documentation**: Exact package versions for reproduction\n",
"- **Dependency tracking**: Understanding what's installed and why\n",
"- **Debugging support**: Version info helps troubleshoot issues\n",
"\n",
"#### 3. **Professional Development**\n",
"- **Deployment safety**: Ensure development matches production\n",
"- **Collaboration**: Team members need compatible environments\n",
"- **Quality assurance**: Validate setup before beginning work\n",
"\n",
"### Essential ML Dependencies\n",
"We'll check for core packages that ML systems depend on:\n",
"- **numpy**: Fundamental numerical computing\n",
"- **matplotlib**: Visualization and plotting\n",
"- **psutil**: System information and monitoring\n",
"- **jupyter**: Interactive development environment\n",
"- **nbdev**: Package development tools\n",
"- **pytest**: Testing framework\n",
"\n",
"### Real-World Applications\n",
"- **Docker**: Container images include dependency validation\n",
"- **CI/CD**: Automated testing validates environment setup\n",
"- **MLflow**: Tracks package versions with experiment metadata\n",
"- **Kaggle**: Validates package availability in competition environments\n",
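"\n",
"As a quick preview, the per-package check can be sketched like this. This is a minimal stand-alone illustration using the standard library's `importlib.metadata` (Python 3.8+); the graded implementation below follows the module's own hints instead, and `check_package` is just an illustrative name:\n",
"\n",
"```python\n",
"from importlib.metadata import version, PackageNotFoundError\n",
"\n",
"def check_package(name):\n",
"    # Return the installed version string, or None if the package is missing\n",
"    try:\n",
"        return version(name)\n",
"    except PackageNotFoundError:\n",
"        return None\n",
"```\n",
"\n",
"Counting how many such checks return a version (versus `None`) gives a health score like the one below.\n",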
"\n",
"Let's implement environment validation!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e36a801",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "environment-validation",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"import importlib\n",
"import pkg_resources  # note: deprecated upstream; importlib.metadata is the modern alternative\n",
"from typing import Any, Dict, List, Optional\n",
"\n",
"def validate_environment() -> Dict[str, Any]:\n",
" \"\"\"\n",
" Validate ML development environment and check essential dependencies.\n",
" \n",
" This function checks that your system has the necessary packages for ML development.\n",
" It's like a pre-flight check before you start building ML systems.\n",
" \n",
" TODO: Implement environment validation.\n",
" \n",
" STEP-BY-STEP IMPLEMENTATION:\n",
" 1. Define list of essential ML packages to check\n",
" 2. For each package, try to import it and get version\n",
" 3. Track which packages are available vs missing\n",
" 4. Calculate environment health score\n",
" 5. Return comprehensive environment report\n",
" \n",
" ESSENTIAL PACKAGES TO CHECK:\n",
" - numpy: Numerical computing foundation\n",
" - matplotlib: Visualization and plotting\n",
" - psutil: System monitoring\n",
" - jupyter: Interactive development\n",
" - nbdev: Package development\n",
" - pytest: Testing framework\n",
" \n",
" IMPLEMENTATION HINTS:\n",
" - Use try/except to handle missing packages gracefully\n",
" - Use pkg_resources.get_distribution(package).version for versions\n",
" - Calculate health_score as (available_packages / total_packages) * 100\n",
" - Round health_score to 1 decimal place\n",
" \"\"\"\n",
" ### BEGIN SOLUTION\n",
" essential_packages = [\n",
" 'numpy', 'matplotlib', 'psutil', 'jupyter', 'nbdev', 'pytest'\n",
" ]\n",
" \n",
" available = {}\n",
" missing = []\n",
" \n",
" for package in essential_packages:\n",
" try:\n",
" # Try to import the package\n",
" importlib.import_module(package)\n",
" # Get version information\n",
" version = pkg_resources.get_distribution(package).version\n",
" available[package] = version\n",
" except (ImportError, pkg_resources.DistributionNotFound):\n",
" missing.append(package)\n",
" \n",
" # Calculate health score\n",
" total_packages = len(essential_packages)\n",
" available_packages = len(available)\n",
" health_score = round((available_packages / total_packages) * 100, 1)\n",
" \n",
" return {\n",
" 'available_packages': available,\n",
" 'missing_packages': missing,\n",
" 'health_score': health_score,\n",
" 'total_checked': total_packages,\n",
" 'status': 'healthy' if health_score >= 80 else 'needs_attention'\n",
" }\n",
" ### END SOLUTION"
]
},
{
"cell_type": "markdown",
"id": "4547fb8d",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 5: Performance Benchmarking\n",
"\n",
"### The Concept: Hardware Performance Profiling\n",
"**Performance benchmarking** measures your system's computational capabilities for ML workloads. This helps you understand your hardware limits and optimize your development workflow.\n",
"\n",
"### Why Performance Benchmarking Matters\n",
"\n",
"#### 1. **Resource Planning**\n",
"- **Training time estimation**: How long will model training take?\n",
"- **Memory allocation**: What's the maximum batch size you can handle?\n",
"- **Parallelization**: How many cores can you effectively use?\n",
"\n",
"#### 2. **Optimization Guidance**\n",
"- **Bottleneck identification**: Is your system CPU-bound or memory-bound?\n",
"- **Hardware upgrades**: What would improve performance most?\n",
"- **Algorithm selection**: Which algorithms suit your hardware?\n",
"\n",
"#### 3. **Performance Comparison**\n",
"- **Baseline establishment**: Track performance over time\n",
"- **System comparison**: Compare different development environments\n",
"- **Deployment planning**: Match development to production performance\n",
"\n",
"### Benchmarking Strategy\n",
"We'll test key ML operations:\n",
"- **CPU computation**: Matrix operations that stress the processor\n",
"- **Memory bandwidth**: Large data transfers that test memory speed\n",
"- **Overall system**: Combined CPU and memory performance\n",
"\n",
"### Real-World Applications\n",
"- **MLPerf**: Industry-standard ML benchmarks\n",
"- **Cloud providers**: Performance metrics for instance selection\n",
"- **Hardware vendors**: Benchmark comparisons for purchasing decisions\n",
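"\n",
"The timing pattern behind both tests can be sketched as follows. This is a simplified stand-alone sketch (the helper name `time_operation` is illustrative); it uses `time.perf_counter()`, the standard high-resolution timer for benchmarking:\n",
"\n",
"```python\n",
"import time\n",
"\n",
"def time_operation(fn, *args):\n",
"    # Measure wall-clock time of a single call with a high-resolution timer\n",
"    start = time.perf_counter()\n",
"    fn(*args)\n",
"    return time.perf_counter() - start\n",
"\n",
"elapsed = time_operation(sorted, list(range(100000)))\n",
"```\n",
"\n",
"Lower elapsed time means better performance, so scores are computed as an inverse of time.\n",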
"\n",
"Let's implement performance benchmarking!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c80ba038",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "performance-benchmark",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"import time\n",
"\n",
"def benchmark_performance() -> Dict[str, Any]:\n",
" \"\"\"\n",
" Benchmark system performance for ML workloads.\n",
" \n",
" This function measures computational performance to help you understand\n",
" your system's capabilities and optimize your ML development workflow.\n",
" \n",
" TODO: Implement performance benchmarking.\n",
" \n",
" STEP-BY-STEP IMPLEMENTATION:\n",
" 1. CPU Test: Time a computationally intensive operation\n",
" 2. Memory Test: Time a memory-intensive operation\n",
" 3. Calculate performance scores based on execution time\n",
" 4. Determine overall system performance rating\n",
" 5. Return comprehensive benchmark results\n",
" \n",
" BENCHMARK TESTS:\n",
" - CPU: Nested loop calculation (computational intensity)\n",
" - Memory: Large list operations (memory bandwidth)\n",
" - Combined: Overall system performance score\n",
" \n",
" IMPLEMENTATION HINTS:\n",
" - Use time.time() to measure execution time\n",
" - CPU test: nested loops with mathematical operations\n",
" - Memory test: large list creation and manipulation\n",
" - Lower execution time = better performance\n",
" - Calculate scores as inverse of time (e.g., 1/time * 1000)\n",
" \"\"\"\n",
" ### BEGIN SOLUTION\n",
" benchmarks = {}\n",
" \n",
" # CPU Performance Test\n",
" print(\"⚡ Running CPU benchmark...\")\n",
" start_time = time.time()\n",
" \n",
" # CPU-intensive calculation\n",
" result = 0\n",
" for i in range(100000):\n",
" result += i * i + i / 2\n",
" \n",
"    cpu_time = max(time.time() - start_time, 1e-6)  # guard: coarse timers can report 0.0\n",
" benchmarks['cpu_time'] = round(cpu_time, 3)\n",
" benchmarks['cpu_score'] = round(1000 / cpu_time, 1)\n",
" \n",
" # Memory Performance Test\n",
" print(\"🧠 Running memory benchmark...\")\n",
" start_time = time.time()\n",
" \n",
" # Memory-intensive operations\n",
" large_list = list(range(1000000))\n",
" large_list.reverse()\n",
" large_list.sort()\n",
" \n",
"    memory_time = max(time.time() - start_time, 1e-6)  # guard: coarse timers can report 0.0\n",
" benchmarks['memory_time'] = round(memory_time, 3)\n",
" benchmarks['memory_score'] = round(1000 / memory_time, 1)\n",
" \n",
" # Overall Performance Score\n",
" overall_score = round((benchmarks['cpu_score'] + benchmarks['memory_score']) / 2, 1)\n",
" benchmarks['overall_score'] = overall_score\n",
" \n",
" # Performance Rating\n",
" if overall_score >= 80:\n",
" rating = 'excellent'\n",
" elif overall_score >= 60:\n",
" rating = 'good'\n",
" elif overall_score >= 40:\n",
" rating = 'fair'\n",
" else:\n",
" rating = 'needs_optimization'\n",
" \n",
" benchmarks['performance_rating'] = rating\n",
" \n",
" return benchmarks\n",
" ### END SOLUTION"
]
},
{
"cell_type": "markdown",
"id": "666b386a",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 6: Development Environment Setup\n",
"\n",
"### The Concept: Professional Development Configuration\n",
"**Development environment setup** configures essential tools and settings for professional ML development. This includes Git configuration, Jupyter settings, and other tools that make development more efficient.\n",
"\n",
"### Why Development Setup Matters\n",
"\n",
"#### 1. **Professional Standards**\n",
"- **Version control**: Proper Git configuration for collaboration\n",
"- **Code quality**: Consistent formatting and style\n",
"- **Documentation**: Automatic documentation generation\n",
"\n",
"#### 2. **Productivity Optimization**\n",
"- **Tool configuration**: Optimized settings for efficiency\n",
"- **Workflow automation**: Reduce repetitive tasks\n",
"- **Error prevention**: Catch issues before they become problems\n",
"\n",
"#### 3. **Collaboration Readiness**\n",
"- **Team compatibility**: Consistent development environment\n",
"- **Code sharing**: Proper attribution and commit messages\n",
"- **Project standards**: Follow established conventions\n",
"\n",
"### Essential Development Tools\n",
"We'll configure key tools for ML development:\n",
"- **Git**: Version control and collaboration\n",
"- **Jupyter**: Interactive development environment\n",
"- **Python**: Code formatting and quality tools\n",
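"\n",
"The core pattern for probing a command-line tool can be sketched like this (a minimal stand-alone sketch; `tool_available` is an illustrative name):\n",
"\n",
"```python\n",
"import subprocess\n",
"\n",
"def tool_available(cmd):\n",
"    # True if `cmd --version` runs successfully, False if it fails or is absent\n",
"    try:\n",
"        subprocess.run([cmd, '--version'], capture_output=True, check=True)\n",
"        return True\n",
"    except (subprocess.CalledProcessError, FileNotFoundError):\n",
"        return False\n",
"```\n",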
"\n",
"Let's implement development environment setup!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a34ebb28",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "development-setup",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"import subprocess\n",
"\n",
"def setup_development_environment() -> Dict[str, Any]:\n",
" \"\"\"\n",
" Configure development environment for professional ML development.\n",
" \n",
" This function sets up essential tools and configurations to make your\n",
" development workflow more efficient and professional.\n",
" \n",
" TODO: Implement development environment setup.\n",
" \n",
" STEP-BY-STEP IMPLEMENTATION:\n",
" 1. Check if Git is installed and configured\n",
" 2. Verify Jupyter installation and configuration\n",
" 3. Check Python development tools\n",
" 4. Configure any missing tools\n",
" 5. Return setup status and recommendations\n",
" \n",
" DEVELOPMENT TOOLS TO CHECK:\n",
" - Git: Version control system\n",
" - Jupyter: Interactive development\n",
" - Python tools: Code quality and formatting\n",
" \n",
" IMPLEMENTATION HINTS:\n",
" - Use subprocess.run() to check tool availability\n",
" - Use try/except to handle missing tools gracefully\n",
" - Provide helpful recommendations for missing tools\n",
" - Focus on tools that improve ML development workflow\n",
" \"\"\"\n",
" ### BEGIN SOLUTION\n",
" setup_status = {}\n",
" recommendations = []\n",
" \n",
" # Check Git installation and configuration\n",
" try:\n",
" git_version = subprocess.run(['git', '--version'], \n",
" capture_output=True, text=True, check=True)\n",
" setup_status['git_installed'] = True\n",
" setup_status['git_version'] = git_version.stdout.strip()\n",
" \n",
" # Check Git configuration\n",
" try:\n",
" git_name = subprocess.run(['git', 'config', 'user.name'], \n",
" capture_output=True, text=True, check=True)\n",
" git_email = subprocess.run(['git', 'config', 'user.email'], \n",
" capture_output=True, text=True, check=True)\n",
" setup_status['git_configured'] = True\n",
" setup_status['git_name'] = git_name.stdout.strip()\n",
" setup_status['git_email'] = git_email.stdout.strip()\n",
" except subprocess.CalledProcessError:\n",
" setup_status['git_configured'] = False\n",
" recommendations.append(\"Configure Git: git config --global user.name 'Your Name'\")\n",
" recommendations.append(\"Configure Git: git config --global user.email 'your.email@domain.com'\")\n",
" \n",
" except (subprocess.CalledProcessError, FileNotFoundError):\n",
" setup_status['git_installed'] = False\n",
" recommendations.append(\"Install Git: https://git-scm.com/downloads\")\n",
" \n",
" # Check Jupyter installation\n",
" try:\n",
" jupyter_version = subprocess.run(['jupyter', '--version'], \n",
" capture_output=True, text=True, check=True)\n",
" setup_status['jupyter_installed'] = True\n",
" setup_status['jupyter_version'] = jupyter_version.stdout.strip()\n",
" except (subprocess.CalledProcessError, FileNotFoundError):\n",
" setup_status['jupyter_installed'] = False\n",
" recommendations.append(\"Install Jupyter: pip install jupyter\")\n",
" \n",
" # Check Python tools\n",
" python_tools = ['pip', 'python']\n",
" for tool in python_tools:\n",
" try:\n",
" tool_version = subprocess.run([tool, '--version'], \n",
" capture_output=True, text=True, check=True)\n",
" setup_status[f'{tool}_installed'] = True\n",
" setup_status[f'{tool}_version'] = tool_version.stdout.strip()\n",
" except (subprocess.CalledProcessError, FileNotFoundError):\n",
" setup_status[f'{tool}_installed'] = False\n",
" recommendations.append(f\"Install {tool}: Check Python installation\")\n",
" \n",
" # Calculate setup health\n",
" total_tools = 4 # git, jupyter, pip, python\n",
" installed_tools = sum([\n",
" setup_status.get('git_installed', False),\n",
" setup_status.get('jupyter_installed', False),\n",
" setup_status.get('pip_installed', False),\n",
" setup_status.get('python_installed', False)\n",
" ])\n",
" \n",
" setup_score = round((installed_tools / total_tools) * 100, 1)\n",
" \n",
" return {\n",
" 'setup_status': setup_status,\n",
" 'recommendations': recommendations,\n",
" 'setup_score': setup_score,\n",
" 'status': 'ready' if setup_score >= 75 else 'needs_configuration'\n",
" }\n",
" ### END SOLUTION"
]
},
{
"cell_type": "markdown",
"id": "c27d83df",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 7: Comprehensive System Report\n",
"\n",
"### The Concept: Integrated System Analysis\n",
"**Comprehensive system reporting** combines all your configuration and diagnostic information into a single, actionable report. This is like a \"health check\" for your ML development environment.\n",
"\n",
"### Why Comprehensive Reporting Matters\n",
"\n",
"#### 1. **Holistic View**\n",
"- **Complete picture**: All system information in one place\n",
"- **Dependency analysis**: How different components interact\n",
"- **Performance context**: Understanding system capabilities\n",
"\n",
"#### 2. **Troubleshooting Support**\n",
"- **Debugging aid**: Complete environment information for issue resolution\n",
"- **Performance analysis**: Identify bottlenecks and optimization opportunities\n",
"- **Compatibility checking**: Ensure all components work together\n",
"\n",
"#### 3. **Professional Documentation**\n",
"- **Environment documentation**: Complete system specification\n",
"- **Reproducibility**: All information needed to recreate environment\n",
"- **Sharing**: Easy to share system information with collaborators\n",
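"\n",
"The aggregation at the heart of the report can be sketched as follows (a simplified stand-alone version of the scoring logic; `overall_health` is an illustrative name). Component scores are capped at 100 so an unusually fast benchmark cannot dominate the average:\n",
"\n",
"```python\n",
"def overall_health(scores):\n",
"    # Average component scores into a single 0-100 health number\n",
"    capped = [min(s, 100) for s in scores]\n",
"    return round(sum(capped) / len(capped), 1)\n",
"\n",
"overall_health([100.0, 83.3, 75.0])  # one score per subsystem\n",
"```\n",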
"\n",
"Let's create a comprehensive system report!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89b9aac3",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "system-report",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"from datetime import datetime\n",
"\n",
"def generate_system_report() -> Dict[str, Any]:\n",
" \"\"\"\n",
" Generate comprehensive system report for ML development.\n",
" \n",
" This function combines all configuration and diagnostic information\n",
" into a single, actionable report for your ML development environment.\n",
" \n",
" TODO: Implement comprehensive system reporting.\n",
" \n",
" STEP-BY-STEP IMPLEMENTATION:\n",
" 1. Gather personal information\n",
" 2. Collect system information\n",
" 3. Validate environment\n",
" 4. Run performance benchmarks\n",
" 5. Check development setup\n",
" 6. Generate overall health score\n",
" 7. Create comprehensive report with recommendations\n",
" \n",
" REPORT SECTIONS:\n",
" - Personal configuration\n",
" - System specifications\n",
" - Environment validation\n",
" - Performance benchmarks\n",
" - Development setup\n",
" - Overall health assessment\n",
" - Recommendations for improvement\n",
" \n",
" IMPLEMENTATION HINTS:\n",
" - Call all previously implemented functions\n",
" - Combine results into comprehensive report\n",
" - Calculate overall health score from all components\n",
" - Provide actionable recommendations\n",
" \"\"\"\n",
" ### BEGIN SOLUTION\n",
" print(\"📊 Generating comprehensive system report...\")\n",
" \n",
" # Gather all information\n",
" personal = personal_info()\n",
" system = system_info()\n",
" environment = validate_environment()\n",
" performance = benchmark_performance()\n",
" development = setup_development_environment()\n",
" \n",
" # Calculate overall health score (normalize performance score to 0-100 range)\n",
" normalized_performance = min(performance['overall_score'], 100) # Cap at 100\n",
" \n",
" health_components = [\n",
" environment['health_score'],\n",
" normalized_performance,\n",
" development['setup_score']\n",
" ]\n",
" \n",
" overall_health = round(sum(health_components) / len(health_components), 1)\n",
" \n",
" # Generate status\n",
" if overall_health >= 85:\n",
" status = 'excellent'\n",
" elif overall_health >= 70:\n",
" status = 'good'\n",
" elif overall_health >= 50:\n",
" status = 'fair'\n",
" else:\n",
" status = 'needs_attention'\n",
" \n",
" # Compile recommendations\n",
" recommendations = []\n",
" \n",
" if environment['health_score'] < 80:\n",
" recommendations.extend([f\"Install missing package: {pkg}\" for pkg in environment['missing_packages']])\n",
" \n",
" if performance['overall_score'] < 50:\n",
" recommendations.append(\"Consider hardware upgrade for better ML performance\")\n",
" \n",
" recommendations.extend(development['recommendations'])\n",
" \n",
" # Create comprehensive report\n",
" report = {\n",
" 'timestamp': datetime.now().isoformat(),\n",
" 'personal_info': personal,\n",
" 'system_info': system,\n",
" 'environment_validation': environment,\n",
" 'performance_benchmarks': performance,\n",
" 'development_setup': development,\n",
" 'overall_health': overall_health,\n",
" 'status': status,\n",
" 'recommendations': recommendations,\n",
" 'report_version': '1.0.0'\n",
" }\n",
" \n",
" return report\n",
" ### END SOLUTION"
]
},
{
"cell_type": "markdown",
"id": "9063a17e",
"metadata": {},
"source": [
"## 🧪 Unit Test: Enhanced Setup Functions\n",
"\n",
"Test all the new enhanced setup functions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b48e976",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"def test_performance_benchmark():\n",
" \"\"\"Test performance benchmarking function.\"\"\"\n",
" print(\"🔬 Unit Test: Performance Benchmarking...\")\n",
" \n",
" benchmark_report = benchmark_performance()\n",
" \n",
" # Test return type and structure\n",
" assert isinstance(benchmark_report, dict), \"benchmark_performance should return a dictionary\"\n",
" \n",
" # Test required keys\n",
" required_keys = ['cpu_time', 'cpu_score', 'memory_time', 'memory_score', 'overall_score', 'performance_rating']\n",
" for key in required_keys:\n",
" assert key in benchmark_report, f\"Report should have '{key}' key\"\n",
" \n",
" # Test data types\n",
" assert isinstance(benchmark_report['cpu_time'], (int, float)), \"cpu_time should be number\"\n",
" assert isinstance(benchmark_report['cpu_score'], (int, float)), \"cpu_score should be number\"\n",
" assert isinstance(benchmark_report['memory_time'], (int, float)), \"memory_time should be number\"\n",
" assert isinstance(benchmark_report['memory_score'], (int, float)), \"memory_score should be number\"\n",
" assert isinstance(benchmark_report['overall_score'], (int, float)), \"overall_score should be number\"\n",
" assert isinstance(benchmark_report['performance_rating'], str), \"performance_rating should be string\"\n",
" \n",
" # Test reasonable values\n",
" assert benchmark_report['cpu_time'] > 0, \"cpu_time should be positive\"\n",
" assert benchmark_report['memory_time'] > 0, \"memory_time should be positive\"\n",
" assert benchmark_report['cpu_score'] > 0, \"cpu_score should be positive\"\n",
" assert benchmark_report['memory_score'] > 0, \"memory_score should be positive\"\n",
" assert benchmark_report['overall_score'] > 0, \"overall_score should be positive\"\n",
" \n",
" valid_ratings = ['excellent', 'good', 'fair', 'needs_optimization']\n",
" assert benchmark_report['performance_rating'] in valid_ratings, \"performance_rating should be valid\"\n",
" \n",
" print(\"✅ Performance benchmark tests passed!\")\n",
" print(f\"✅ Performance rating: {benchmark_report['performance_rating']}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b09b6ad",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"def test_development_setup():\n",
" \"\"\"Test development environment setup function.\"\"\"\n",
" print(\"🔬 Unit Test: Development Environment Setup...\")\n",
" \n",
" setup_report = setup_development_environment()\n",
" \n",
" # Test return type and structure\n",
" assert isinstance(setup_report, dict), \"setup_development_environment should return a dictionary\"\n",
" \n",
" # Test required keys\n",
" required_keys = ['setup_status', 'recommendations', 'setup_score', 'status']\n",
" for key in required_keys:\n",
" assert key in setup_report, f\"Report should have '{key}' key\"\n",
" \n",
" # Test data types\n",
" assert isinstance(setup_report['setup_status'], dict), \"setup_status should be dict\"\n",
" assert isinstance(setup_report['recommendations'], list), \"recommendations should be list\"\n",
" assert isinstance(setup_report['setup_score'], (int, float)), \"setup_score should be number\"\n",
" assert isinstance(setup_report['status'], str), \"status should be string\"\n",
" \n",
" # Test reasonable values\n",
" assert 0 <= setup_report['setup_score'] <= 100, \"setup_score should be between 0 and 100\"\n",
" assert setup_report['status'] in ['ready', 'needs_configuration'], \"status should be valid\"\n",
" \n",
" print(\"✅ Development setup tests passed!\")\n",
" print(f\"✅ Setup score: {setup_report['setup_score']}%\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "68475c70",
"metadata": {},
"outputs": [],
"source": [
"def test_system_report():\n",
" \"\"\"Test comprehensive system report function.\"\"\"\n",
" print(\"🔬 Unit Test: System Report Generation...\")\n",
" \n",
" report = generate_system_report()\n",
" \n",
" # Test return type and structure\n",
" assert isinstance(report, dict), \"generate_system_report should return a dictionary\"\n",
" \n",
" # Test required keys\n",
" required_keys = ['timestamp', 'personal_info', 'system_info', 'environment_validation', \n",
" 'performance_benchmarks', 'development_setup', 'overall_health', \n",
" 'status', 'recommendations', 'report_version']\n",
" for key in required_keys:\n",
" assert key in report, f\"Report should have '{key}' key\"\n",
" \n",
" # Test data types\n",
" assert isinstance(report['timestamp'], str), \"timestamp should be string\"\n",
" assert isinstance(report['personal_info'], dict), \"personal_info should be dict\"\n",
" assert isinstance(report['system_info'], dict), \"system_info should be dict\"\n",
" assert isinstance(report['environment_validation'], dict), \"environment_validation should be dict\"\n",
" assert isinstance(report['performance_benchmarks'], dict), \"performance_benchmarks should be dict\"\n",
" assert isinstance(report['development_setup'], dict), \"development_setup should be dict\"\n",
" assert isinstance(report['overall_health'], (int, float)), \"overall_health should be number\"\n",
" assert isinstance(report['status'], str), \"status should be string\"\n",
" assert isinstance(report['recommendations'], list), \"recommendations should be list\"\n",
" assert isinstance(report['report_version'], str), \"report_version should be string\"\n",
" \n",
" # Test reasonable values\n",
" assert 0 <= report['overall_health'] <= 100, \"overall_health should be between 0 and 100\"\n",
" valid_statuses = ['excellent', 'good', 'fair', 'needs_attention']\n",
" assert report['status'] in valid_statuses, \"status should be valid\"\n",
" \n",
" print(\"✅ System report tests passed!\")\n",
" print(f\"✅ Overall system health: {report['overall_health']}%\")\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba1bcd18",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"def test_personal_info():\n",
" \"\"\"Test personal information function comprehensively.\"\"\"\n",
" personal = personal_info()\n",
" assert isinstance(personal, dict), \"personal_info should return a dictionary\"\n",
" assert 'developer' in personal, \"Dictionary should have 'developer' key\"\n",
" assert '@' in personal['email'], \"Email should contain @ symbol\"\n",
" print(\"✅ Personal information function works\")\n",
"\n",
"def test_system_info():\n",
" \"\"\"Test system information function comprehensively.\"\"\"\n",
" system = system_info()\n",
" assert isinstance(system, dict), \"system_info should return a dictionary\"\n",
" assert 'python_version' in system, \"Dictionary should have 'python_version' key\"\n",
" assert system['memory_gb'] > 0, \"Memory should be positive\"\n",
" print(\"✅ System information function works\")\n",
"\n",
"def test_environment_validation():\n",
" \"\"\"Test environment validation function comprehensively.\"\"\"\n",
" env = validate_environment()\n",
" assert isinstance(env, dict), \"validate_environment should return a dictionary\"\n",
" assert 'health_score' in env, \"Dictionary should have 'health_score' key\"\n",
"    print(\"✅ Environment validation function works\")"
]
},
{
"cell_type": "markdown",
"id": "2415d2ab",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🧪 Module Testing\n",
"\n",
"Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
"\n",
"**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "526c9009",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "standardized-testing",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# =============================================================================\n",
"# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
"# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
"# =============================================================================\n",
"\n",
"if __name__ == \"__main__\":\n",
" from tito.tools.testing import run_module_tests_auto\n",
" \n",
" # Automatically discover and run all tests in this module\n",
" success = run_module_tests_auto(\"Setup\")"
]
},
{
"cell_type": "markdown",
"id": "35feea10",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🎯 Module Summary: Development Environment Setup Complete!\n",
"\n",
"Congratulations! You've successfully set up your TinyTorch development environment:\n",
"\n",
"### What You've Accomplished\n",
"✅ **Personal Configuration**: Developer information and preferences\n",
"✅ **System Analysis**: Hardware and software environment validation\n",
"✅ **Environment Validation**: Python packages and dependencies\n",
"✅ **Performance Benchmarking**: CPU and memory performance testing\n",
"✅ **Development Setup**: IDE configuration and tooling\n",
"✅ **Comprehensive Reporting**: System health and recommendations\n",
"\n",
"### Key Concepts You've Learned\n",
"- **Environment Management**: How to validate and configure development environments\n",
"- **Performance Analysis**: Benchmarking system capabilities for ML workloads\n",
"- **System Diagnostics**: Comprehensive health checking and reporting\n",
"- **Development Best Practices**: Professional setup for ML development\n",
"\n",
"### Next Steps\n",
"1. **Export your code**: `tito package nbdev --export 00_setup`\n",
"2. **Test your implementation**: `tito test 00_setup`\n",
"3. **Use your environment**: Start building with confidence in a validated setup\n",
"4. **Move to Module 1**: Begin implementing the core tensor system!\n",
"\n",
"**Ready for the ML journey?** Your development environment is now optimized for building neural networks from scratch!"
]
}
],
"metadata": {
"jupytext": {
"main_language": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}