diff --git a/NBGRADER_STYLE_GUIDE.md b/NBGRADER_STYLE_GUIDE.md
new file mode 100644
index 00000000..cf43ba18
--- /dev/null
+++ b/NBGRADER_STYLE_GUIDE.md
@@ -0,0 +1,256 @@
+# TinyTorch NBGrader Style Guide
+
+## Purpose
+This guide establishes the standard format for all NBGrader solution blocks across TinyTorch modules to ensure consistency and maximize educational value.
+
+## Standard Solution Block Format
+
+```python
+def function_name(self, parameters):
+    """
+    Brief function description (1-2 sentences).
+    
+    Args:
+        param1: Parameter description
+        param2: Parameter description
+    
+    Returns:
+        Return type and description
+    
+    TODO: Implement [specific task] with [key requirements].
+    
+    STEP-BY-STEP IMPLEMENTATION:
+    1. [Action verb] [specific task] - [brief explanation]
+    2. [Action verb] [specific task] - [brief explanation]  
+    3. [Action verb] [specific task] - [brief explanation]
+    4. [Action verb] [specific task] - [brief explanation]
+    
+    EXAMPLE USAGE:
+    ```python
+    # Realistic example with clear input/output
+    input_data = ClassName(example_data)
+    result = input_data.function_name(parameters)
+    print(result)  # Expected: [specific output]
+    ```
+    
+    IMPLEMENTATION HINTS:
+    - Use [specific function/method] for [specific purpose]
+    - Handle [edge case] by [specific approach]
+    - Remember to [critical requirement]
+    - Common error: [specific mistake to avoid]
+    
+    LEARNING CONNECTIONS:
+    - This is equivalent to [PyTorch/TensorFlow function]
+    - Used in [real-world application/system]
+    - Foundation for [advanced concept]
+    - Enables [specific capability]
+    """
+    ### BEGIN SOLUTION
+    # Implementation code (typically 3-10 lines)
+    # Focus on clarity and correctness
+    # Follow the steps outlined above
+    ### END SOLUTION
+```
+
+## Required Sections
+
+### 1. TODO
+- **Purpose**: Clear task description
+- **Format**: `TODO: Implement [specific task] with [key requirements].`
+- **Example**: `TODO: Implement forward pass for ReLU activation with proper handling of negative values.`
+
+### 2. STEP-BY-STEP IMPLEMENTATION
+- **Purpose**: Guide implementation approach
+- **Format**: Numbered list with action verbs
+- **Guidelines**:
+  - Start each step with an action verb (Create, Calculate, Apply, Return)
+  - Include brief explanation after dash
+  - Keep to 5-7 steps for early modules, 4-5 for middle modules, and 3-5 for later modules
+- **Example**:
+  ```
+  1. Check input dimensions - ensure tensor is valid
+  2. Apply element-wise maximum - compare with zero
+  3. Return activated tensor - maintain original shape
+  ```
+
+### 3. EXAMPLE USAGE
+- **Purpose**: Demonstrate correct usage
+- **Format**: Python code block with comments
+- **Must Include**:
+  - Realistic input data
+  - Function call with proper parameters
+  - Expected output with comment
+- **Example**:
+  ```python
+  # Create sample input
+  x = Tensor([[-1, 0, 2], [3, -4, 5]])
+  relu = ReLU()
+  output = relu(x)
+  print(output)  # Expected: [[0, 0, 2], [3, 0, 5]]
+  ```
+
+### 4. IMPLEMENTATION HINTS
+- **Purpose**: Technical guidance and common pitfalls
+- **Format**: Bulleted list
+- **Should Include**:
+  - Specific functions/methods to use
+  - Edge cases to handle
+  - Common errors to avoid
+  - Performance considerations (for later modules)
+- **Example**:
+  ```
+  - Use np.maximum() for element-wise comparison
+  - Handle None inputs gracefully
+  - Remember to preserve input shape
+  - Common error: forgetting to handle batch dimensions
+  ```
+
+### 5. LEARNING CONNECTIONS
+- **Purpose**: Connect to real-world ML systems
+- **Format**: Bulleted list
+- **Should Include**:
+  - Framework equivalents (PyTorch/TensorFlow)
+  - Real-world applications
+  - Connection to other modules
+  - Why this implementation matters
+- **Example**:
+  ```
+  - This is equivalent to torch.nn.ReLU() in PyTorch
+  - Used in most modern neural network architectures
+  - Foundation for understanding gradient flow
+  - Helps mitigate vanishing gradients when training deep networks
+  ```
+
+## Optional Enhancement Sections
+
+### VISUAL STEP-BY-STEP (Early modules)
+- **When to Use**: Complex mathematical operations or data flow
+- **Format**: ASCII diagrams or visual explanations
+- **Example**:
+  ```
+  Input: [1, -2, 3, -4, 5]
+           ↓ ReLU
+  Output: [1, 0, 3, 0, 5]
+  ```
+
+### DEBUGGING HINTS (When helpful)
+- **When to Use**: Functions with common implementation errors
+- **Format**: Specific debugging strategies
+- **Example**:
+  ```
+  - Print shapes at each step to verify dimensions
+  - Check for NaN values after operations
+  - Verify gradient flow in backward pass
+  ```
+
+### MATHEMATICAL FOUNDATION (Math-heavy modules)
+- **When to Use**: Complex mathematical operations
+- **Format**: LaTeX-style equations with explanations
+- **Example**:
+  ```
+  Softmax formula: softmax(x_i) = exp(x_i) / Σ_j exp(x_j)
+  ```
+
+## Module-Specific Guidelines
+
+### Early Modules (01-07): Foundation & Architecture
+- More detailed STEP-BY-STEP (5-7 steps)
+- Include VISUAL STEP-BY-STEP where helpful
+- Focus on educational clarity
+- Simpler EXAMPLE USAGE
+
+### Middle Modules (08-11): Training
+- Balance detail with conciseness (4-5 steps)
+- Include gradient flow considerations
+- Real dataset examples
+- Performance hints become important
+
+### Later Modules (12-16): Production
+- Concise STEP-BY-STEP (3-5 steps)
+- Production-focused IMPLEMENTATION HINTS
+- Complex, real-world EXAMPLE USAGE
+- Strong emphasis on LEARNING CONNECTIONS to industry
+
+## Quality Checklist
+
+Before finalizing any solution block, verify:
+
+- [ ] TODO clearly states the task
+- [ ] STEP-BY-STEP has numbered action steps
+- [ ] EXAMPLE USAGE has realistic code with expected output
+- [ ] IMPLEMENTATION HINTS cover key technical points
+- [ ] LEARNING CONNECTIONS link to real ML systems
+- [ ] Solution code follows the outlined steps
+- [ ] All code is tested and working
+- [ ] Docstring has proper Args/Returns sections
+
+## Common Mistakes to Avoid
+
+1. **Inconsistent section names**: Always use the exact section headers
+2. **Missing expected output**: Every example needs an `# Expected:` comment
+3. **Vague TODOs**: Be specific about requirements
+4. **Untested examples**: All example code must actually work
+5. **Missing Learning Connections**: Always connect to real-world ML
+
+## Example: Well-Formatted Solution Block
+
+```python
+def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
+    """
+    Apply softmax activation function along specified axis.
+    
+    Args:
+        x: Input array of any shape
+        axis: Axis along which to apply softmax (default: -1)
+    
+    Returns:
+        Array with same shape as input with softmax applied
+    
+    TODO: Implement numerically stable softmax with overflow protection.
+    
+    STEP-BY-STEP IMPLEMENTATION:
+    1. Subtract maximum value - prevent overflow in exponential
+    2. Compute exponentials - apply exp() to shifted values
+    3. Sum exponentials - calculate normalization factor
+    4. Divide by sum - normalize to get probabilities
+    
+    EXAMPLE USAGE:
+    ```python
+    logits = np.array([[2.0, 1.0, 0.1], [1.0, 3.0, 0.2]])
+    probs = softmax(logits)
+    print(probs.sum(axis=1))  # Expected: [1.0, 1.0]
+    print(probs[0])  # Expected: [0.659, 0.242, 0.099] (approx)
+    ```
+    
+    IMPLEMENTATION HINTS:
+    - Use x.max(axis=axis, keepdims=True) for stable computation
+    - Apply np.exp() after shifting by maximum
+    - Use keepdims=True to maintain broadcasting shape
+    - Common error: forgetting to handle arbitrary axis parameter
+    
+    LEARNING CONNECTIONS:
+    - This is equivalent to torch.nn.functional.softmax() in PyTorch
+    - Critical for multi-class classification in final layers
+    - Used in attention mechanisms for weight normalization
+    - Foundation for cross-entropy loss computation
+    """
+    ### BEGIN SOLUTION
+    x_max = x.max(axis=axis, keepdims=True)
+    x_shifted = x - x_max
+    exp_x = np.exp(x_shifted)
+    sum_exp = exp_x.sum(axis=axis, keepdims=True)
+    return exp_x / sum_exp
+    ### END SOLUTION
+```
+
+## Enforcement
+
+1. All new modules MUST follow this style guide
+2. Existing modules should be updated when modified
+3. Use this guide for code reviews
+4. Include compliance in module testing
+
+---
+
+*Last Updated: [Current Date]*
+*Version: 1.0*
\ No newline at end of file
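Review note on the guide: the five required sections and the `# Expected:` rule from "Common Mistakes to Avoid" can be checked per docstring rather than by eye. The sketch below is a minimal illustration, not part of this change; `REQUIRED_HEADERS` and `missing_sections` are hypothetical names, and the check is a plain substring match on the exact headers the guide lists.

```python
# Hypothetical helper (not in this PR): check one docstring against the guide.
REQUIRED_HEADERS = (
    "TODO:",
    "STEP-BY-STEP IMPLEMENTATION:",
    "EXAMPLE USAGE:",
    "IMPLEMENTATION HINTS:",
    "LEARNING CONNECTIONS:",
)


def missing_sections(docstring: str) -> list[str]:
    """Return the required headers absent from one docstring, in guide order."""
    missing = [h for h in REQUIRED_HEADERS if h not in docstring]
    # The guide asks every example to carry an "# Expected:" comment.
    if "EXAMPLE USAGE:" in docstring and "# Expected:" not in docstring:
        missing.append("# Expected:")
    return missing


sample = '''
    TODO: Implement forward pass for ReLU activation.

    STEP-BY-STEP IMPLEMENTATION:
    1. Apply element-wise maximum - compare with zero

    EXAMPLE USAGE:
    output = relu(x)
'''

print(missing_sections(sample))
# ['IMPLEMENTATION HINTS:', 'LEARNING CONNECTIONS:', '# Expected:']
```

Matching on exact strings mirrors the guide's "always use exact section headers" rule; a looser match would hide the naming drift the guide is trying to prevent.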
diff --git a/check_compliance.py b/check_compliance.py
new file mode 100644
index 00000000..c530e426
--- /dev/null
+++ b/check_compliance.py
@@ -0,0 +1,88 @@
+#!/usr/bin/env python3
+"""Check NBGrader style guide compliance across all modules."""
+
+import os
+import re
+from pathlib import Path
+
+def analyze_module_compliance(filepath):
+    with open(filepath, 'r', encoding='utf-8') as f:
+        content = f.read()
+    
+    # Count solution blocks
+    solution_blocks = len(re.findall(r'### BEGIN SOLUTION', content))
+    
+    # Check for required sections (file-level: a header anywhere in the file counts)
+    has_todo = 'TODO:' in content
+    has_step_by_step = 'STEP-BY-STEP IMPLEMENTATION:' in content
+    has_example_usage = 'EXAMPLE USAGE:' in content or 'EXAMPLE:' in content
+    has_hints = 'IMPLEMENTATION HINTS:' in content or 'HINTS:' in content
+    has_connections = 'LEARNING CONNECTIONS:' in content or 'LEARNING CONNECTION:' in content
+    
+    # Check for alternative patterns (older style)
+    has_approach = 'APPROACH:' in content
+    has_your_code_here = 'YOUR CODE HERE' in content
+    has_raise_notimpl = 'raise NotImplementedError' in content
+    
+    compliance_score = sum([has_todo, has_step_by_step, has_example_usage, has_hints, has_connections])
+    
+    return {
+        'solution_blocks': solution_blocks,
+        'compliance_score': compliance_score,
+        'has_todo': has_todo,
+        'has_step_by_step': has_step_by_step,
+        'has_example_usage': has_example_usage,
+        'has_hints': has_hints,
+        'has_connections': has_connections,
+        'has_old_patterns': has_approach or has_your_code_here or has_raise_notimpl
+    }
+
+# Analyze all modules
+modules_dir = Path('modules/source')
+results = {}
+
+for module_dir in sorted(modules_dir.iterdir()):
+    if module_dir.is_dir() and module_dir.name != 'utils':
+        py_files = list(module_dir.glob('*_dev.py'))
+        if py_files:
+            module_file = py_files[0]
+            results[module_dir.name] = analyze_module_compliance(module_file)
+
+# Report results
+print('=== NBGrader Style Guide Compliance Report ===\n')
+print('Module            | Blocks | Score | TODO | STEP | EXAM | HINT | CONN | Old? |')
+print('-' * 78)
+
+for module_name in sorted(results.keys()):
+    r = results[module_name]
+    status_emoji = '✅' if r['compliance_score'] == 5 else ('⚠️' if r['compliance_score'] >= 3 else '❌')
+    
+    print(f"{module_name:16} | {r['solution_blocks']:6} | {status_emoji} {r['compliance_score']}/5 | "
+          f"{'✓' if r['has_todo'] else '✗':^4} | "
+          f"{'✓' if r['has_step_by_step'] else '✗':^4} | "
+          f"{'✓' if r['has_example_usage'] else '✗':^4} | "
+          f"{'✓' if r['has_hints'] else '✗':^4} | "
+          f"{'✓' if r['has_connections'] else '✗':^4} | "
+          f"{'⚠️' if r['has_old_patterns'] else '✓':^4} |")
+
+# Summary
+fully_compliant = sum(1 for r in results.values() if r['compliance_score'] == 5)
+needs_update = sum(1 for r in results.values() if r['compliance_score'] < 5)
+has_old_patterns = sum(1 for r in results.values() if r['has_old_patterns'])
+
+print('\n=== Summary ===')
+print(f'Fully Compliant: {fully_compliant}/{len(results)}')
+print(f'Needs Update: {needs_update}/{len(results)}')
+print(f'Has Old Patterns: {has_old_patterns}/{len(results)}')
+
+# List modules needing updates
+print('\n=== Modules Needing Updates ===')
+for module_name, r in sorted(results.items()):
+    if r['compliance_score'] < 5:
+        missing = []
+        if not r['has_todo']: missing.append('TODO')
+        if not r['has_step_by_step']: missing.append('STEP-BY-STEP')
+        if not r['has_example_usage']: missing.append('EXAMPLE USAGE')
+        if not r['has_hints']: missing.append('HINTS')
+        if not r['has_connections']: missing.append('CONNECTIONS')
+        print(f"{module_name}: Missing {', '.join(missing)}")
\ No newline at end of file
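Review note on `check_compliance.py`: the script scores each file as a whole, so a module where one solution block has all five sections and another has none still reports 5/5. A per-block variant might look like the sketch below. It is an assumption-laden illustration, not the script's behavior: `REQUIRED` and `score_blocks` are hypothetical names, and it assumes each `### BEGIN SOLUTION` marker is directly preceded by a triple-double-quoted docstring, as in the style guide's template.

```python
# Hypothetical per-block scoring (not in this PR).
REQUIRED = (
    "TODO:",
    "STEP-BY-STEP IMPLEMENTATION:",
    "EXAMPLE USAGE:",
    "IMPLEMENTATION HINTS:",
    "LEARNING CONNECTIONS:",
)


def score_blocks(source: str) -> list[int]:
    """Return one 0-5 score per solution block found in the source text."""
    # Each chunk is the text that precedes one BEGIN SOLUTION marker.
    chunks = source.split("### BEGIN SOLUTION")[:-1]
    scores = []
    for chunk in chunks:
        parts = chunk.rsplit('"""', 2)
        # Use the docstring only if nothing but whitespace follows it.
        doc = parts[1] if len(parts) == 3 and not parts[2].strip() else ""
        scores.append(sum(header in doc for header in REQUIRED))
    return scores


demo = '''
def relu(x):
    """
    TODO: Implement ReLU.

    STEP-BY-STEP IMPLEMENTATION:
    1. Apply maximum - compare with zero

    EXAMPLE USAGE:
    IMPLEMENTATION HINTS:
    LEARNING CONNECTIONS:
    """
    ### BEGIN SOLUTION
    return max(x, 0)
    ### END SOLUTION

def identity(x):
    """Return x unchanged."""
    ### BEGIN SOLUTION
    return x
    ### END SOLUTION
'''

scores = score_blocks(demo)
exit_code = 0 if all(s == 5 for s in scores) else 1
print(scores, exit_code)  # [5, 0] 1
```

Returning a nonzero exit code when any block falls short would let the check run as part of module testing, which the guide's Enforcement section asks for. A file-level substring check on `demo` would find all five headers and pass it.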