Remove ML Systems Thinking sections from all modules

Cleaned up module structure by removing reflection questions:
- Updated module-developer.md to remove ML Systems Thinking from template
- Removed ML Systems Thinking sections from all 9 modules:
  * Module 01 (Tensor): Removed 113 lines of questions
  * Module 02 (Activations): Removed 24 lines of questions
  * Module 03 (Layers): Removed 84 lines of questions
  * Module 04 (Losses): Removed 93 lines of questions
  * Module 05 (Autograd): Removed 64 lines of questions
  * Module 06 (Optimizers): Removed questions section
  * Module 07 (Training): Removed questions section
  * Module 08 (DataLoader): Removed 35 lines of questions
  * Module 09 (Spatial): Removed 34 lines of questions

Impact:
- Modules now flow directly from tests to summary
- Cleaner, more focused module structure
- Removes assessment burden from implementation modules
- Keeps focus on building and understanding code
Vijay Janapa Reddi
2025-09-30 06:44:36 -04:00
parent 682801f7bc
commit a691e14b37
34 changed files with 2 additions and 6246 deletions


@@ -1657,119 +1657,6 @@ if __name__ == "__main__":
print("✅ Module validation complete!")
# %% [markdown]
"""
## 🤔 ML Systems Thinking: Tensor Foundations
Now that you've built a complete tensor system, let's reflect on the systems implications of your implementation.
"""
# %% nbgrader={"grade": false, "grade_id": "systems-q1", "solution": true}
# %% [markdown]
"""
### Question 1: Memory Scaling Analysis
You implemented matrix multiplication that creates new tensors for results.
**a) Memory Behavior**: When you compute `A.matmul(B)` where A is (1000×1000) and B is (1000×1000):
- Before operation: 2,000,000 elements (A: 1M + B: 1M = 2M total)
- During operation: _____ elements total in memory (A + B + result = ?)
- After operation: _____ elements (if A and B still exist + result)
**Memory Calculation Help:**
```
Matrix Memory: 1000 × 1000 = 1,000,000 elements
Float32: 4 bytes per element
Total per matrix: 1M × 4 = 4 MB
```
**b) Broadcasting Impact**: Your `+` operator uses NumPy broadcasting. When adding a (1000×1000) matrix to a (1000,) vector:
- Does NumPy create a temporary (1000×1000) copy of the vector?
- Or does it compute element-wise without full expansion?
*Think about: temporary arrays, memory copies, and when broadcasting is efficient vs. expensive*
"""
# %% nbgrader={"grade": false, "grade_id": "systems-q2", "solution": true}
# %% [markdown]
"""
### Question 2: Shape Validation Trade-offs
Your `matmul` method includes shape validation that raises clear error messages.
**a) Performance Impact**: In a training loop that runs matmul operations millions of times, what's the trade-off of this validation?
- **Pro**: Clear errors help debugging
- **Con**: Extra computation on every call
**b) Optimization Strategy**: How could you optimize this?
```python
# Current approach:
if self.shape[-1] != other.shape[-2]:
    raise ValueError(...)  # Check every time

# Alternative approaches:
# 1. Skip validation in "fast mode"
# 2. Validate only during debugging
# 3. Let NumPy raise its own error
```
Which approach would you choose and why?
*Hint: Consider debug mode vs. production mode, and the cost of shape checking vs. cryptic errors*
"""
# %% nbgrader={"grade": false, "grade_id": "systems-q3", "solution": true}
# %% [markdown]
"""
### Question 3: Dormant Features Design
You included `requires_grad` and `grad` attributes from the start, even though they're unused until Module 05.
**a) Memory Overhead**: Every tensor now carries these extra attributes:
```python
# Each tensor stores:
self.data = np.array(...) # The actual data
self.requires_grad = False # 1 boolean (8 bytes on 64-bit)
self.grad = None # 1 pointer (8 bytes)
# For 1 million small tensors: extra 16MB overhead
```
Is this significant? Compare to the data size for typical tensors.
**b) Alternative Approaches**: What are the pros and cons of this approach vs. adding gradient features later through:
- **Inheritance**: `class GradTensor(Tensor)`
- **Composition**: `tensor.grad_info = GradInfo()`
- **Monkey-patching**: `Tensor.grad = property(...)`
*Consider: code complexity, debugging ease, performance, and maintainability*
"""
# %% nbgrader={"grade": false, "grade_id": "systems-q4", "solution": true}
# %% [markdown]
"""
### Question 4: Broadcasting vs. Explicit Operations
Your implementation relies heavily on NumPy's automatic broadcasting.
**a) Hidden Complexity**: A student's code works with batch_size=32 but fails with batch_size=1. The error is:
```
ValueError: operands could not be broadcast together with shapes (1,128) (128,)
```
Given that your implementation handles broadcasting automatically, what's likely happening? Think about when broadcasting rules change behavior.
**b) Debugging Challenge**: How would you modify your tensor operations to help students debug broadcasting-related issues?
```python
# Possible enhancement:
def __add__(self, other):
    # Add shape debugging information
    try:
        result = self.data + other.data
    except ValueError:
        # Provide a helpful broadcasting explanation
        raise ValueError(
            f"Broadcasting failed: {self.shape} + {other.shape}. "
            f"Dimensions must match or be 1, aligned from the right."
        )
    return Tensor(result)
```
*Think about: when broadcasting masks bugs, dimension edge cases, and helpful error messages*
"""
# %% [markdown]
"""