Remove ML Systems Thinking sections from all modules

Cleaned up module structure by removing reflection questions:
- Updated module-developer.md to remove ML Systems Thinking from template
- Removed ML Systems Thinking sections from all 9 modules:
  * Module 01 (Tensor): Removed 113 lines of questions
  * Module 02 (Activations): Removed 24 lines of questions
  * Module 03 (Layers): Removed 84 lines of questions
  * Module 04 (Losses): Removed 93 lines of questions
  * Module 05 (Autograd): Removed 64 lines of questions
  * Module 06 (Optimizers): Removed questions section
  * Module 07 (Training): Removed questions section
  * Module 08 (DataLoader): Removed 35 lines of questions
  * Module 09 (Spatial): Removed 34 lines of questions

Impact:
- Modules now flow directly from tests to summary
- Cleaner, more focused module structure
- Removes assessment burden from implementation modules
- Keeps focus on building and understanding code
Vijay Janapa Reddi
2025-09-30 06:44:36 -04:00
parent 682801f7bc
commit a691e14b37
34 changed files with 2 additions and 6246 deletions


@@ -1657,119 +1657,6 @@ if __name__ == "__main__":
print("✅ Module validation complete!")
# %% [markdown]
"""
## 🤔 ML Systems Thinking: Tensor Foundations
Now that you've built a complete tensor system, let's reflect on the systems implications of your implementation.
"""
# %% nbgrader={"grade": false, "grade_id": "systems-q1", "solution": true}
# %% [markdown]
"""
### Question 1: Memory Scaling Analysis
You implemented matrix multiplication that creates new tensors for results.
**a) Memory Behavior**: When you compute `A.matmul(B)` where A is (1000×1000) and B is (1000×1000):
- Before operation: 2,000,000 elements (A: 1M + B: 1M = 2M total)
- During operation: _____ elements total in memory (A + B + result = ?)
- After operation: _____ elements (if A and B still exist + result)
**Memory Calculation Help:**
```
Matrix Memory: 1000 × 1000 = 1,000,000 elements
Float32: 4 bytes per element
Total per matrix: 1M × 4 = 4 MB
```
**b) Broadcasting Impact**: Your `+` operator uses NumPy broadcasting. When adding a (1000×1000) matrix to a (1000,) vector:
- Does NumPy create a temporary (1000×1000) copy of the vector?
- Or does it compute element-wise without full expansion?
*Think about: temporary arrays, memory copies, and when broadcasting is efficient vs. expensive*
"""
# %% nbgrader={"grade": false, "grade_id": "systems-q2", "solution": true}
# %% [markdown]
"""
### Question 2: Shape Validation Trade-offs
Your `matmul` method includes shape validation that raises clear error messages.
**a) Performance Impact**: In a training loop that runs matmul operations millions of times, what's the trade-off of this validation?
- **Pro**: Clear errors help debugging
- **Con**: Extra computation on every call
**b) Optimization Strategy**: How could you optimize this?
```python
# Current approach:
if self.shape[-1] != other.shape[-2]:
    raise ValueError(...)  # Check every time

# Alternative approaches:
# 1. Skip validation in "fast mode"
# 2. Validate only during debugging
# 3. Let NumPy raise its own error
```
Which approach would you choose and why?
*Hint: Consider debug mode vs. production mode, and the cost of shape checking vs. cryptic errors*
"""
# %% nbgrader={"grade": false, "grade_id": "systems-q3", "solution": true}
# %% [markdown]
"""
### Question 3: Dormant Features Design
You included `requires_grad` and `grad` attributes from the start, even though they're unused until Module 05.
**a) Memory Overhead**: Every tensor now carries these extra attributes:
```python
# Each tensor stores:
self.data = np.array(...) # The actual data
self.requires_grad = False # 1 boolean (8 bytes on 64-bit)
self.grad = None # 1 pointer (8 bytes)
# For 1 million small tensors: extra 16MB overhead
```
Is this significant? Compare to the data size for typical tensors.
**b) Alternative Approaches**: What are the pros and cons of this approach vs. adding gradient features later through:
- **Inheritance**: `class GradTensor(Tensor)`
- **Composition**: `tensor.grad_info = GradInfo()`
- **Monkey-patching**: `Tensor.grad = property(...)`
*Consider: code complexity, debugging ease, performance, and maintainability*
"""
# %% nbgrader={"grade": false, "grade_id": "systems-q4", "solution": true}
# %% [markdown]
"""
### Question 4: Broadcasting vs. Explicit Operations
Your implementation relies heavily on NumPy's automatic broadcasting.
**a) Hidden Complexity**: A student's code works with batch_size=32 but fails with batch_size=1. The error is:
```
ValueError: operands could not be broadcast together with shapes (1,128) (128,)
```
Given that your implementation handles broadcasting automatically, what's likely happening? Think about when broadcasting rules change behavior.
**b) Debugging Challenge**: How would you modify your tensor operations to help students debug broadcasting-related issues?
```python
# Possible enhancement:
def __add__(self, other):
    # Add shape debugging information
    try:
        result = self.data + other.data
    except ValueError:
        # Provide a helpful broadcasting explanation
        raise ValueError(
            f"Broadcasting failed: {self.shape} + {other.shape}. "
            f"Dimensions must match or be 1, aligned from the right."
        )
    return Tensor(result)
```
*Think about: when broadcasting masks bugs, dimension edge cases, and helpful error messages*
"""
# %% [markdown]
"""