mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-03-11 20:55:19 -05:00
Remove ML Systems Thinking sections from all modules
Cleaned up module structure by removing reflection questions:
- Updated module-developer.md to remove ML Systems Thinking from template
- Removed ML Systems Thinking sections from all 9 modules:
  * Module 01 (Tensor): Removed 113 lines of questions
  * Module 02 (Activations): Removed 24 lines of questions
  * Module 03 (Layers): Removed 84 lines of questions
  * Module 04 (Losses): Removed 93 lines of questions
  * Module 05 (Autograd): Removed 64 lines of questions
  * Module 06 (Optimizers): Removed questions section
  * Module 07 (Training): Removed questions section
  * Module 08 (DataLoader): Removed 35 lines of questions
  * Module 09 (Spatial): Removed 34 lines of questions

Impact:
- Modules now flow directly from tests to summary
- Cleaner, more focused module structure
- Removes assessment burden from implementation modules
- Keeps focus on building and understanding code
This commit is contained in:
@@ -1657,119 +1657,6 @@ if __name__ == "__main__":
print("✅ Module validation complete!")

# %% [markdown]
"""
## 🤔 ML Systems Thinking: Tensor Foundations

Now that you've built a complete tensor system, let's reflect on the systems implications of your implementation.
"""

# %% nbgrader={"grade": false, "grade_id": "systems-q1", "solution": true}
# %% [markdown]
"""
### Question 1: Memory Scaling Analysis

You implemented matrix multiplication that creates new tensors for results.

**a) Memory Behavior**: When you compute `A.matmul(B)` where A is (1000×1000) and B is (1000×1000):
- Before operation: 2,000,000 elements (A: 1M + B: 1M = 2M total)
- During operation: _____ elements total in memory (A + B + result = ?)
- After operation: _____ elements (if A and B still exist + result)

**Memory Calculation Help:**
```
Matrix Memory: 1000 × 1000 = 1,000,000 elements
Float32: 4 bytes per element
Total per matrix: 1M × 4 = 4 MB
```

**b) Broadcasting Impact**: Your `+` operator uses NumPy broadcasting. When adding a (1000×1000) matrix to a (1000,) vector:
- Does NumPy create a temporary (1000×1000) copy of the vector?
- Or does it compute element-wise without full expansion?

*Think about: temporary arrays, memory copies, and when broadcasting is efficient vs. expensive*
"""
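Question 1b can be probed empirically. A minimal sketch, assuming NumPy is installed (the array names here are illustrative, not part of the module):

```python
import numpy as np

A = np.zeros((1000, 1000), dtype=np.float32)
v = np.ones(1000, dtype=np.float32)

# broadcast_to returns a read-only *view*: the vector is never copied.
v_view = np.broadcast_to(v, (1000, 1000))
print(np.shares_memory(v_view, v))  # True: the view reuses v's 4 KB buffer
print(v_view.strides)               # (0, 4): stride 0 repeats the row for free

# A + v therefore needs memory only for A, v, and the float32 result.
C = A + v
print(C.nbytes)                     # 4000000 bytes (4 MB) for the result
```

The zero stride is the key observation: broadcasting expands shapes logically, not physically, so the expensive part of `A + v` is the 4 MB output allocation, not the vector.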
# %% nbgrader={"grade": false, "grade_id": "systems-q2", "solution": true}
# %% [markdown]
"""
### Question 2: Shape Validation Trade-offs

Your `matmul` method includes shape validation that raises clear error messages.

**a) Performance Impact**: In a training loop that runs matmul operations millions of times, what's the trade-off of this validation?
- **Pro**: Clear errors help debugging
- **Con**: Extra computation on every call

**b) Optimization Strategy**: How could you optimize this?
```python
# Current approach:
if self.shape[-1] != other.shape[-2]:
    raise ValueError(...)  # Check every time

# Alternative approaches:
# 1. Skip validation in "fast mode"
# 2. Validate only during debugging
# 3. Let NumPy raise its own error
```

Which approach would you choose and why?

*Hint: Consider debug mode vs. production mode, and the cost of shape checking vs. cryptic errors*
"""
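Options 1 and 2 above can be combined into a module-level switch. A hypothetical sketch (`VALIDATE_SHAPES` and `checked_matmul` are illustrative names, not TinyTorch API; 2-D NumPy operands assumed):

```python
import numpy as np

VALIDATE_SHAPES = True  # flip to False in "fast mode" (e.g. production training)

def checked_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Matrix multiply with optional shape validation (2-D operands assumed)."""
    if VALIDATE_SHAPES and a.shape[-1] != b.shape[-2]:
        raise ValueError(
            f"matmul shape mismatch: {a.shape} @ {b.shape} "
            f"(inner dimensions {a.shape[-1]} and {b.shape[-2]} differ)"
        )
    return a @ b

print(checked_matmul(np.ones((2, 3)), np.ones((3, 4))).shape)  # (2, 4)
```

A related idiom is to use `assert` for the check, since `python -O` strips assertions entirely, which maps directly onto the debug-vs-production split the hint mentions.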
# %% nbgrader={"grade": false, "grade_id": "systems-q3", "solution": true}
# %% [markdown]
"""
### Question 3: Dormant Features Design

You included `requires_grad` and `grad` attributes from the start, even though they're unused until Module 05.

**a) Memory Overhead**: Every tensor now carries these extra attributes:
```python
# Each tensor stores:
self.data = np.array(...)   # The actual data
self.requires_grad = False  # 1 boolean (8 bytes on 64-bit)
self.grad = None            # 1 pointer (8 bytes)

# For 1 million small tensors: extra 16 MB overhead
```

Is this significant? Compare to the data size for typical tensors.

**b) Alternative Approaches**: What are the pros and cons of this approach vs. adding gradient features later through:
- **Inheritance**: `class GradTensor(Tensor)`
- **Composition**: `tensor.grad_info = GradInfo()`
- **Monkey-patching**: `Tensor.grad = property(...)`

*Consider: code complexity, debugging ease, performance, and maintainability*
"""
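The 16 MB estimate in part (a) can be sanity-checked with a toy pair of classes. A rough sketch (`Bare` and `WithGrad` are illustrative, not the module's `Tensor`; exact sizes vary by Python version and CPython's key-sharing dict optimization):

```python
import sys

class Bare:
    def __init__(self, data):
        self.data = data

class WithGrad:
    def __init__(self, data):
        self.data = data
        self.requires_grad = False  # dormant until Module 05
        self.grad = None            # dormant until Module 05

b, w = Bare(0.0), WithGrad(0.0)
# The instance __dict__ holds the extra keys; comparing sizes gives a
# rough per-object view of what the dormant attributes cost.
print(sys.getsizeof(b.__dict__), sys.getsizeof(w.__dict__))
```

A common mitigation for many small objects is `__slots__`, which eliminates the per-instance `__dict__` and fixes the attribute layout, at the cost of less flexibility (e.g. no monkey-patching new attributes).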
# %% nbgrader={"grade": false, "grade_id": "systems-q4", "solution": true}
# %% [markdown]
"""
### Question 4: Broadcasting vs. Explicit Operations

Your implementation relies heavily on NumPy's automatic broadcasting.

**a) Hidden Complexity**: A student's code works with batch_size=32 but fails with batch_size=1. The error is:
```
ValueError: operands could not be broadcast together with shapes (1,128) (128,)
```

Given that your implementation handles broadcasting automatically, what's likely happening? Think about when broadcasting rules change behavior.

**b) Debugging Challenge**: How would you modify your tensor operations to help students debug broadcasting-related issues?

```python
# Possible enhancement:
def __add__(self, other):
    # Add shape debugging information
    try:
        result = self.data + other.data
    except ValueError as e:
        # Provide helpful broadcasting explanation
        raise ValueError(f"Broadcasting failed: {self.shape} + {other.shape}. {helpful_message}")
```

*Think about: when broadcasting masks bugs, dimension edge cases, and helpful error messages*
"""
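The possible enhancement sketched in part (b) can be completed into runnable form. A sketch using a minimal wrapper (`MiniTensor` is illustrative, not the module's `Tensor`):

```python
import numpy as np

class MiniTensor:
    def __init__(self, data):
        self.data = np.asarray(data)

    @property
    def shape(self):
        return self.data.shape

    def __add__(self, other):
        try:
            return MiniTensor(self.data + other.data)
        except ValueError as e:
            # Re-raise with both shapes spelled out for the student.
            raise ValueError(
                f"Broadcasting failed: {self.shape} + {other.shape}. "
                "Trailing dimensions must match or be 1 (NumPy broadcasting rules)."
            ) from e

print((MiniTensor([[1, 2], [3, 4]]) + MiniTensor([10, 20])).data.tolist())
# [[11, 22], [13, 24]]
```

Chaining with `from e` keeps NumPy's original traceback attached, so advanced users still see the underlying error while beginners get the explanation.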
# %% [markdown]
"""