mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-02 01:27:32 -05:00
This commit adds complete documentation for the 5-milestone system that transforms TinyTorch from module-based to capability-driven learning:

📚 Documentation Suite:
- milestone-system.md: Student-facing guide with milestone descriptions
- instructor-milestone-guide.md: Complete assessment framework for instructors
- milestone-troubleshooting.md: Comprehensive debugging guide for common issues
- milestone-implementation-guide.md: Technical implementation specifications
- milestone-system-overview.md: Executive summary tying everything together

🎯 The Five Milestones:
1. Basic Inference (Module 04) - Neural networks work (85%+ MNIST)
2. Computer Vision (Module 06) - MNIST recognition (95%+ CNN accuracy)
3. Full Training (Module 11) - Complete training loops (CIFAR-10 training)
4. Advanced Vision (Module 13) - CIFAR-10 classification (75%+ accuracy)
5. Language Generation (Module 16) - GPT text generation (coherent output)

🚀 Key Features:
- Capability-based achievement system replacing traditional module completion
- Visual progress tracking with Rich CLI visualizations
- Victory conditions aligned with industry-relevant skills
- Comprehensive troubleshooting for each milestone challenge
- Instructor assessment framework with automated testing
- Technical implementation roadmap for CLI integration

💡 Educational Impact:
- Students develop portfolio-worthy capabilities rather than just completing assignments
- Clear progression from basic neural networks to production AI systems
- Motivation through achievement and concrete skill development
- Industry alignment with real ML engineering competencies

Ready for implementation phase with complete technical specifications.
670 lines
19 KiB
Markdown
# 🔧 TinyTorch Milestone Troubleshooting Guide

## Common Issues and Solutions

This guide helps you overcome the most frequent challenges students encounter while pursuing TinyTorch milestones. Each section provides symptoms, diagnoses, and concrete solutions.

---

## 🎯 Milestone 1: Basic Inference

### Issue: "My neural network outputs don't make sense"

**Symptoms:**
- Network outputs NaN or inf values
- All predictions are the same number
- Accuracy stuck at random chance (10% for MNIST)
- Gradients exploding or vanishing

**Diagnosis & Solutions:**

#### Weight Initialization Problems
```python
# ❌ WRONG: Weights too large
self.weight = Tensor(np.random.randn(input_size, output_size))

# ✅ CORRECT: Xavier initialization
scale = np.sqrt(2.0 / (input_size + output_size))
self.weight = Tensor(np.random.randn(input_size, output_size) * scale)
```

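To see why the unscaled version causes trouble, the following minimal sketch pushes random data through a stack of purely linear layers and compares the spread of the outputs. It uses plain NumPy arrays rather than TinyTorch's `Tensor`, and the width, depth, and helper name `forward_std` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 5
x = rng.standard_normal((64, width))

def forward_std(scale):
    """Push x through `depth` linear layers and return the output std."""
    out = x
    for _ in range(depth):
        out = out @ (rng.standard_normal((width, width)) * scale)
    return out.std()

naive_std = forward_std(1.0)                                # unscaled randn weights
xavier_std = forward_std(np.sqrt(2.0 / (width + width)))    # Xavier scale
print(f"unscaled weights: std = {naive_std:.3e}")
print(f"Xavier-scaled:    std = {xavier_std:.3e}")
```

With unscaled weights the output magnitude grows by roughly a factor of `sqrt(width)` per layer, which is how NaN and inf values tend to appear after a few layers. The Xavier-scaled stack keeps the spread close to that of the input.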
#### Shape Mismatch Issues
```python
# Debug shapes at each step
print(f"Input shape: {x.shape}")
output = self.dense1(x)
print(f"After dense1: {output.shape}")
output = self.activation(output)
print(f"After activation: {output.shape}")
```

#### Learning Rate Problems
```python
# ❌ TOO HIGH: Learning rate 1.0 causes instability
optimizer = SGD(model.parameters(), learning_rate=1.0)

# ✅ GOOD: Start with smaller learning rate
optimizer = SGD(model.parameters(), learning_rate=0.01)
```

### Issue: "MNIST accuracy stuck below 85%"

**Symptoms:**
- Network trains but plateaus at 60-70% accuracy
- Loss decreases but accuracy doesn't improve
- Similar performance on training and test sets

**Diagnosis & Solutions:**

#### Insufficient Network Capacity
```python
# ❌ TOO SIMPLE: Not enough parameters
model = Sequential([
    Dense(784, 10),  # Only 7,850 parameters
    Softmax()
])

# ✅ BETTER: More capacity for complex patterns
model = Sequential([
    Dense(784, 128), ReLU(),  # Hidden layer for feature learning
    Dense(128, 64), ReLU(),   # Additional feature refinement
    Dense(64, 10), Softmax()  # Final classification
])
```

#### Activation Function Issues
```python
# ❌ WRONG: No activation between layers
model = Sequential([
    Dense(784, 128),
    Dense(128, 10),  # Linear combinations of linear functions = linear
    Softmax()
])

# ✅ CORRECT: Nonlinearity enables complex patterns
model = Sequential([
    Dense(784, 128), ReLU(),  # Nonlinearity crucial!
    Dense(128, 10), Softmax()
])
```

---

## 👁️ Milestone 2: Computer Vision

### Issue: "Convolution implementation is too slow"

**Symptoms:**
- Conv2D forward pass takes >10 seconds for small images
- Memory usage explodes during convolution
- System becomes unresponsive during training

**Diagnosis & Solutions:**

#### Inefficient Convolution Loops
```python
# ❌ SLOW: Nested Python loops
for batch in range(batch_size):
    for out_ch in range(out_channels):
        for in_ch in range(in_channels):
            for h in range(output_height):
                for w in range(output_width):
                    # Convolution computation
                    result[batch, out_ch, h, w] += ...

# ✅ FASTER: Vectorized operations using im2col
def im2col_convolution(input_tensor, weight, bias=None):
    # Convert convolution to matrix multiplication.
    # Assumes im2col returns one row per output position, with shape
    # (batch_size * output_height * output_width, in_channels * kh * kw).
    batch_size, _, in_height, in_width = input_tensor.shape
    out_channels, _, kernel_height, kernel_width = weight.shape
    output_height = in_height - kernel_height + 1  # stride 1, no padding
    output_width = in_width - kernel_width + 1

    input_cols = im2col(input_tensor, weight.shape[2:])
    output = input_cols @ weight.reshape(out_channels, -1).T
    if bias is not None:
        output = output + bias
    # Rows are ordered (batch, h, w), so restore that layout first,
    # then move the channel axis into position 1
    output = output.reshape(batch_size, output_height, output_width, out_channels)
    return output.transpose(0, 3, 1, 2)
```

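The `im2col` helper is not defined in this guide. The sketch below is one possible NumPy version, assuming stride 1, no padding, and `(N, C, H, W)` layout. The function names `im2col`, `conv2d_im2col`, and `conv2d_naive` are illustrative and are not taken from TinyTorch's API. It checks the matrix-multiply result against plain loops on a small input.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold (N, C, H, W) into patch rows of shape (N*OH*OW, C*kh*kw)."""
    n, c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((n, oh, ow, c, kh, kw), dtype=x.dtype)
    for i in range(kh):              # only kh * kw Python iterations
        for j in range(kw):
            patch = x[:, :, i:i + oh, j:j + ow]            # (N, C, OH, OW)
            cols[:, :, :, :, i, j] = patch.transpose(0, 2, 3, 1)
    return cols.reshape(n * oh * ow, c * kh * kw), (n, oh, ow)

def conv2d_im2col(x, weight, bias=None):
    oc, _, kh, kw = weight.shape
    cols, (n, oh, ow) = im2col(x, kh, kw)
    out = cols @ weight.reshape(oc, -1).T                  # (N*OH*OW, OC)
    if bias is not None:
        out = out + bias
    return out.reshape(n, oh, ow, oc).transpose(0, 3, 1, 2)

def conv2d_naive(x, weight):
    """Reference implementation with explicit loops, for checking only."""
    n, c, h, w = x.shape
    oc, _, kh, kw = weight.shape
    oh, ow = h - kh + 1, w - kw + 1
    out = np.zeros((n, oc, oh, ow))
    for b in range(n):
        for o in range(oc):
            for i in range(oh):
                for j in range(ow):
                    out[b, o, i, j] = np.sum(x[b, :, i:i + kh, j:j + kw] * weight[o])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))
w = rng.standard_normal((4, 3, 3, 3))
fast, slow = conv2d_im2col(x, w), conv2d_naive(x, w)
print(fast.shape, np.allclose(fast, slow))
```

Comparing against a slow reference like this on tiny inputs is a cheap way to catch reshape and transpose mistakes before timing anything.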
#### Memory Inefficiency
```python
# ❌ MEMORY HOG: Creating intermediate tensors in loops
for i in range(kernel_height):
    for j in range(kernel_width):
        temp_tensor = input[:, :, i:i+output_height, j:j+output_width]
        result += temp_tensor * kernel[:, :, i, j]

# ✅ MEMORY EFFICIENT: In-place operations
output = Tensor(np.zeros((batch_size, out_channels, output_height, output_width)))
for i in range(kernel_height):
    for j in range(kernel_width):
        # Use views instead of copies (basic NumPy slicing returns a view)
        input_slice = input[:, :, i:i+output_height, j:j+output_width]
        output += input_slice * kernel[:, :, i, j]
```

### Issue: "CNN accuracy worse than dense network"

**Symptoms:**
- Dense network achieves 90%+ on MNIST
- CNN with same parameters gets 70-80%
- CNN training loss decreases slower than dense

**Diagnosis & Solutions:**

#### Poor CNN Architecture
```python
# ❌ BAD: CNN worse than dense
model = Sequential([
    Conv2D(1, 32, kernel_size=7),  # Too large kernel
    ReLU(),
    Flatten(),
    Dense(32 * 22 * 22, 10)  # Huge dense layer
])

# ✅ GOOD: Proper CNN design
model = Sequential([
    Conv2D(1, 16, kernel_size=3), ReLU(),  # Small kernels
    MaxPool2D(kernel_size=2),              # Reduce spatial size
    Conv2D(16, 32, kernel_size=3), ReLU(),
    MaxPool2D(kernel_size=2),
    Flatten(),
    Dense(32 * 5 * 5, 128), ReLU(),        # Reasonable dense size
    Dense(128, 10)
])
```

#### Padding and Stride Issues
```python
# ❌ WRONG: Losing too much spatial information
conv = Conv2D(1, 16, kernel_size=5, stride=2, padding=0)  # Aggressive downsampling

# ✅ CORRECT: Preserve spatial information
conv = Conv2D(1, 16, kernel_size=3, stride=1, padding=1)  # Same size output
pool = MaxPool2D(kernel_size=2)  # Controlled downsampling
```

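The spatial size after each layer follows the standard formula `(size + 2 * padding - kernel_size) // stride + 1`. A small helper makes it easy to check the `Dense` input size before training. The helper name `conv_output_size` is made up for this sketch, and the trace assumes `MaxPool2D(kernel_size=2)` uses a stride equal to its kernel size.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# The two convolutions from the example above, applied to a 28x28 MNIST image
aggressive = conv_output_size(28, kernel_size=5, stride=2, padding=0)
same_size = conv_output_size(28, kernel_size=3, stride=1, padding=1)
print(aggressive, same_size)

# Trace the "proper CNN design": conv3 -> pool2 -> conv3 -> pool2
size = 28
for kernel_size, stride in [(3, 1), (2, 2), (3, 1), (2, 2)]:
    size = conv_output_size(size, kernel_size, stride)
print(size)  # spatial side length that feeds Dense(32 * size * size, 128)
```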
---

## ⚙️ Milestone 3: Full Training

### Issue: "Training loss not decreasing"

**Symptoms:**
- Loss remains constant across epochs
- Gradients are all zeros or very small
- Model predictions don't change during training

**Diagnosis & Solutions:**

#### Learning Rate Too Small
```python
# ❌ TOO SMALL: No visible progress
optimizer = Adam(model.parameters(), learning_rate=1e-6)

# ✅ GOOD RANGE: Start here and adjust
optimizer = Adam(model.parameters(), learning_rate=1e-3)

# Monitor gradient norms to debug
def check_gradients(model):
    total_norm = 0.0
    for param in model.parameters():
        if param.grad is not None:
            # Assumes param.grad.data is a NumPy array (arrays have no .norm() method)
            total_norm += np.sum(param.grad.data ** 2)
    return total_norm ** 0.5

print(f"Gradient norm: {check_gradients(model)}")
```

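When gradients look suspicious, comparing a hand-written backward pass against finite differences usually locates the bug. The sketch below does this for softmax cross-entropy in plain NumPy. The function names are illustrative, and the same pattern should carry over to any TinyTorch operation whose forward pass can be called as a function.

```python
import numpy as np

def loss_fn(logits, targets):
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def analytic_grad(logits, targets):
    """d(loss)/d(logits) = (softmax - one_hot) / batch_size."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(targets)), targets] -= 1.0
    return probs / len(targets)

def numerical_grad(f, x, eps=1e-5):
    """Central differences, one element at a time (slow, for checking only)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        original = x[idx]
        x[idx] = original + eps
        loss_plus = f(x)
        x[idx] = original - eps
        loss_minus = f(x)
        x[idx] = original          # restore the perturbed entry
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 10))
targets = np.array([3, 1, 7, 0])
max_error = np.abs(
    analytic_grad(logits, targets)
    - numerical_grad(lambda z: loss_fn(z, targets), logits)
).max()
print(f"max abs difference: {max_error:.2e}")
```

A difference many orders of magnitude below the gradient values themselves suggests the backward pass is right. A difference of similar size to the gradients points at the analytic formula.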
#### Incorrect Loss Function Implementation
```python
# ❌ WRONG: CrossEntropy without log-softmax
def cross_entropy_loss(predictions, targets):
    return -np.mean(predictions[range(len(targets)), targets])

# ✅ CORRECT: Proper log-softmax + NLL
def cross_entropy_loss(logits, targets):
    log_probs = log_softmax(logits)
    return -np.mean(log_probs[range(len(targets)), targets])

def log_softmax(x):
    # Subtract the row max before exponentiating, then subtract the log of the
    # sum directly so that tiny probabilities never pass through log(0)
    shifted = x - np.max(x, axis=1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
```

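A quick sanity check for any cross-entropy implementation: with identical logits for all 10 classes the prediction is uniform, so the loss should equal `ln(10)`, about 2.30. An untrained MNIST or CIFAR-10 model should start near that value. The sketch below is self-contained NumPy and also confirms that very large logits do not overflow.

```python
import numpy as np

def log_softmax(x):
    shifted = x - np.max(x, axis=1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))

def cross_entropy_loss(logits, targets):
    log_probs = log_softmax(logits)
    return -np.mean(log_probs[np.arange(len(targets)), targets])

targets = np.array([0, 3, 9, 5])

# Identical logits -> uniform prediction -> loss = ln(10)
uniform_loss = cross_entropy_loss(np.zeros((4, 10)), targets)
print(f"uniform loss: {uniform_loss:.4f}, ln(10): {np.log(10):.4f}")

# Very large logits must not overflow to inf or NaN
big_logits = np.full((4, 10), 1000.0)
big_logits[np.arange(4), targets] = 2000.0
big_loss = cross_entropy_loss(big_logits, targets)
print(f"large-logit loss is finite: {np.isfinite(big_loss)}")
```

If the first reported loss of a training run is far from `ln(num_classes)`, the loss function or the final layer is a likely place to look.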
### Issue: "CIFAR-10 training diverges or gets stuck"

**Symptoms:**
- Loss starts decreasing then shoots up to infinity
- Accuracy drops during training
- NaN values appear in loss or gradients

**Diagnosis & Solutions:**

#### Data Preprocessing Issues
```python
# ❌ WRONG: Using raw pixel values 0-255
train_data = cifar10_data  # Values in [0, 255]

# ✅ CORRECT: Normalize to reasonable range
train_data = cifar10_data.astype(np.float32) / 255.0  # Values in [0, 1]

# Even better: Zero-center and normalize per channel.
# These are the commonly used ImageNet statistics; per-channel values computed
# from the CIFAR-10 training set itself are a closer fit.
# Broadcasting requires channels-last data, e.g. shape (N, 32, 32, 3).
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
train_data = (train_data - mean) / std
```

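Per-channel statistics can also be computed directly from the training set instead of being hard-coded. The sketch below uses random `uint8` images as a stand-in for CIFAR-10 and assumes channels-last layout `(N, H, W, C)`. For channels-first data the reduction axes would be `(0, 2, 3)` and the statistics would need reshaping to `(1, C, 1, 1)`.

```python
import numpy as np

# Stand-in for CIFAR-10: random uint8 images in channels-last layout (N, H, W, C)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(512, 32, 32, 3), dtype=np.uint8)

train_data = fake_images.astype(np.float32) / 255.0

# One mean and one std per channel, reduced over images, rows, and columns
mean = train_data.mean(axis=(0, 1, 2))
std = train_data.std(axis=(0, 1, 2))
normalized = (train_data - mean) / std

print("per-channel mean after:", normalized.mean(axis=(0, 1, 2)).round(3))
print("per-channel std after: ", normalized.std(axis=(0, 1, 2)).round(3))
```

The `mean` and `std` computed on the training split should be reused unchanged for validation and test data, so that all splits see the same transformation.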
#### Batch Size Too Large
```python
# ❌ PROBLEMATIC: Batch size too large for dataset
train_loader = DataLoader(train_dataset, batch_size=1024, shuffle=True)

# ✅ BETTER: Moderate batch size for stability
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
```

#### Learning Rate Scheduling
```python
# ❌ BASIC: Fixed learning rate throughout training
optimizer = Adam(model.parameters(), learning_rate=0.001)

# ✅ ADVANCED: Learning rate decay for convergence
def adjust_learning_rate(optimizer, epoch, initial_lr=0.001):
    lr = initial_lr * (0.9 ** (epoch // 10))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr
```

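The schedule itself is plain arithmetic: the rate is multiplied by 0.9 once every 10 epochs. Isolating the formula, as in this sketch with the hypothetical helper `step_decay`, makes it easy to print the values a run will actually use before committing to a long training job.

```python
def step_decay(epoch, initial_lr=0.001, factor=0.9, step=10):
    """Learning rate at `epoch`, dropping by `factor` every `step` epochs."""
    return initial_lr * (factor ** (epoch // step))

for epoch in (0, 9, 10, 25, 50):
    print(f"epoch {epoch:>2}: lr = {step_decay(epoch):.6f}")
```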
---

## 🚀 Milestone 4: Advanced Vision

### Issue: "Can't reach 75% CIFAR-10 accuracy"

**Symptoms:**
- Model plateaus at 65-70% accuracy
- Training and validation accuracy gap is large
- Loss continues decreasing but accuracy doesn't improve

**Diagnosis & Solutions:**

#### Insufficient Model Complexity
```python
# Sizes below assume Conv2D uses no padding by default, as in the MNIST examples

# ❌ TOO SIMPLE: Not enough capacity for CIFAR-10
model = Sequential([
    Conv2D(3, 16, 3), ReLU(),    # 32x32 -> 30x30
    MaxPool2D(2),                # 30x30 -> 15x15
    Flatten(),
    Dense(16 * 15 * 15, 10)
])

# ✅ BETTER: Deeper architecture with more features
model = Sequential([
    Conv2D(3, 32, 3), ReLU(),    # 32 -> 30
    Conv2D(32, 32, 3), ReLU(),   # 30 -> 28
    MaxPool2D(2),                # 28 -> 14
    Conv2D(32, 64, 3), ReLU(),   # 14 -> 12
    Conv2D(64, 64, 3), ReLU(),   # 12 -> 10
    MaxPool2D(2),                # 10 -> 5
    Flatten(),
    Dense(64 * 5 * 5, 256), ReLU(),
    Dropout(0.5),
    Dense(256, 10)
])
```

#### Overfitting Problems
```python
# Add regularization techniques
# (sizes assume Conv2D uses no padding by default: 32 -> 30 -> 28 -> 14 -> 12 -> 10 -> 5)
model = Sequential([
    Conv2D(3, 32, 3), BatchNorm2D(32), ReLU(),
    Conv2D(32, 32, 3), BatchNorm2D(32), ReLU(),
    MaxPool2D(2), Dropout(0.2),

    Conv2D(32, 64, 3), BatchNorm2D(64), ReLU(),
    Conv2D(64, 64, 3), BatchNorm2D(64), ReLU(),
    MaxPool2D(2), Dropout(0.3),

    Flatten(),
    Dense(64 * 5 * 5, 256), BatchNorm1D(256), ReLU(),
    Dropout(0.5),
    Dense(256, 10)
])
```

#### Data Augmentation Missing
```python
# ✅ ADD: Data augmentation for better generalization
def augment_cifar10(image):
    # Expects a single image in (height, width, channels) layout

    # Random horizontal flip
    if np.random.random() > 0.5:
        image = np.fliplr(image)

    # Random crop and pad
    pad_width = 4
    padded = np.pad(image, ((pad_width, pad_width), (pad_width, pad_width), (0, 0)), mode='constant')
    crop_x = np.random.randint(0, 2 * pad_width + 1)
    crop_y = np.random.randint(0, 2 * pad_width + 1)
    image = padded[crop_y:crop_y+32, crop_x:crop_x+32]

    return image

class AugmentedCIFAR10Dataset(CIFAR10Dataset):
    def __getitem__(self, idx):
        image, label = super().__getitem__(idx)
        if self.train:
            image = augment_cifar10(image)
        return image, label
```

### Issue: "Model training takes too long"

**Symptoms:**
- Single epoch takes >10 minutes
- GPU utilization low or no GPU being used
- Memory usage constantly growing

**Diagnosis & Solutions:**

#### Inefficient Convolution Implementation
```python
# Profile your convolution
import time

def time_convolution():
    input_tensor = Tensor(np.random.randn(32, 3, 32, 32))
    conv = Conv2D(3, 64, kernel_size=3)

    start_time = time.time()
    for _ in range(100):
        output = conv(input_tensor)
    end_time = time.time()

    print(f"100 convolutions took {end_time - start_time:.2f} seconds")
    print(f"Average time per convolution: {(end_time - start_time)/100:.4f} seconds")

time_convolution()
```

#### Memory Leaks in Training Loop
```python
# ❌ MEMORY LEAK: Gradients (and the graphs behind them) keep accumulating
for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
        # Missing: optimizer.zero_grad()

# ✅ CORRECT: Clear gradients each iteration
for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()  # Clear previous gradients
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
```

---

## 🔥 Milestone 5: Language Generation

### Issue: "GPT generates nonsense text"

**Symptoms:**
- Generated text is random characters
- Model outputs same character repeatedly
- Text has no recognizable patterns or structure

**Diagnosis & Solutions:**

#### Tokenization Problems
```python
# ❌ WRONG: Inconsistent character mapping
def tokenize(text):
    chars = list(set(text))  # Order changes each run!
    char_to_idx = {ch: i for i, ch in enumerate(chars)}
    return [char_to_idx[ch] for ch in text]

# ✅ CORRECT: Consistent character vocabulary
class CharTokenizer:
    def __init__(self, text):
        self.chars = sorted(list(set(text)))  # Consistent ordering
        self.char_to_idx = {ch: i for i, ch in enumerate(self.chars)}
        self.idx_to_char = {i: ch for i, ch in enumerate(self.chars)}

    def encode(self, text):
        return [self.char_to_idx[ch] for ch in text]

    def decode(self, indices):
        return ''.join([self.idx_to_char[i] for i in indices])
```

#### Sequence Length Issues
```python
# ❌ TOO LONG: Sequence length too large for available data
sequence_length = 1000  # Only have 10,000 chars total

# ✅ REASONABLE: Sequence length appropriate for dataset
sequence_length = min(100, len(text) // 100)  # At least 100 sequences
```

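Once a sequence length is chosen, the encoded text has to be cut into training pairs in which each target is the input shifted by one token. The sketch below shows one simple way to do that with non-overlapping windows. The helper `make_windows` and the toy corpus are assumptions for illustration, not part of TinyTorch.

```python
def make_windows(token_ids, sequence_length):
    """Split token ids into (input, target) pairs; targets are shifted by one."""
    pairs = []
    # Stop early enough that every target window has a full sequence_length tokens
    for start in range(0, len(token_ids) - sequence_length, sequence_length):
        inputs = token_ids[start:start + sequence_length]
        targets = token_ids[start + 1:start + sequence_length + 1]
        pairs.append((inputs, targets))
    return pairs

text = "hello tinytorch " * 50             # 800 characters of toy data
vocab = sorted(set(text))                  # consistent ordering, as above
token_ids = [vocab.index(ch) for ch in text]

sequence_length = min(100, len(text) // 100)
pairs = make_windows(token_ids, sequence_length)
print(f"sequence_length={sequence_length}, training pairs={len(pairs)}")
```

Printing the number of pairs before training is a fast way to notice that a corpus is too small for the chosen sequence length.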
#### Position Encoding Missing
```python
# ❌ MISSING: No positional information
class GPTBlock(nn.Module):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.attention = MultiHeadAttention(embed_dim, num_heads)
        self.mlp = MLP(embed_dim)

    def forward(self, x):
        x = x + self.attention(x)  # No position info!
        x = x + self.mlp(x)
        return x

# ✅ CORRECT: Add positional encoding
class GPTBlock(nn.Module):
    def __init__(self, embed_dim, num_heads, max_seq_len):
        super().__init__()
        self.attention = MultiHeadAttention(embed_dim, num_heads)
        self.mlp = MLP(embed_dim)
        self.pos_encoding = PositionalEncoding(embed_dim, max_seq_len)

    def forward(self, x):
        x = x + self.pos_encoding(x)  # Add position information
        x = x + self.attention(x)
        x = x + self.mlp(x)
        return x
```

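The guide does not show what `PositionalEncoding` computes. One common choice is the fixed sinusoidal table sketched below in NumPy. It assumes an even `embed_dim`, and the function name `sinusoidal_positions` is made up for this example. Learned position embeddings are an equally valid alternative, and many GPT-style models add position information once, right after the token embedding, rather than inside every block.

```python
import numpy as np

def sinusoidal_positions(max_seq_len, embed_dim):
    """Return a (max_seq_len, embed_dim) table of fixed position encodings."""
    positions = np.arange(max_seq_len)[:, None]                # (L, 1)
    pair_index = np.arange(0, embed_dim, 2)[None, :]           # (1, D/2)
    angles = positions / np.power(10000.0, pair_index / embed_dim)
    table = np.zeros((max_seq_len, embed_dim))
    table[:, 0::2] = np.sin(angles)    # even dimensions
    table[:, 1::2] = np.cos(angles)    # odd dimensions
    return table

table = sinusoidal_positions(max_seq_len=64, embed_dim=16)
token_embeddings = np.zeros((2, 10, 16))           # (batch, seq_len, embed_dim)
with_positions = token_embeddings + table[:10]     # broadcasts over the batch
print(with_positions.shape)
```

Every row of the table is distinct, so two identical tokens at different positions end up with different vectors, which is the property attention needs to distinguish word order.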
### Issue: "Can't reuse components from vision modules"

**Symptoms:**
- Having to reimplement Dense layers, ReLU, etc.
- Components don't work with sequence data
- Different interfaces for vision vs. language components

**Diagnosis & Solutions:**

#### Shape Incompatibility
```python
# ❌ PROBLEM: Dense layer expects 2D input, sequences are 3D
# Sequence shape: (batch_size, sequence_length, embed_dim)
# Dense expects: (batch_size, features)

# ✅ SOLUTION: Reshape for compatibility
class SequenceDense(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.dense = Dense(input_dim, output_dim)  # Reuse vision component!

    def forward(self, x):
        # x shape: (batch, seq_len, input_dim)
        batch_size, seq_len, input_dim = x.shape

        # Reshape to 2D for dense layer
        x_flat = x.reshape(batch_size * seq_len, input_dim)

        # Apply dense transformation
        output_flat = self.dense(x_flat)

        # Reshape back to sequence format
        output_dim = output_flat.shape[-1]
        return output_flat.reshape(batch_size, seq_len, output_dim)
```

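The flatten, transform, and reshape sequence is safe because a dense layer treats every row independently. The NumPy sketch below, which uses a raw weight matrix and bias in place of TinyTorch's `Dense`, checks that it gives the same result as applying the layer to each sequence position separately.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, seq_len, input_dim, output_dim = 2, 5, 8, 4
x = rng.standard_normal((batch_size, seq_len, input_dim))
weight = rng.standard_normal((input_dim, output_dim))
bias = rng.standard_normal(output_dim)

# Path 1: flatten to 2D, apply the dense transform, restore the sequence axis
flat = x.reshape(batch_size * seq_len, input_dim)
via_reshape = (flat @ weight + bias).reshape(batch_size, seq_len, output_dim)

# Path 2: apply the same weights to every position independently
per_position = np.stack(
    [x[:, t, :] @ weight + bias for t in range(seq_len)], axis=1
)

print(via_reshape.shape, np.allclose(via_reshape, per_position))
```

The same check is worth running on the real `SequenceDense` wrapper, since a reshape in the wrong axis order silently mixes tokens from different sequences.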
#### Different Data Types
```python
# ❌ ISSUE: Vision uses float32, language uses int64 indices
# Vision: image_tensor = Tensor(np.float32([...]))
# Language: token_indices = [1, 5, 12, ...]

# ✅ SOLUTION: Embedding layer converts indices to vectors
class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embedding = Tensor(np.random.randn(vocab_size, embed_dim) * 0.1)

    def forward(self, token_indices):
        # Convert integer indices to float embeddings
        return self.embedding[token_indices]  # Now compatible with Dense layers!
```

---

## 🛠️ General Debugging Strategies

### Debugging Checklist

**Before Every Milestone Attempt:**
1. [ ] Environment activated: `source .venv/bin/activate`
2. [ ] Dependencies updated: `pip install -r requirements.txt`
3. [ ] Previous modules working: `tito test --all-previous`
4. [ ] Clean workspace: `git status` shows clean state

**During Implementation:**
1. [ ] Print shapes at every step
2. [ ] Test with small data first (batch_size=1, small input)
3. [ ] Use debugger breakpoints at critical functions
4. [ ] Save intermediate results for inspection

**Before Milestone Submission:**
1. [ ] Code runs without errors
2. [ ] Performance benchmarks met
3. [ ] All tests pass: `tito milestone test X`
4. [ ] Code exported successfully: `tito export --module X`

### Performance Debugging

**Memory Usage:**
```python
import tracemalloc

def debug_memory_usage():
    tracemalloc.start()

    # Your code here
    model = build_model()
    train_one_epoch(model)

    current, peak = tracemalloc.get_traced_memory()
    print(f"Current memory usage: {current / 1024 / 1024:.1f} MB")
    print(f"Peak memory usage: {peak / 1024 / 1024:.1f} MB")
    tracemalloc.stop()
```

**Training Speed:**
```python
import time

def benchmark_training_speed():
    model = build_model()
    dummy_data = create_dummy_batch()

    # Warm up
    for _ in range(5):
        _ = model(dummy_data)

    # Benchmark
    start_time = time.time()
    for _ in range(100):
        output = model(dummy_data)
    end_time = time.time()

    avg_time = (end_time - start_time) / 100
    print(f"Average forward pass time: {avg_time*1000:.2f} ms")
```

### Getting Help

**Documentation Resources:**
- Module READMEs: `modules/source/XX_module/README.md`
- API Reference: `book/appendices/api-reference.md`
- Troubleshooting: This guide!

**Community Support:**
- Discord/Slack: #tinytorch-help channel
- Office Hours: See course calendar
- Study Groups: Form with classmates working on same milestone

**Instructor Support:**
- Email for conceptual questions
- Office hours for debugging sessions
- Milestone review meetings for stuck students

### When to Ask for Help

**Ask for help if:**
- You have been stuck on the same issue for more than 2 hours
- Performance is far below the milestone requirements
- You are unclear about the milestone requirements
- You suspect a bug in the provided code

**Before asking, prepare:**
- A minimal code example that reproduces the issue
- Error messages and stack traces
- What you've already tried
- A specific question, not just "it doesn't work"

---

## 🎯 Success Strategies

### Milestone Achievement Tips

**Start Early:**
- Begin milestone attempts when you complete prerequisites
- Don't wait until the deadline to discover issues
- Use intermediate checkpoints to track progress

**Incremental Development:**
- Get basic version working first
- Optimize performance second
- Add advanced features last

**Test-Driven Development:**
- Write tests for your functions before implementation
- Use provided test suites as specification
- Add your own tests for edge cases

**Systematic Debugging:**
- Isolate issues to smallest possible code section
- Use print statements and debugger strategically
- Keep a debugging log of what you've tried

### Building Confidence

**Celebrate Small Wins:**
- First successful forward pass
- First decreasing loss curve
- First accuracy improvement

**Learn from Failures:**
- Every bug teaches you something about the system
- Failed milestones often lead to deeper understanding
- Debugging skills are as valuable as implementation skills

**Connect to Bigger Picture:**
- Each milestone represents real-world capability
- Your implementations mirror industry practices
- Skills transfer directly to research and industry roles

**Remember the Goal:**
You're not just completing assignments; you're building genuine ML systems engineering expertise that will serve you throughout your career. Every challenge overcome makes you a stronger engineer.

🚀 **Keep going! Every milestone brings you closer to ML systems mastery.**