This commit adds PyTorch-style calling syntax to TinyTorch's neural network components:
**Core Changes:**
- Add __call__ methods to the neural network components in modules 11-13, 17, and 18
- Enable PyTorch-standard calling syntax: model(input) instead of model.forward(input)
- Maintain backward compatibility: forward() methods still work
**Modules Updated:**
- Module 11 (Embeddings): Embedding, PositionalEncoding, EmbeddingLayer
- Module 12 (Attention): MultiHeadAttention
- Module 13 (Transformers): LayerNorm, MLP, TransformerBlock, GPT
- Module 17 (Quantization): QuantizedLinear
- Module 18 (Compression): Linear, Sequential classes
**Milestone Updates:**
- Replace all .forward() calls with direct () calls in milestone examples
- Update transformer milestones (vaswani_shakespeare, tinystories_gpt, tinytalks_gpt)
- Update CNN and MLP milestone examples
- Update MILESTONE_TEMPLATE.py for consistency
**Educational Benefits:**
- Students now write identical syntax to production PyTorch code
- Seamless transition from TinyTorch to PyTorch development
- Industry-standard calling conventions from day one
**Implementation Pattern:**
```python
def __call__(self, *args, **kwargs):
    """Allows the component to be called like a function."""
    return self.forward(*args, **kwargs)
```
All changes maintain full backward compatibility while enabling PyTorch-style usage.
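A minimal sketch of how the pattern behaves in practice, assuming a hypothetical `Scale` layer that is not part of TinyTorch. Both calling styles should return the same result:

```python
class Scale:
    """Hypothetical layer used only for illustration; not a TinyTorch class."""

    def __init__(self, factor):
        self.factor = factor

    def forward(self, x):
        # Multiply every element of a plain Python list by the factor.
        return [self.factor * v for v in x]

    def __call__(self, *args, **kwargs):
        """Allows the component to be called like a function."""
        return self.forward(*args, **kwargs)


layer = Scale(2.0)
direct = layer([1.0, 2.0])            # PyTorch-style call
explicit = layer.forward([1.0, 2.0])  # older style, still supported
```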
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Critical fixes for transformer gradient flow:
EmbeddingBackward:
- Implements scatter-add gradient accumulation for embedding lookups
- Added to Module 05 (autograd_dev.py)
- Module 11 imports and uses it in Embedding.forward()
- Gradients now flow back to embedding weights
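A rough sketch of scatter-add accumulation for an embedding lookup, written with NumPy. The function name and signature below are assumptions for illustration and may differ from the actual EmbeddingBackward in autograd_dev.py:

```python
import numpy as np


def embedding_backward(indices, grad_output, vocab_size):
    """Scatter-add upstream gradients into a zero-initialized weight gradient.

    indices:     integer array of shape (...,)
    grad_output: array of shape (..., embed_dim)
    """
    embed_dim = grad_output.shape[-1]
    grad_weight = np.zeros((vocab_size, embed_dim), dtype=grad_output.dtype)
    flat_idx = np.asarray(indices).reshape(-1)
    flat_grad = grad_output.reshape(-1, embed_dim)
    # np.add.at accumulates over repeated indices, whereas
    # `grad_weight[flat_idx] += flat_grad` applies only one update per index.
    np.add.at(grad_weight, flat_idx, flat_grad)
    return grad_weight


indices = np.array([2, 0, 2])   # token 2 appears twice
grad_output = np.ones((3, 4))
grad_weight = embedding_backward(indices, grad_output, vocab_size=5)
```

Rows for tokens that appear more than once receive the sum of their upstream gradients, which is why a plain fancy-indexed assignment is not enough here.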
ReshapeBackward:
- reshape() was breaking computation graph (no _grad_fn)
- Added backward function that reshapes gradient back to original shape
- Patched Tensor.reshape() in enable_autograd()
- Critical for GPT forward pass (logits.reshape before loss)
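The idea behind ReshapeBackward can be sketched as follows. The class body and the `apply` method name are assumptions for illustration, not the code in this commit:

```python
import numpy as np


class ReshapeBackward:
    """Hypothetical sketch: stores the input shape and restores it for the gradient."""

    def __init__(self, original_shape):
        self.original_shape = original_shape

    def apply(self, grad_output):
        # Reshape moves no values, so the gradient only needs its old shape back.
        return grad_output.reshape(self.original_shape)


logits = np.zeros((2, 3, 5))        # (batch, seq_len, vocab)
flat = logits.reshape(-1, 5)        # shape typically fed to the loss
grad_fn = ReshapeBackward(logits.shape)
grad_logits = grad_fn.apply(np.ones_like(flat))
```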
Results:
- Before: 0/37 parameters receive gradients, loss stuck
- After: 13/37 parameters receive gradients (35%)
- Single batch overfitting: loss 4.46 → 0.03 (about a 99% reduction)
- MODEL NOW LEARNS! 🎉
Remaining work: 24 parameters still missing gradients (likely attention)
Tests added:
- tests/milestones/test_05_transformer_architecture.py (Phase 1)
- Multiple debug scripts to isolate issues
- Embedding.forward() now preserves requires_grad from weight tensor
- PositionalEncoding.forward() uses Tensor addition (x + pos) instead of .data
- Critical for transformer input embeddings to have gradients
Both changes ensure gradients flow from the loss back to the embedding weights
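To illustrate why adding through `.data` drops the gradient link, here is a toy stand-in for a tensor class. It is far simpler than TinyTorch's Tensor, and the `parents` attribute is an assumption made for this sketch:

```python
import numpy as np


class Tensor:
    """Toy stand-in for illustration only; not TinyTorch's Tensor."""

    def __init__(self, data, requires_grad=False, parents=()):
        self.data = np.asarray(data, dtype=float)
        self.requires_grad = requires_grad
        self.parents = parents  # tensors this one was computed from

    def __add__(self, other):
        # The result records its inputs and inherits requires_grad from them.
        return Tensor(
            self.data + other.data,
            requires_grad=self.requires_grad or other.requires_grad,
            parents=(self, other),
        )


x = Tensor([1.0, 2.0], requires_grad=True)
pos = Tensor([0.1, 0.2])

detached = Tensor(x.data + pos.data)  # raw-array addition: graph link is lost
tracked = x + pos                     # Tensor addition: graph link is kept
```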
- Add CLAUDE.md entry point for Claude AI system
- Fix tito test command to set PYTHONPATH for module imports
- Fix embeddings export directive placement for nbdev
- Fix attention module to export imports properly
- Fix transformers embedding index casting to int
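One plausible reason for the index cast, sketched with plain NumPy: an array of float token ids cannot be used as an index, even when every value is a whole number. The array names here are hypothetical:

```python
import numpy as np

weight = np.arange(12.0).reshape(4, 3)   # hypothetical (vocab_size, embed_dim) table
token_ids = np.array([1.0, 3.0])         # indices that arrived as floats

try:
    weight[token_ids]                    # NumPy rejects non-integer index arrays
    float_lookup_failed = False
except IndexError:
    float_lookup_failed = True

rows = weight[token_ids.astype(int)]     # casting first makes the lookup valid
```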
PROBLEM:
- nbdev requires #| export directive on EACH cell to export when using # %% markers
- Cell markers inside class definitions split classes across multiple cells
- Only partial classes were being exported to tinytorch package
- Missing matmul, arithmetic operations, and activation classes in exports
SOLUTION:
1. Removed # %% cell markers INSIDE class definitions (kept classes as single units)
2. Added #| export to imports cell at top of each module
3. Added #| export before each exportable class definition in all 20 modules
4. Added __call__ method to Sigmoid for functional usage
5. Fixed numpy import (moved to module level from __init__)
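The resulting cell layout can be sketched as below. This is a simplified, scalar-only Sigmoid written for illustration; the real class operates on Tensors and its body is not shown in this commit message:

```python
# %%
#| export
import math

# %%
#| export
class Sigmoid:
    """Kept in a single cell so the whole class is exported together."""

    def forward(self, x):
        return 1.0 / (1.0 + math.exp(-x))

    def __call__(self, x):
        return self.forward(x)

# %%
# No export directive: this cell stays in the development file only.
result = Sigmoid()(0.0)
```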
MODULES FIXED:
- 01_tensor: Tensor class with all operations (matmul, arithmetic, shape ops)
- 02_activations: Sigmoid, ReLU, Tanh, GELU, Softmax classes
- 03_layers: Linear, Dropout classes
- 04_losses: MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss classes
- 05_autograd: Function, AddBackward, MulBackward, MatmulBackward, SumBackward
- 06_optimizers: Optimizer, SGD, Adam, AdamW classes
- 07_training: CosineSchedule, Trainer classes
- 08_dataloader: Dataset, TensorDataset, DataLoader classes
- 09_spatial: Conv2d, MaxPool2d, AvgPool2d, SimpleCNN classes
- 10-20: All exportable classes in remaining modules
TESTING:
- Test functions use 'if __name__ == "__main__"' guards
- Tests run in notebooks but NOT on import
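A minimal sketch of the guard, using a hypothetical test function name:

```python
def test_unit_addition():
    """Hypothetical test function standing in for a module's unit tests."""
    assert 1 + 1 == 2
    return True


if __name__ == "__main__":
    # Runs when the file is executed directly or in a notebook,
    # but not when the module is imported from the package.
    test_unit_addition()
```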
- Rosenblatt Perceptron milestone working perfectly
RESULT:
✅ All 20 modules export correctly
✅ Perceptron (1957) milestone functional
✅ Clean separation: development (modules/source) vs package (tinytorch)
- 09_spatial: Export Conv2d, MaxPool2d, AvgPool2d only
- 10_tokenization: Export Tokenizer, CharTokenizer, BPETokenizer only
- 11_embeddings: Export Embedding, PositionalEncoding only
Continues the selective export pattern: public APIs stay clean, and
development utilities remain in the development environment.
- Remove circular imports where modules imported from themselves
- Convert tinytorch.core imports to sys.path relative imports
- Only import dependencies that are actually used in each module
- Preserve documentation imports in markdown cells
- Use consistent relative path pattern across all modules
- Remove hardcoded absolute paths in favor of relative imports
Affected modules: 02_activations, 03_layers, 04_losses, 06_optimizers,
07_training, 09_spatial, 12_attention, 17_quantization
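A sketch of what a sys.path-based relative import setup might look like. The helper name, the `layers_dev.py` file name, and the directory layout under modules/source are assumptions for illustration:

```python
import os
import sys


def add_sibling_module_to_path(current_file, sibling_dir):
    """Append ../<sibling_dir> (relative to current_file) to sys.path if missing."""
    here = os.path.dirname(os.path.abspath(current_file))
    target = os.path.normpath(os.path.join(here, "..", sibling_dir))
    if target not in sys.path:
        sys.path.append(target)
    return target


# Hypothetical layout: modules/source/03_layers/layers_dev.py depends on 01_tensor.
path = add_sibling_module_to_path(
    os.path.join("modules", "source", "03_layers", "layers_dev.py"), "01_tensor"
)
```

Deriving the path from the current file's location, rather than hardcoding an absolute path, keeps the imports working regardless of where the repository is checked out.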