- Embedding.forward() now preserves requires_grad from weight tensor
- PositionalEncoding.forward() uses Tensor addition (x + pos) instead of .data
- Critical for transformer input embeddings to have gradients
Both changes ensure gradient flows from loss back to embedding weights
- Implement gradient functions for subtraction and division operations
- Patch Tensor.__sub__ and Tensor.__truediv__ in enable_autograd()
- Required for LayerNorm (x - mean) and (normalized / std) operations
These operations are used extensively in normalization layers
- Preserve computation graph by using Tensor arithmetic (x - x_max, exp / sum)
- No more .data extraction that breaks gradient flow
- Numerically stable with max subtraction before exp
Required for transformer attention softmax gradient flow
- Add CLAUDE.md entry point for Claude AI system
- Fix tito test command to set PYTHONPATH for module imports
- Fix embeddings export directive placement for nbdev
- Fix attention module to export imports properly
- Fix transformers embedding index casting to int
- Update transformers module to match tokenization style with improved ASCII diagrams
- Fix attention module to use proper multi-head interface
- Update transformer era milestone for refined module integration
- Fix import paths and ensure forward() method consistency
- All transformer components now work seamlessly together
Following module developer guidelines, added comprehensive visual diagrams:
1. Text-to-Numbers Pipeline (Introduction):
- Added full boxed diagram showing 4-step tokenization process
- Clear visual flow from human text to numerical IDs
- Each step explained inline with the diagram
2. Character Tokenization Process:
- Step-by-step vocabulary building visualization
- Shows corpus → unique chars → vocab with IDs
- Encoding process with ID lookup visualization
- Decoding process with reverse lookup
- All in clear nested boxes
3. BPE Training Algorithm:
- Comprehensive 4-step process with nested boxes
- Pair frequency analysis with bar charts (████)
- Before/After merge visualizations
- Iteration examples showing vocabulary growth
- Final results with key insights
4. Memory Layout for Embedding Tables:
- Visual bars showing relative memory sizes
- Character (204KB) vs BPE-50K (102MB) vs Word-100K (204MB)
- Shows fp32/fp16/int8 precision trade-offs
- Real production model examples (GPT-2/3, BERT, T5, LLaMA)
- Clear table format for comparison
Educational improvements:
- More visual, less text-heavy
- Clearer step-by-step flows
- Better intuition building
- Production context throughout
- Following module developer ASCII diagram patterns
Students now see:
- HOW tokenization works (not just WHAT)
- WHY different strategies exist
- WHAT the memory implications are
- HOW production models make these choices
Changes:
- Removed broken _SimplifiedTensor and internal Module helper classes
- Updated imports to use tinytorch.core instead of dev modules
- Removed Module inheritance from Conv2d, MaxPool2d, AvgPool2d, SimpleCNN
- All spatial classes now standalone like Linear in layers module
This allows spatial module to export cleanly and import correctly:
from tinytorch.core.spatial import Conv2d, MaxPool2d, AvgPool2d
Smoke test: Conv2d(1,3,8,8) → (1,16,6,6) ✓
Added integration tests for DataLoader:
- test_dataloader_integration.py in tests/integration/
- Training workflow integration
- Shuffle consistency across epochs
- Memory efficiency verification
Updated Module 08:
- Added note about optional performance analysis
- Clarified that analysis functions can be run manually
- Clean flow: text → code → tests
Updated datasets/tiny/README.md:
- Minor formatting fixes
Module 08 is now complete and ready to export:
✅ Dataset abstraction
✅ TensorDataset implementation
✅ DataLoader with batching/shuffling
✅ ASCII visualizations for understanding
✅ Unit tests (in module)
✅ Integration tests (in tests/)
✅ Performance analysis tools (optional)
Next: Export with 'bin/tito export 08_dataloader'
Fixed issue where performance analysis functions were called every time
the module was imported, instead of only when needed.
Changes:
- Commented out analyze_dataloader_performance() bare call
- Commented out analyze_memory_usage() bare call
- Removed redundant test_training_integration() comment
These functions are still defined and can be called manually for
performance insights, but won't run on every import.
The test_module() function still calls all necessary tests when
the module is run as __main__.
Result: Module imports cleanly without running expensive performance
benchmarks unless explicitly requested.
Added educational ASCII art showing:
1. **Actual pixel values** - What 8×8 digit images look like as numbers
- Shows digits 5, 3, and 8 with real pixel values (0-16 range)
- Helps students understand images are just 2D arrays
2. **Visual representation** - How humans see the digits
- ASCII art showing recognizable digit shapes
- Connects abstract numbers to concrete patterns
3. **Shape transformations** - How DataLoader batches data
- Individual: (8, 8) → Batched: (32, 8, 8)
- Shows what the model actually receives
4. **Complete example** - Loading and using tiny digits dataset
- Real code showing datasets/tiny/digits_8x8.npz usage
- Demonstrates the full DataLoader workflow
Benefits:
✅ Students visualize what image data IS
✅ Understand DataLoader's batching transformation
✅ See connection between numbers and visual patterns
✅ Ready to work with real datasets in milestones
This makes the abstract concept of 'image tensors' concrete and visual.
Issue: xor_crisis.py was failing with ImportError on matplotlib architecture mismatch
Root cause: losses_dev.py imported matplotlib.pyplot but never used it
Fix:
- ✅ Removed unused imports: matplotlib.pyplot, time
- ✅ Re-exported module 04_losses to update tinytorch package
- ✅ Verified both milestone 02 scripts now run successfully
The matplotlib import was causing failures on M2 Macs where matplotlib
was installed for wrong architecture (x86_64 vs arm64). Since it was
never used, removing it eliminates the dependency entirely.
Tested:
- ✅ milestones/02_xor_crisis_1969/xor_crisis.py (49% accuracy - expected failure)
- ✅ milestones/02_xor_crisis_1969/xor_solved.py (100% accuracy - perfect!)
New Features:
- Add ReLUBackward for proper ReLU gradient computation
- Patch ReLU.forward() in enable_autograd() for gradient tracking
- Create polished XOR milestone scripts matching perceptron style
XOR Milestone Scripts (milestones/02_xor_crisis_1969/):
- xor_crisis.py: Shows single-layer perceptron FAILING (~50% accuracy)
- xor_solved.py: Shows multi-layer network SUCCEEDING (75%+ accuracy)
- Beautiful rich output with tables, panels, historical context
- Pedagogically structured like the perceptron milestone
Results:
✅ Single-layer: Stuck at ~50% (proves the crisis)
✅ Multi-layer: 75% accuracy (proves hidden layers work!)
✅ ReLU gradients flow correctly through network
✅ All 4 core activations now support autograd:
- Sigmoid ✓, ReLU ✓, Tanh ✓ (future), GELU ✓ (future)
Historical Significance:
This recreates the exact problem that killed AI for 17 years
and demonstrates the solution that started the modern era!
Refactored gradient accumulation to use clearer two-step approach:
1. Remove extra leading dimensions (batch dims)
2. Sum over dimensions that were size-1 (broadcast dims)
Benefits:
- Clearer intent: while loop for variable dims, for loop for fixed dims
- Better comments with concrete examples
- Easier for students to understand broadcasting in backprop
- Matches how you'd explain it verbally
Same functionality, cleaner code.
Added:
- SigmoidBackward class to modules/source/05_autograd/autograd_dev.py with #| export
- BCEBackward class to modules/source/05_autograd/autograd_dev.py with #| export
- Both classes exported to tinytorch/core/autograd.py
- Updated Sigmoid activation to track gradients using SigmoidBackward
- Updated BCE loss to track gradients using BCEBackward
ISSUE: Training still not learning - gradients not flowing properly
- Loss stays constant at 0.7911
- Weights don't update
- Sigmoid.forward() code looks correct but a.requires_grad stays False
- Need to investigate why gradient tracking isn't working through activations
Updated docstring examples to use cleaner callable syntax:
- loss_fn(predictions, targets) instead of loss_fn.forward(predictions, targets)
Applied to:
- MSELoss
- CrossEntropyLoss
- BinaryCrossEntropyLoss
Demonstrates proper usage with __call__ methods for cleaner, more Pythonic code.
Updated docstring examples to use cleaner callable syntax:
- sigmoid(x) instead of sigmoid.forward(x)
- relu(x) instead of relu.forward(x)
- tanh(x) instead of tanh.forward(x)
- gelu(x) instead of gelu.forward(x)
- softmax(x) instead of softmax.forward(x)
This demonstrates the proper usage pattern with the __call__ methods
we just added, making examples more Pythonic and PyTorch-compatible.
Enable cleaner API usage by adding __call__ methods to all activation,
layer, and loss classes. This allows students to write:
- relu(x) instead of relu.forward(x)
- layer(x) instead of layer.forward(x)
- loss_fn(pred, target) instead of loss_fn.forward(pred, target)
Changes:
- Module 02 (Activations): Add __call__ to ReLU, Tanh, GELU, Softmax
* Sigmoid already had __call__
- Module 03 (Layers): Add __call__ to Dropout
* Linear already had __call__
- Module 04 (Losses): Add __call__ to MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss
This matches PyTorch's API convention where model(x) calls model.__call__(x)
which internally calls model.forward(x). Makes code more Pythonic and
intuitive for students familiar with PyTorch.
Expected impact: Test pass rates should improve significantly as tests
expect PyTorch-style callable API.
- Reorganized milestone structure to historical progression (01-06)
- Created single forward_pass.py with student code clearly at top
- Added Rich CLI visualizations: data scatter, network diagram, decision boundary
- Show decision boundary using / or \ based on slope
- No random seed - students see variability in random weights
- Annotated all code with which modules were used (Modules 01-03)
- Added introductory panel explaining what to expect
- Updated DEFINITIVE_MODULE_PLAN.md with corrected milestone structure
PROBLEM:
- nbdev requires #| export directive on EACH cell to export when using # %% markers
- Cell markers inside class definitions split classes across multiple cells
- Only partial classes were being exported to tinytorch package
- Missing matmul, arithmetic operations, and activation classes in exports
SOLUTION:
1. Removed # %% cell markers INSIDE class definitions (kept classes as single units)
2. Added #| export to imports cell at top of each module
3. Added #| export before each exportable class definition in all 20 modules
4. Added __call__ method to Sigmoid for functional usage
5. Fixed numpy import (moved to module level from __init__)
MODULES FIXED:
- 01_tensor: Tensor class with all operations (matmul, arithmetic, shape ops)
- 02_activations: Sigmoid, ReLU, Tanh, GELU, Softmax classes
- 03_layers: Linear, Dropout classes
- 04_losses: MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss classes
- 05_autograd: Function, AddBackward, MulBackward, MatmulBackward, SumBackward
- 06_optimizers: Optimizer, SGD, Adam, AdamW classes
- 07_training: CosineSchedule, Trainer classes
- 08_dataloader: Dataset, TensorDataset, DataLoader classes
- 09_spatial: Conv2d, MaxPool2d, AvgPool2d, SimpleCNN classes
- 10-20: All exportable classes in remaining modules
TESTING:
- Test functions use 'if __name__ == "__main__"' guards
- Tests run in notebooks but NOT on import
- Rosenblatt Perceptron milestone working perfectly
RESULT:
✅ All 20 modules export correctly
✅ Perceptron (1957) milestone functional
✅ Clean separation: development (modules/source) vs package (tinytorch)
- 12_attention: Export scaled_dot_product_attention, MultiHeadAttention only
- 13_transformers: Export TransformerBlock, GPT only
Continues professional selective export pattern across advanced modules.
Clean public APIs for transformer architecture components.
- 09_spatial: Export Conv2d, MaxPool2d, AvgPool2d only
- 10_tokenization: Export Tokenizer, CharTokenizer, BPETokenizer only
- 11_embeddings: Export Embedding, PositionalEncoding only
Continues professional selective export pattern. Clean public APIs,
development utilities remain in development environment.
- 07_training: Export Trainer, CosineSchedule, clip_grad_norm only
- 08_dataloader: Export Dataset, DataLoader, TensorDataset only
Continues professional selective export pattern across all modules.
Development utilities remain in development, clean public API exported.
BREAKING CHANGE: Refactor from whole-module exports to selective function/class exports
**What Changed:**
- Separate development utilities from production exports
- Each function/class gets individual #| export directive
- Clean Prerequisites & Setup sections in all modules
- Development helpers (import_previous_module) not exported
**Module Export Summary:**
- 01_tensor: Tensor class only
- 02_activations: Sigmoid, ReLU, Tanh, GELU, Softmax only
- 03_layers: Linear, Dropout only
- 04_losses: MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss, log_softmax only
- 05_autograd: Function class only
- 06_optimizers: SGD, Adam, AdamW only
**Benefits:**
✅ Clean public API (matches PyTorch/TensorFlow patterns)
✅ No development utilities in final package
✅ Professional software education standards
✅ Clear separation of concerns
✅ Educational clarity for students
This matches industry standards for educational ML frameworks.
- Add import_previous_module() helper function to all core modules (01-07)
- Standardize cross-module imports for integration testing
- Add clear Prerequisites & Setup sections explaining module dependencies
- Update integration tests to use standardized import pattern
- Maintain clean separation between development and production code
This provides a consistent, educational approach to module integration
while keeping the codebase maintainable and student-friendly.
✨ Major improvements to Module 05: Autograd
- Add complete Jupyter notebook structure with markdown cells
- Enhance all Function classes with detailed mathematical explanations
- Add comprehensive unit tests with proper test patterns
- Improve enable_autograd() with detailed documentation
- Add integration tests for complex computation graphs
- Include educational visualizations and examples
- Follow TinyTorch standards with ⭐⭐ difficulty rating
- All tests pass: Function classes, Tensor autograd, integration scenarios
🎯 Ready for student use with modern PyTorch 2.0 style autograd
- Remove circular imports where modules imported from themselves
- Convert tinytorch.core imports to sys.path relative imports
- Only import dependencies that are actually used in each module
- Preserve documentation imports in markdown cells
- Use consistent relative path pattern across all modules
- Remove hardcoded absolute paths in favor of relative imports
Affected modules: 02_activations, 03_layers, 04_losses, 06_optimizers,
07_training, 09_spatial, 12_attention, 17_quantization
Major Accomplishments:
• Rebuilt all 20 modules with comprehensive explanations before each function
• Fixed explanatory placement: detailed explanations before implementations, brief descriptions before tests
• Enhanced all modules with ASCII diagrams for visual learning
• Comprehensive individual module testing and validation
• Created milestone directory structure with working examples
• Fixed critical Module 01 indentation error (methods were outside Tensor class)
Module Status:
✅ Modules 01-07: Fully working (Tensor → Training pipeline)
✅ Milestone 1: Perceptron - ACHIEVED (95% accuracy on 2D data)
✅ Milestone 2: MLP - ACHIEVED (complete training with autograd)
⚠️ Modules 08-20: Mixed results (import dependencies need fixes)
Educational Impact:
• Students can now learn complete ML pipeline from tensors to training
• Clear progression: basic operations → neural networks → optimization
• Explanatory sections provide proper context before implementation
• Working milestones demonstrate practical ML capabilities
Next Steps:
• Fix import dependencies in advanced modules (9, 11, 12, 17-20)
• Debug timeout issues in modules 14, 15
• First 7 modules provide solid foundation for immediate educational use(https://claude.ai/code)
- Added progressive complexity guidelines (Foundation/Intermediate/Advanced)
- Added measurement function consolidation to prevent information overload
- Fixed all diagnostic issues in losses_dev.py
- Fixed markdown formatting across all modules
- Consolidated redundant analysis functions in foundation modules
- Fixed syntax errors and unused variables
- Ensured all educational content is in proper markdown cells for Jupyter
- Enhanced module-developer agent with Dr. Sarah Rodriguez persona
- Added comprehensive educational frameworks and Golden Rules
- Implemented Progressive Disclosure Principle (no forward references)
- Added Immediate Testing Pattern (test after each implementation)
- Integrated package structure template (📦 where code exports to)
- Applied clean NBGrader structure with proper scaffolding
- Fixed tensor module formatting and scope boundaries
- Removed confusing transparent analysis patterns
- Added visual impact icons system for consistent motivation
🎯 Ready to apply these proven educational principles to all modules