- Add CLAUDE.md entry point for Claude AI system
- Fix tito test command to set PYTHONPATH for module imports
- Fix embeddings export directive placement for nbdev
- Fix attention module to export imports properly
- Fix transformers embedding index casting to int
- Update transformers module to match tokenization style with improved ASCII diagrams
- Fix attention module to use proper multi-head interface
- Update transformer era milestone for refined module integration
- Fix import paths and ensure forward() method consistency
- All transformer components now work seamlessly together
Following module developer guidelines, added comprehensive visual diagrams:
1. Text-to-Numbers Pipeline (Introduction):
- Added full boxed diagram showing 4-step tokenization process
- Clear visual flow from human text to numerical IDs
- Each step explained inline with the diagram
2. Character Tokenization Process:
- Step-by-step vocabulary building visualization
- Shows corpus → unique chars → vocab with IDs
- Encoding process with ID lookup visualization
- Decoding process with reverse lookup
- All in clear nested boxes
3. BPE Training Algorithm:
- Comprehensive 4-step process with nested boxes
- Pair frequency analysis with bar charts (████)
- Before/After merge visualizations
- Iteration examples showing vocabulary growth
- Final results with key insights
4. Memory Layout for Embedding Tables:
- Visual bars showing relative memory sizes
- Character (204KB) vs BPE-50K (102MB) vs Word-100K (204MB)
- Shows fp32/fp16/int8 precision trade-offs
- Real production model examples (GPT-2/3, BERT, T5, LLaMA)
- Clear table format for comparison
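Sketch of the character tokenization flow from item 2 above (corpus → unique chars → vocab with IDs, then encode and decode by lookup). This is a minimal illustration and not the module's CharTokenizer: the function names and the choice to sort the vocabulary are assumptions made for the example.

```python
def build_vocab(corpus):
    """Map each unique character in the corpus to an integer ID."""
    unique_chars = sorted(set("".join(corpus)))  # sorted so IDs are reproducible
    return {ch: idx for idx, ch in enumerate(unique_chars)}


def encode(text, char_to_id):
    """ID lookup per character (raises KeyError for unseen characters)."""
    return [char_to_id[ch] for ch in text]


def decode(ids, char_to_id):
    """Reverse lookup: IDs back to characters."""
    id_to_char = {idx: ch for ch, idx in char_to_id.items()}
    return "".join(id_to_char[i] for i in ids)


vocab = build_vocab(["hello", "world"])
ids = encode("hello", vocab)
text = decode(ids, vocab)
```

A real tokenizer would also need a policy for unknown characters; this sketch simply fails on them.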
Educational improvements:
- More visual, less text-heavy
- Clearer step-by-step flows
- Better intuition building
- Production context throughout
- Following module developer ASCII diagram patterns
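Sketch of the BPE training loop described in item 3 above: count adjacent pair frequencies, merge the most frequent pair, repeat. The function names, the toy corpus, and the tie-breaking rule (first pair seen wins) are assumptions for illustration, not the module's BPETokenizer.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Return the learned merges and the corpus re-segmented with them."""
    words = dict(Counter(tuple(word) for word in corpus))
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair inserted wins
        words = merge_pair(words, best)
        merges.append(best)
    return merges, words


merges, words = train_bpe(["low", "low", "lower", "lowest"], num_merges=2)
```

Each merge adds one symbol to the vocabulary, which is the vocabulary growth the iteration diagrams show.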
Students now see:
- HOW tokenization works (not just WHAT)
- WHY different strategies exist
- WHAT the memory implications are
- HOW production models make these choices
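Sketch of the arithmetic behind the embedding-table sizes listed above: table size is vocab_size × embed_dim × bytes per parameter. The embedding dimension (512) and the character vocabulary size (100) are assumptions; they are not stated in this message, but with fp32 and decimal units they land on the quoted 204KB / 102MB / 204MB figures.

```python
def embedding_table_bytes(vocab_size, embed_dim, bytes_per_param):
    """Size of a dense embedding table in bytes."""
    return vocab_size * embed_dim * bytes_per_param


EMBED_DIM = 512  # assumption, not stated in the commit message
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
VOCAB_SIZES = {
    "Character": 100,      # assumption: roughly printable ASCII
    "BPE-50K": 50_000,
    "Word-100K": 100_000,
}

sizes = {
    name: {
        precision: embedding_table_bytes(vocab, EMBED_DIM, nbytes)
        for precision, nbytes in BYTES_PER_PARAM.items()
    }
    for name, vocab in VOCAB_SIZES.items()
}

for name, by_precision in sizes.items():
    print(name, {p: f"{b / 1e6:.1f} MB" for p, b in by_precision.items()})
```

Halving the precision halves the table, which is the fp32/fp16/int8 trade-off the diagram illustrates.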
Changes:
- Removed broken _SimplifiedTensor and internal Module helper classes
- Updated imports to use tinytorch.core instead of dev modules
- Removed Module inheritance from Conv2d, MaxPool2d, AvgPool2d, SimpleCNN
- All spatial classes now standalone like Linear in layers module
This allows spatial module to export cleanly and import correctly:
from tinytorch.core.spatial import Conv2d, MaxPool2d, AvgPool2d
Smoke test: Conv2d on input (1,3,8,8) → output (1,16,6,6) ✓
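Sketch of why the smoke test shape is plausible, using the standard convolution output-size formula. It assumes (1,3,8,8) is the input shape and that the layer has 16 output channels, a 3×3 kernel, stride 1, and no padding; none of these hyperparameters are stated in this message. The helper name is hypothetical.

```python
def conv2d_output_shape(input_shape, out_channels, kernel_size, stride=1, padding=0):
    """Output shape of a 2D convolution over a (batch, channels, H, W) input."""
    batch, _, height, width = input_shape
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    return (batch, out_channels, out_h, out_w)


# Assumed hyperparameters: 16 filters, 3x3 kernel, stride 1, no padding
shape = conv2d_output_shape((1, 3, 8, 8), out_channels=16, kernel_size=3)
```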
Added integration tests for DataLoader:
- test_dataloader_integration.py in tests/integration/
- Training workflow integration
- Shuffle consistency across epochs
- Memory efficiency verification
Updated Module 08:
- Added note about optional performance analysis
- Clarified that analysis functions can be run manually
- Clean flow: text → code → tests
Updated datasets/tiny/README.md:
- Minor formatting fixes
Module 08 is now complete and ready to export:
✅ Dataset abstraction
✅ TensorDataset implementation
✅ DataLoader with batching/shuffling
✅ ASCII visualizations for understanding
✅ Unit tests (in module)
✅ Integration tests (in tests/)
✅ Performance analysis tools (optional)
Next: Export with 'bin/tito export 08_dataloader'
Fixed issue where performance analysis functions were called every time
the module was imported, instead of only when needed.
Changes:
- Commented out analyze_dataloader_performance() bare call
- Commented out analyze_memory_usage() bare call
- Removed redundant test_training_integration() comment
These functions are still defined and can be called manually for
performance insights, but won't run on every import.
The test_module() function still calls all necessary tests when
the module is run as __main__.
Result: Module imports cleanly without running expensive performance
benchmarks unless explicitly requested.
Added educational ASCII art showing:
1. **Actual pixel values** - What 8×8 digit images look like as numbers
- Shows digits 5, 3, and 8 with real pixel values (0-16 range)
- Helps students understand images are just 2D arrays
2. **Visual representation** - How humans see the digits
- ASCII art showing recognizable digit shapes
- Connects abstract numbers to concrete patterns
3. **Shape transformations** - How DataLoader batches data
- Individual: (8, 8) → Batched: (32, 8, 8)
- Shows what the model actually receives
4. **Complete example** - Loading and using tiny digits dataset
- Real code showing datasets/tiny/digits_8x8.npz usage
- Demonstrates the full DataLoader workflow
Benefits:
✅ Students visualize what image data IS
✅ Understand DataLoader's batching transformation
✅ See connection between numbers and visual patterns
✅ Ready to work with real datasets in milestones
This makes the abstract concept of 'image tensors' concrete and visual.
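Sketch of the shape transformation in item 3 above: individual (8, 8) images stacked into (32, 8, 8) batches. It uses synthetic pixel values in the 0-16 range rather than datasets/tiny/digits_8x8.npz, and the `iterate_batches` helper is a hypothetical stand-in for DataLoader, not its actual interface.

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 fake "digit" images with integer pixel values in [0, 16]
images = [rng.integers(0, 17, size=(8, 8)) for _ in range(100)]


def iterate_batches(samples, batch_size):
    """Yield consecutive batches by stacking samples along a new first axis."""
    for start in range(0, len(samples), batch_size):
        yield np.stack(samples[start:start + batch_size])


batches = list(iterate_batches(images, batch_size=32))
first_shape = batches[0].shape   # full batch
last_shape = batches[-1].shape   # leftover samples form a smaller batch
```

With 100 samples and a batch size of 32, the last batch holds only the 4 leftover samples, so downstream code should not assume every batch is full.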
Issue: xor_crisis.py was failing with an ImportError caused by a matplotlib architecture mismatch
Root cause: losses_dev.py imported matplotlib.pyplot but never used it
Fix:
- ✅ Removed unused imports: matplotlib.pyplot, time
- ✅ Re-exported module 04_losses to update tinytorch package
- ✅ Verified both milestone 02 scripts now run successfully
The matplotlib import was causing failures on M2 Macs where matplotlib
was installed for the wrong architecture (x86_64 instead of arm64). Since it was
never used, removing it eliminates the dependency entirely.
Tested:
- ✅ milestones/02_xor_crisis_1969/xor_crisis.py (49% accuracy - expected failure)
- ✅ milestones/02_xor_crisis_1969/xor_solved.py (100% accuracy - perfect!)
New Features:
- Add ReLUBackward for proper ReLU gradient computation
- Patch ReLU.forward() in enable_autograd() for gradient tracking
- Create polished XOR milestone scripts matching perceptron style
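Sketch of the ReLU gradient rule that a ReLUBackward node has to implement: the upstream gradient passes through where the input was positive and is zeroed elsewhere. Only the class name comes from this commit; the constructor and the `apply` method are assumptions, and the real class operates on Tensors rather than raw NumPy arrays.

```python
import numpy as np


class ReLUBackward:
    """Hypothetical minimal backward node for y = max(0, x)."""

    def __init__(self, x):
        # Remember where the forward input was positive
        self.mask = x > 0

    def apply(self, grad_output):
        # dy/dx is 1 where x > 0 and 0 elsewhere (0 is used at x == 0)
        return grad_output * self.mask


x = np.array([-2.0, 0.0, 3.0])
grad_input = ReLUBackward(x).apply(np.array([10.0, 10.0, 10.0]))
```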
XOR Milestone Scripts (milestones/02_xor_crisis_1969/):
- xor_crisis.py: Shows single-layer perceptron FAILING (~50% accuracy)
- xor_solved.py: Shows multi-layer network SUCCEEDING (75%+ accuracy)
- Beautiful rich output with tables, panels, historical context
- Pedagogically structured like the perceptron milestone
Results:
✅ Single-layer: Stuck at ~50% (proves the crisis)
✅ Multi-layer: 75% accuracy (proves hidden layers work!)
✅ ReLU gradients flow correctly through network
✅ Autograd status of the 4 core activations:
- Sigmoid ✓, ReLU ✓, Tanh (future), GELU (future)
Historical Significance:
This recreates the problem that stalled neural network research for 17 years
and demonstrates the solution (hidden layers) that revived it!
Refactored gradient accumulation to use clearer two-step approach:
1. Remove extra leading dimensions (batch dims)
2. Sum over dimensions that were size-1 (broadcast dims)
Benefits:
- Clearer intent: while loop for variable dims, for loop for fixed dims
- Better comments with concrete examples
- Easier for students to understand broadcasting in backprop
- Matches how you'd explain it verbally
Same functionality, cleaner code.
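Sketch of the two-step approach described above, written against plain NumPy arrays. The function name `unbroadcast` and its signature are assumptions; the actual code lives inside the autograd module and may differ in detail.

```python
import numpy as np


def unbroadcast(grad, shape):
    """Reduce `grad` back to `shape`, undoing NumPy-style broadcasting."""
    # Step 1: extra leading dims (e.g. batch dims) were added by broadcasting.
    # Their number varies, so a while loop sums them away one at a time.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)

    # Step 2: dims that were size 1 in the original were stretched.
    # The rank is now fixed, so a for loop over the target shape is enough.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad


# Bias of shape (3,) added to a batch of shape (4, 3)
bias_grad = unbroadcast(np.ones((4, 3)), (3,))
# Parameter of shape (1, 3) broadcast against shape (2, 4, 3)
row_grad = unbroadcast(np.ones((2, 4, 3)), (1, 3))
```

In both cases every position of the original parameter receives the sum of the gradients from all the places it was copied to.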
Added:
- SigmoidBackward class to modules/source/05_autograd/autograd_dev.py with #| export
- BCEBackward class to modules/source/05_autograd/autograd_dev.py with #| export
- Both classes exported to tinytorch/core/autograd.py
- Updated Sigmoid activation to track gradients using SigmoidBackward
- Updated BCE loss to track gradients using BCEBackward
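Sketch of the gradient formulas that SigmoidBackward and BCEBackward need to produce, as free functions over NumPy arrays. The function names, the mean reduction, and the clipping epsilon are assumptions; the exported classes are not shown in this message.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_backward(out, grad_output):
    """d(sigmoid)/dx = out * (1 - out), where out = sigmoid(x)."""
    return grad_output * out * (1.0 - out)


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over all elements."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))


def bce_backward(pred, target, eps=1e-7):
    """dL/dpred for the mean-reduced loss above."""
    pred = np.clip(pred, eps, 1 - eps)
    return (pred - target) / (pred * (1 - pred) * pred.size)


logits = np.array([0.5, -1.0, 2.0])
target = np.array([1.0, 0.0, 1.0])
pred = sigmoid(logits)
grad_logits = sigmoid_backward(pred, bce_backward(pred, target))
```

Chaining the two rules collapses to (pred - target) / N, which makes a convenient sanity check when debugging gradient flow.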
ISSUE: Training still not learning - gradients not flowing properly
- Loss stays constant at 0.7911
- Weights don't update
- Sigmoid.forward() code looks correct but a.requires_grad stays False
- Need to investigate why gradient tracking isn't working through activations
Updated docstring examples to use cleaner callable syntax:
- loss_fn(predictions, targets) instead of loss_fn.forward(predictions, targets)
Applied to:
- MSELoss
- CrossEntropyLoss
- BinaryCrossEntropyLoss
Demonstrates proper usage with __call__ methods for cleaner, more Pythonic code.
Updated docstring examples to use cleaner callable syntax:
- sigmoid(x) instead of sigmoid.forward(x)
- relu(x) instead of relu.forward(x)
- tanh(x) instead of tanh.forward(x)
- gelu(x) instead of gelu.forward(x)
- softmax(x) instead of softmax.forward(x)
This demonstrates the proper usage pattern with the __call__ methods
we just added, making examples more Pythonic and PyTorch-compatible.
Enable cleaner API usage by adding __call__ methods to all activation,
layer, and loss classes. This allows students to write:
- relu(x) instead of relu.forward(x)
- layer(x) instead of layer.forward(x)
- loss_fn(pred, target) instead of loss_fn.forward(pred, target)
Changes:
- Module 02 (Activations): Add __call__ to ReLU, Tanh, GELU, Softmax
* Sigmoid already had __call__
- Module 03 (Layers): Add __call__ to Dropout
* Linear already had __call__
- Module 04 (Losses): Add __call__ to MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss
This matches PyTorch's API convention where model(x) calls model.__call__(x)
which internally calls model.forward(x). Makes code more Pythonic and
intuitive for students familiar with PyTorch.
Expected impact: Test pass rates should improve significantly because the tests
expect a PyTorch-style callable API.
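Sketch of the pattern this commit adds: `__call__` simply delegates to `forward()`, so both spellings give the same result. The class below is a NumPy stand-in for illustration, not the exported ReLU.

```python
import numpy as np


class ReLU:
    """Stand-in activation showing the __call__ -> forward delegation."""

    def forward(self, x):
        return np.maximum(0, x)

    def __call__(self, x):
        # relu(x) is shorthand for relu.forward(x)
        return self.forward(x)


relu = ReLU()
x = np.array([-1.0, 2.0])
called = relu(x)
explicit = relu.forward(x)
```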
- Reorganized milestone structure to historical progression (01-06)
- Created single forward_pass.py with student code clearly at top
- Added Rich CLI visualizations: data scatter, network diagram, decision boundary
- Show decision boundary using / or \ based on slope
- No random seed - students see variability in random weights
- Annotated all code with which modules were used (Modules 01-03)
- Added introductory panel explaining what to expect
- Updated DEFINITIVE_MODULE_PLAN.md with corrected milestone structure
PROBLEM:
- nbdev requires #| export directive on EACH cell to export when using # %% markers
- Cell markers inside class definitions split classes across multiple cells
- Only partial classes were being exported to tinytorch package
- Missing matmul, arithmetic operations, and activation classes in exports
SOLUTION:
1. Removed # %% cell markers INSIDE class definitions (kept classes as single units)
2. Added #| export to imports cell at top of each module
3. Added #| export before each exportable class definition in all 20 modules
4. Added __call__ method to Sigmoid for functional usage
5. Fixed numpy import (moved to module level from __init__)
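Sketch of the cell layout the solution describes: one `#| export` per exported cell, and no `# %%` marker inside the class body. It is an illustrative fragment only; the Sigmoid body is a simplified stand-in, and any other nbdev setup a real module needs is left out.

```python
# %%
#| export
import numpy as np

# %%
#| export
class Sigmoid:
    """The whole class sits in ONE cell, so it exports as a single unit."""

    def forward(self, x):
        return 1.0 / (1.0 + np.exp(-x))

    def __call__(self, x):
        return self.forward(x)

# %%
# No "#| export" on this cell: it stays in the development file only.
def test_unit_sigmoid():
    assert Sigmoid()(0.0) == 0.5

if __name__ == "__main__":
    test_unit_sigmoid()
```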
MODULES FIXED:
- 01_tensor: Tensor class with all operations (matmul, arithmetic, shape ops)
- 02_activations: Sigmoid, ReLU, Tanh, GELU, Softmax classes
- 03_layers: Linear, Dropout classes
- 04_losses: MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss classes
- 05_autograd: Function, AddBackward, MulBackward, MatmulBackward, SumBackward
- 06_optimizers: Optimizer, SGD, Adam, AdamW classes
- 07_training: CosineSchedule, Trainer classes
- 08_dataloader: Dataset, TensorDataset, DataLoader classes
- 09_spatial: Conv2d, MaxPool2d, AvgPool2d, SimpleCNN classes
- 10-20: All exportable classes in remaining modules
TESTING:
- Test functions use 'if __name__ == "__main__"' guards
- Tests run in notebooks but NOT on import
- Rosenblatt Perceptron milestone working perfectly
RESULT:
✅ All 20 modules export correctly
✅ Perceptron (1957) milestone functional
✅ Clean separation: development (modules/source) vs package (tinytorch)
- 12_attention: Export scaled_dot_product_attention, MultiHeadAttention only
- 13_transformers: Export TransformerBlock, GPT only
Continues professional selective export pattern across advanced modules.
Clean public APIs for transformer architecture components.
- 09_spatial: Export Conv2d, MaxPool2d, AvgPool2d only
- 10_tokenization: Export Tokenizer, CharTokenizer, BPETokenizer only
- 11_embeddings: Export Embedding, PositionalEncoding only
Continues professional selective export pattern. Clean public APIs,
development utilities remain in development environment.
- 07_training: Export Trainer, CosineSchedule, clip_grad_norm only
- 08_dataloader: Export Dataset, DataLoader, TensorDataset only
Continues professional selective export pattern across all modules.
Development utilities remain in development, clean public API exported.
BREAKING CHANGE: Refactor from whole-module exports to selective function/class exports
**What Changed:**
- Separate development utilities from production exports
- Each function/class gets individual #| export directive
- Clean Prerequisites & Setup sections in all modules
- Development helpers (import_previous_module) not exported
**Module Export Summary:**
- 01_tensor: Tensor class only
- 02_activations: Sigmoid, ReLU, Tanh, GELU, Softmax only
- 03_layers: Linear, Dropout only
- 04_losses: MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss, log_softmax only
- 05_autograd: Function class only
- 06_optimizers: SGD, Adam, AdamW only
**Benefits:**
✅ Clean public API (matches PyTorch/TensorFlow patterns)
✅ No development utilities in final package
✅ Professional software education standards
✅ Clear separation of concerns
✅ Educational clarity for students
This mirrors how PyTorch and TensorFlow expose a curated public API.
- Add import_previous_module() helper function to all core modules (01-07)
- Standardize cross-module imports for integration testing
- Add clear Prerequisites & Setup sections explaining module dependencies
- Update integration tests to use standardized import pattern
- Maintain clean separation between development and production code
This provides a consistent, educational approach to module integration
while keeping the codebase maintainable and student-friendly.
✨ Major improvements to Module 05: Autograd
- Add complete Jupyter notebook structure with markdown cells
- Enhance all Function classes with detailed mathematical explanations
- Add comprehensive unit tests with proper test patterns
- Improve enable_autograd() with detailed documentation
- Add integration tests for complex computation graphs
- Include educational visualizations and examples
- Follow TinyTorch standards with ⭐⭐ difficulty rating
- All tests pass: Function classes, Tensor autograd, integration scenarios
🎯 Ready for student use with modern PyTorch 2.0 style autograd
- Remove circular imports where modules imported from themselves
- Convert tinytorch.core imports to sys.path relative imports
- Only import dependencies that are actually used in each module
- Preserve documentation imports in markdown cells
- Use consistent relative path pattern across all modules
- Remove hardcoded absolute paths in favor of relative imports
Affected modules: 02_activations, 03_layers, 04_losses, 06_optimizers,
07_training, 09_spatial, 12_attention, 17_quantization
- Remove demonstrate_complex_computation_graph() function from Module 05 (autograd)
- Remove demonstrate_optimizer_integration() function from Module 06 (optimizers)
- Module 04 (losses) had no demonstration functions to remove
- Keep all core implementations and unit test functions intact
- Keep final test_module() function for integration testing
- All module tests continue to pass after cleanup
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
This change ensures tests run immediately when developing modules but don't execute when modules are imported by other modules.
Changes:
- Protected all test executions with if __name__ == "__main__" blocks
- Unit tests run immediately after function definitions during development
- Module integration test (test_module()) runs at end when executed directly
- Updated module-developer.md with new testing patterns and examples
Benefits:
- Students see immediate feedback when developing (python module_dev.py runs all tests)
- Clean imports: later modules can import earlier ones without triggering tests
- Maintains educational flow: tests visible right after implementations
- Compatible with nbgrader and notebook environments
Tested:
- Module 01 runs all tests when executed directly ✓
- Importing Tensor from tensor_dev doesn't run tests ✓
- Cross-module imports work without test interference ✓
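Sketch that demonstrates the guard's behavior end to end: the same file is first imported (tests must not run) and then executed as `__main__` (tests must run). The module contents and the name `demo_dev` are hypothetical; only the standard library is used.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# A tiny stand-in for a *_dev.py module with a guarded test call
MODULE_SOURCE = '''
RAN_TESTS = False

def add(a, b):
    return a + b

def test_add():
    global RAN_TESTS
    assert add(2, 3) == 5
    RAN_TESTS = True

if __name__ == "__main__":
    test_add()
'''


def load_as_import(path):
    """Load the file the way `import demo_dev` would (name != '__main__')."""
    spec = importlib.util.spec_from_file_location("demo_dev", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_dev.py"
    path.write_text(MODULE_SOURCE)

    # Importing: the guard is False, so the test is skipped
    ran_on_import = load_as_import(path).RAN_TESTS

    # Running directly: the guard is True, so the test executes
    ran_on_execute = runpy.run_path(str(path), run_name="__main__")["RAN_TESTS"]
```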
Removed complexity from Module 07 (training):
- Removed DemoModel and TestModel classes
- Unified all tests/demos to use single minimal MockModel
- Module now focuses purely on training infrastructure
What remains:
- Trainer class (the core training orchestrator)
- CosineSchedule (learning rate scheduling)
- clip_grad_norm (gradient clipping utility)
- Training loop mechanics and checkpointing
Impact:
- Cleaner, more focused module
- No distraction from model architecture
- Tests training infrastructure, not model building
- All tests still pass with simplified mocks
The module now teaches exactly what it should: how to train
models, not how to build them.
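Sketch of the two utilities listed under "What remains": a cosine learning-rate schedule and gradient clipping by global norm. The function names and signatures are hypothetical and operate on plain floats and NumPy arrays; the actual CosineSchedule and clip_grad_norm interfaces are not shown in this message.

```python
import math

import numpy as np


def cosine_lr(step, total_steps, max_lr, min_lr=0.0):
    """Anneal from max_lr to min_lr along half a cosine wave."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together if their combined L2 norm is too large."""
    total_norm = math.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm


schedule = [cosine_lr(step, 10, max_lr=0.1) for step in (0, 5, 10)]
clipped, norm_before = clip_by_global_norm(
    [np.array([3.0]), np.array([4.0])], max_norm=1.0
)
```

Clipping by the global norm keeps the direction of the update and only shrinks its length.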
Major changes to module structure:
1. Updated module-developer.md with clear components-only rule
2. Removed Sequential container from Module 03 (layers)
3. Converted to manual layer composition for transparency
Philosophy:
- Modules build ATOMIC COMPONENTS (Tensor, Linear, ReLU, etc.)
- Milestones/Examples show EXPLICIT COMPOSITION
- Students SEE how their components connect
- No hidden abstractions or black boxes
Module 03 changes:
- REMOVED: Sequential class and tests (~200 lines)
- KEPT: Linear and Dropout as individual components
- UPDATED: Integration demos use manual composition
- Result: Students see explicit layer1.forward(x) calls
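Sketch of the manual composition style this change moves to: each layer is called explicitly instead of being hidden inside a Sequential container. The classes are NumPy stand-ins with assumed constructors, not the exported Linear and ReLU.

```python
import numpy as np


class Linear:
    """Stand-in dense layer: y = x @ W + b."""

    def __init__(self, in_features, out_features, rng):
        self.weight = rng.standard_normal((in_features, out_features)) * 0.1
        self.bias = np.zeros(out_features)

    def forward(self, x):
        return x @ self.weight + self.bias


class ReLU:
    def forward(self, x):
        return np.maximum(0, x)


rng = np.random.default_rng(0)
layer1 = Linear(2, 4, rng)
activation = ReLU()
layer2 = Linear(4, 1, rng)

x = np.array([[0.0, 1.0], [1.0, 1.0]])

# Explicit data flow: every intermediate value is visible to the student
hidden = layer1.forward(x)
hidden = activation.forward(hidden)
output = layer2.forward(hidden)
```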
Module 07 changes:
- Simplified model classes to minimal test fixtures
- Removed complex neural network teaching examples
- Focus purely on training infrastructure
Impact:
- Clearer learning progression
- Students understand each component's role
- Milestones become showcases of student work
- No magic containers hiding the data flow
Removed redundant test calls from all modules:
- Eliminated verbose if __name__ == '__main__': blocks
- Removed duplicate individual test calls
- Each module now simply calls test_module() directly
Changes made to all 9 modules:
- Module 01 (Tensor): Simplified from 16-line main block to 1 line
- Module 02 (Activations): Simplified from 13-line main block to 1 line
- Module 03 (Layers): Simplified from 17-line main block to 1 line
- Module 04 (Losses): Simplified from 20-line main block to 1 line
- Module 05 (Autograd): Simplified from 19-line main block to 1 line
- Module 06 (Optimizers): Simplified from 17-line main block to 1 line
- Module 07 (Training): Simplified from 16-line main block to 1 line
- Module 08 (DataLoader): Simplified from 17-line main block to 1 line
- Module 09 (Spatial): Simplified from 14-line main block to 1 line
Impact:
- Notebook-friendly: Tests run immediately in Jupyter environments
- No redundancy: test_module() already runs all unit tests
- Cleaner code: ~140 lines of redundant code removed
- Better for students: Simpler, more direct execution flow
Wrapped test code in if __name__ == '__main__': guards for:
- Module 02 (activations): 7 test calls protected
- Module 03 (layers): 7 test calls protected
- Module 04 (losses): 10 test calls protected
- Module 05 (autograd): 7 test calls protected
- Module 06 (optimizers): 8 test calls protected
- Module 07 (training): 7 test calls protected
- Module 09 (spatial): 5 test calls protected
Impact:
- All modules can now be imported cleanly without test execution
- Tests still run when modules are executed directly
- Clean dependency chain throughout the framework
- Follows Python best practices for module structure
This completes the fix for the entire module system. Modules can now
properly import from each other without triggering test code execution.
Critical fixes to resolve module import issues:
1. Module 01 (tensor_dev.py):
- Wrapped all test calls in if __name__ == '__main__': guards
- Tests no longer execute during import
- Clean imports now work: from tensor_dev import Tensor
2. Module 08 (dataloader_dev.py):
- REMOVED redefined Tensor class (was breaking dependency chain)
- Now imports real Tensor from Module 01
- DataLoader uses actual Tensor with full gradient support
Impact:
- Modules properly build on previous work (no isolated implementations)
- Clean dependency chain: each module imports from previous modules
- No test execution during imports = fast, clean module loading
This resolves the root cause where DataLoader had to redefine Tensor
because importing tensor_dev.py would execute all test code.
Major refactoring:
- Eliminated Variable class completely from autograd module
- Implemented progressive enhancement pattern with enable_autograd()
- All modules now use pure Tensor with requires_grad=True
- PyTorch 2.0 compatible API throughout
- Clean separation: Module 01 has simple Tensor, Module 05 enhances with gradients
- Fixed all imports and references across layers, activations, losses
- Educational clarity: students learn modern patterns from day one
The system now follows the principle: 'One Tensor class to rule them all'
No more confusion between Variable and Tensor - everything is just Tensor!
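Sketch of the progressive enhancement idea: a plain Tensor is defined first, and a later `enable_autograd()` call patches the same class so it tracks gradients. Only the names `Tensor`, `enable_autograd`, and `requires_grad` come from this commit; the method bodies, the `_backward` closure, and the multiplication-only scope are assumptions for illustration.

```python
import numpy as np


class Tensor:
    """Plain tensor as an early module might define it: data only."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __mul__(self, other):
        return Tensor(self.data * other.data)


def enable_autograd():
    """Patch Tensor in place so existing code gains gradient tracking."""
    plain_init = Tensor.__init__
    plain_mul = Tensor.__mul__

    def init(self, data, requires_grad=False):
        plain_init(self, data)
        self.requires_grad = requires_grad
        self.grad = None
        self._backward = None  # closure that pushes gradients to the inputs

    def mul(self, other):
        out = plain_mul(self, other)
        out.requires_grad = self.requires_grad or other.requires_grad
        if out.requires_grad:
            def _backward(grad):
                # Product rule: d(a*b)/da = b and d(a*b)/db = a
                for tensor, partial in ((self, other.data), (other, self.data)):
                    if tensor.requires_grad:
                        tensor.backward(grad * partial)
            out._backward = _backward
        return out

    def backward(self, grad=None):
        grad = np.ones_like(self.data) if grad is None else grad
        self.grad = grad if self.grad is None else self.grad + grad
        if self._backward is not None:
            self._backward(grad)

    Tensor.__init__ = init
    Tensor.__mul__ = mul
    Tensor.backward = backward


enable_autograd()
x = Tensor([2.0, 3.0], requires_grad=True)
y = Tensor([4.0, 5.0])
z = x * y
z.backward()
```

After the call there is still only one Tensor class; code written against the plain version keeps working because `requires_grad` defaults to False.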