TinyTorch

github-starred/TinyTorch

Fork 0

mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-05-25 04:31:45 -05:00

Commit Graph

Author	SHA1	Message	Date
Vijay Janapa Reddi	c61f7ec7a6	Clean up milestone directories - Removed 30 debugging and development artifact files - Kept core system, documentation, and demo files - tests/milestones: 9 clean files (system + docs) - milestones/05_2017_transformer: 5 clean files (demos) - Clear, focused directory structure - Ready for students and developers	2025-11-22 20:30:58 -05:00
Vijay Janapa Reddi	61a1680cb8	Fix Tensor slicing gradient tracking - position embeddings now learn CRITICAL FIX: Monkey-patching for __getitem__ was not in source modules PROBLEM: - Previously modified tinytorch/core/autograd.py (compiled output) - But NOT modules/05_autograd/autograd.py (source) - Export regenerated compiled files WITHOUT the monkey-patching code - Result: Tensor slicing had NO gradient tracking SOLUTION: 1. Added tracked_getitem() to modules/05_autograd/autograd.py 2. Added _original_getitem store in enable_autograd() 3. Added Tensor.__getitem__ = tracked_getitem installation 4. Exported all modules (tensor, autograd, embeddings) VERIFICATION TESTS: ✅ Tensor slicing attaches SliceBackward ✅ Gradients flow correctly: x[:3].backward() → x.grad = [1,1,1,0,0] ✅ Position embeddings.grad is not None and has non-zero values ✅ All 19/19 parameters get gradients and update TRAINING RESULTS: - Loss drops: 1.58 → 1.26 (vs 1.62→1.24 before) - Training accuracy: 2.7% (vs 0% before) - Test accuracy: Still 0% (needs hyperparameter tuning) MODEL IS LEARNING (slightly) - this is progress! Next steps: Hyperparameter tuning (more epochs, different LR, larger model)	2025-11-22 18:29:38 -05:00
Vijay Janapa Reddi	0e135f1aea	Implement Tensor slicing with progressive disclosure and fix embedding gradient flow WHAT: Added Tensor.__getitem__ (slicing) following progressive disclosure principles MODULE 01 (Tensor): - Added __getitem__ method for basic slicing operations - Clean implementation with NO gradient mentions (progressive disclosure) - Supports all NumPy-style indexing: x[0], x[:3], x[1:4], x[:, 1] - Ensures scalar results are wrapped in arrays MODULE 05 (Autograd): - Added SliceBackward function for gradient computation - Implements proper gradient scatter: zeros everywhere except sliced positions - Added monkey-patching in enable_autograd() for __getitem__ - Follows same pattern as existing operations (add, mul, matmul) MODULE 11 (Embeddings): - Updated PositionalEncoding to use Tensor slicing instead of .data - Fixed multiple .data accesses that broke computation graphs - Removed Tensor() wrapping that created gradient-disconnected leafs - Uses proper Tensor operations to preserve gradient flow TESTING: - All 6 component tests PASS (Embedding, Attention, FFN, Residual, Forward, Training) - 19/19 parameters get gradients (was 18/19 before) - Loss dropping better: 1.54→1.08 (vs 1.62→1.24 before) - Model still not learning (0% accuracy) - needs fresh session to test monkey-patching WHY THIS MATTERS: - Tensor slicing is FUNDAMENTAL - needed by transformers for position embeddings - Progressive disclosure maintains educational integrity - Follows existing TinyTorch architecture patterns - Enables position embeddings to potentially learn (pending verification) DOCUMENTS CREATED: - milestones/05_2017_transformer/TENSOR_SLICING_IMPLEMENTATION.md - milestones/05_2017_transformer/STATUS.md - milestones/05_2017_transformer/FIXES_SUMMARY.md - milestones/05_2017_transformer/DEBUG_REVERSAL.md - tests/milestones/test_reversal_debug.py (component tests) ARCHITECTURAL PRINCIPLE: Progressive disclosure is not just nice-to-have, it's CRITICAL for educational systems. Don't expose Module 05 concepts (gradients) in Module 01 (basic operations). Monkey-patch when features are needed, not before.	2025-11-22 18:26:12 -05:00

Author

SHA1

Message

Date

Vijay Janapa Reddi

c61f7ec7a6

Clean up milestone directories

- Removed 30 debugging and development artifact files
- Kept core system, documentation, and demo files
- tests/milestones: 9 clean files (system + docs)
- milestones/05_2017_transformer: 5 clean files (demos)
- Clear, focused directory structure
- Ready for students and developers

2025-11-22 20:30:58 -05:00

Vijay Janapa Reddi

61a1680cb8

Fix Tensor slicing gradient tracking - position embeddings now learn

CRITICAL FIX: Monkey-patching for __getitem__ was not in source modules

PROBLEM:
- Previously modified tinytorch/core/autograd.py (compiled output)
- But NOT modules/05_autograd/autograd.py (source)
- Export regenerated compiled files WITHOUT the monkey-patching code
- Result: Tensor slicing had NO gradient tracking

SOLUTION:
1. Added tracked_getitem() to modules/05_autograd/autograd.py
2. Added _original_getitem store in enable_autograd()
3. Added Tensor.__getitem__ = tracked_getitem installation
4. Exported all modules (tensor, autograd, embeddings)

VERIFICATION TESTS:
✅ Tensor slicing attaches SliceBackward
✅ Gradients flow correctly: x[:3].backward() → x.grad = [1,1,1,0,0]
✅ Position embeddings.grad is not None and has non-zero values
✅ All 19/19 parameters get gradients and update

TRAINING RESULTS:
- Loss drops: 1.58 → 1.26 (vs 1.62→1.24 before)
- Training accuracy: 2.7% (vs 0% before)
- Test accuracy: Still 0% (needs hyperparameter tuning)

MODEL IS LEARNING (slightly) - this is progress!

Next steps: Hyperparameter tuning (more epochs, different LR, larger model)

2025-11-22 18:29:38 -05:00

Vijay Janapa Reddi

0e135f1aea

Implement Tensor slicing with progressive disclosure and fix embedding gradient flow

WHAT: Added Tensor.__getitem__ (slicing) following progressive disclosure principles

MODULE 01 (Tensor):
- Added __getitem__ method for basic slicing operations
- Clean implementation with NO gradient mentions (progressive disclosure)
- Supports all NumPy-style indexing: x[0], x[:3], x[1:4], x[:, 1]
- Ensures scalar results are wrapped in arrays

MODULE 05 (Autograd):
- Added SliceBackward function for gradient computation
- Implements proper gradient scatter: zeros everywhere except sliced positions
- Added monkey-patching in enable_autograd() for __getitem__
- Follows same pattern as existing operations (add, mul, matmul)

MODULE 11 (Embeddings):
- Updated PositionalEncoding to use Tensor slicing instead of .data
- Fixed multiple .data accesses that broke computation graphs
- Removed Tensor() wrapping that created gradient-disconnected leafs
- Uses proper Tensor operations to preserve gradient flow

TESTING:
- All 6 component tests PASS (Embedding, Attention, FFN, Residual, Forward, Training)
- 19/19 parameters get gradients (was 18/19 before)
- Loss dropping better: 1.54→1.08 (vs 1.62→1.24 before)
- Model still not learning (0% accuracy) - needs fresh session to test monkey-patching

WHY THIS MATTERS:
- Tensor slicing is FUNDAMENTAL - needed by transformers for position embeddings
- Progressive disclosure maintains educational integrity
- Follows existing TinyTorch architecture patterns
- Enables position embeddings to potentially learn (pending verification)

DOCUMENTS CREATED:
- milestones/05_2017_transformer/TENSOR_SLICING_IMPLEMENTATION.md
- milestones/05_2017_transformer/STATUS.md
- milestones/05_2017_transformer/FIXES_SUMMARY.md
- milestones/05_2017_transformer/DEBUG_REVERSAL.md
- tests/milestones/test_reversal_debug.py (component tests)

ARCHITECTURAL PRINCIPLE:
Progressive disclosure is not just nice-to-have, it's CRITICAL for educational systems.
Don't expose Module 05 concepts (gradients) in Module 01 (basic operations).
Monkey-patch when features are needed, not before.

2025-11-22 18:26:12 -05:00

3 Commits