Commit Graph

849 Commits

Author SHA1 Message Date
Zappandy
7f5d591aed fixed default venv value in config for validation 2025-10-11 13:12:40 +02:00
Zappandy
e96d821fa1 Feat(env) dynamic virtual env support for advanced users 2025-10-11 12:31:11 +02:00
Vijay Janapa Reddi
ad0efbb434 feat: Add overfitting detection to Milestones 03 and 04
Track train vs test accuracy to detect overfitting:

Training Progress:
- Print both train and test accuracy every 5 epochs
- Show gap between train/test with indicator:
  ✓ Gap < 10%: Healthy generalization
  ⚠️ Gap > 10%: Overfitting warning

Results Table (ACT 4):
- Train Accuracy + improvement
- Test Accuracy + improvement
- Overfitting Gap + status
- Training Time

Final Panel (ACT 5):
- Show test accuracy with gap
- Celebrate good generalization

Educational Value:
Students now see:
1. How to detect overfitting (growing train/test gap)
2. When model memorizes vs generalizes
3. Real ML systems track BOTH metrics

Example output:
  Epoch  5/20  Loss: 1.234  Train: 85.0%  Test: 82.0%  ✓ Gap: 3.0%
  Epoch 10/20  Loss: 0.891  Train: 90.0%  Test: 87.0%  ✓ Gap: 3.0%

This prepares them for regularization techniques (Dropout, etc.)
in later modules!
2025-09-30 17:33:54 -04:00
Vijay Janapa Reddi
877b9adc27 style: Make Milestone 04 architecture consistent with others
Changed from Panel to plain console.print with ASCII diagram.

All 4 milestones now follow identical format:
- console.print('[bold]🏗️ The Architecture:[/bold]')
- ASCII box diagram with arrows
- console.print('[bold]🔧 Components:[/bold]')
- Bullet list of components

This ensures visual consistency across all milestone demonstrations!
2025-09-30 17:17:46 -04:00
Vijay Janapa Reddi
e6d0757bbd refactor: Keep explicit module imports + optimize CNN milestone
Import Strategy:
- Keep explicit 'from tinytorch.core.spatial import Conv2d'
- Maps directly to module structure (Module 09 → core.spatial)
- Better for education: students see exactly where each concept lives
- Removed redundant tinytorch/nn.py (nn/ directory already exists)

Milestone 04 Optimizations:
- Reduced epochs: 50 → 20 (explicit loops are slow!)
- Print progress every 5 epochs (instead of 10)
- Load from local npz file (no sklearn dependency)
- Still achieves ~80%+ accuracy

Educational Rationale:
TinyTorch uses explicit imports to show module structure:
  tinytorch.core.tensor      # Module 01
  tinytorch.core.layers      # Module 03
  tinytorch.core.spatial     # Module 09
  tinytorch.core.losses      # Module 04

PyTorch's torch.nn is convenient but pedagogically unclear.
Our approach: clarity over convenience!
2025-09-30 17:15:40 -04:00
Vijay Janapa Reddi
95274448bd feat: Add Milestone 04 (CNN Revolution 1998) + Clean spatial imports
Milestone 04 - CNN Revolution:
 Complete 5-Act narrative structure (Challenge → Reflection)
 SimpleCNN architecture: Conv2d → ReLU → MaxPool → Linear
 Trains on 8x8 digits dataset (1,437 train, 360 test)
 Achieves 84.2% accuracy with only 810 parameters
 Demonstrates spatial operations preserve structure
 Beautiful visual output with progress tracking

Key Features:
- Conv2d (1→8 channels, 3×3 kernel) detects local patterns
- MaxPool2d (2×2) provides translation invariance
- 100× fewer parameters than equivalent MLP
- Training completes in ~105 seconds (50 epochs)
- Sample predictions table shows 9/10 correct

Module 09 Spatial Improvements:
- Removed ugly try/except import pattern
- Clean imports: 'from tinytorch.core.tensor import Tensor'
- Matches PyTorch style (simple and professional)
- No fallback logic needed

All 4 milestones now follow consistent 5-Act structure!
2025-09-30 17:04:41 -04:00
Vijay Janapa Reddi
6af994a82f test: Add comprehensive CNN integration tests
Created test_cnn_integration.py with:

 Conv2d Operations Tests:
- Verifies actual convolution (not just shape manipulation)
- Edge detector test proves Conv2d computes correctly
- Shape transformations for various configurations
- Parameter count verification (448 params for 3→16, k=3)

 Pooling Operations Tests:
- MaxPool2d actually computes maximum values
- AvgPool2d actually computes averages
- Shape transformations validated
- Handles negative values correctly

 Numerical Stability Tests:
- Zero inputs handled correctly
- Negative values in pooling work properly

⚠️  Gradient Flow Tests (TODO):
- Placeholder for Conv2d backward support
- Will add when Conv2d autograd integration is implemented

All forward pass tests passing (8/8)!
These tests ensure CNNs actually work, not just shape shuffle.
2025-09-30 16:57:14 -04:00
Vijay Janapa Reddi
cf575b4829 fix: Update Module 09 spatial for standalone classes
Changes:
- Removed broken _SimplifiedTensor and internal Module helper classes
- Updated imports to use tinytorch.core instead of dev modules
- Removed Module inheritance from Conv2d, MaxPool2d, AvgPool2d, SimpleCNN
- All spatial classes now standalone like Linear in layers module

This allows spatial module to export cleanly and import correctly:
  from tinytorch.core.spatial import Conv2d, MaxPool2d, AvgPool2d

Smoke test: Conv2d(1,3,8,8) → (1,16,6,6) ✓
2025-09-30 16:54:21 -04:00
Vijay Janapa Reddi
a62e696900 refactor: Display training times in milliseconds for better resolution
Training on 8x8 digits is so fast (< 1 second) that showing
seconds rounded to 1 decimal doesn't provide meaningful resolution.
Changed to milliseconds (ms) to show actual time differences between
batch sizes.

Now shows: '147ms' instead of '0.1s'
2025-09-30 16:48:26 -04:00
Vijay Janapa Reddi
a9517f51ae fix: Use len(train_dataset) instead of train_dataset.features
TensorDataset implements __len__ but doesn't expose a 'features' attribute.
Fixed throughput calculation in batch size comparison experiment.
2025-09-30 16:46:57 -04:00
Vijay Janapa Reddi
0f8e55ae87 refactor: Apply 5-Act narrative structure to Milestone 03 + Fix duplicates
Milestone 03 Updates:
- Full 5-Act narrative structure implemented
- ACT 1: Challenge with data description
- ACT 2: Setup with architecture + hyperparameters
- ACT 3: Experiment with training progress
- ACT 4: Diagnosis with results + insights
- ACT 5: Reflection with internal separators (━)
- Horizontal separators (─) between all acts

Fixes Across All Milestones:
- Removed duplicate 'Training Complete' print in Milestone 02
- Standardized table column widths across all 3 milestones:
  * Metric: 18
  * Before Training: 16
  * After Training: 16
  * Improvement: 14
- Consistent table title: 'Training Outcome'
- All final panels now have internal separators

All milestones now follow identical 5-Act structure with:
- Clear visual flow with horizontal rules
- Consistent emoji usage
- Same panel styles and widths
- Beautiful internal separators in celebration panels
2025-09-30 16:45:28 -04:00
Vijay Janapa Reddi
0eeb626730 refactor: Apply 5-Act narrative structure to Milestone 02
Implemented complete 5-Act flow for XOR solution:

ACT 1: THE CHALLENGE 🎯
- Problem: Can networks solve non-linearly separable problems?
- XOR dataset with pattern explanation
- Challenge: NOT linearly separable
- Horizontal separator

ACT 2: THE SETUP 🏗️
- Architecture with hidden layer emphasis
- Components: hidden layer transforms space
- Hyperparameters including aggressive LR
- Horizontal separator

ACT 3: THE EXPERIMENT 🔬
- Before: impossible for single-layer
- Training with hidden layers
- Completion message
- Horizontal separator

ACT 4: THE DIAGNOSIS 📊
- Results table (Training Outcome)
- XOR truth table with all 4 cases
- Key insights about hidden layers
- Horizontal separator

ACT 5: THE REFLECTION 🌟
- Final panel with internal separators (━)
- Accomplishments (✓ bullets)
- Historical timeline (1969→1986→TODAY)
- Key insight about hidden layers
- Breakthrough explanation
- Preview of Milestone 03

Consistent with Milestone 01 structure!
2025-09-30 16:42:37 -04:00
Vijay Janapa Reddi
a7412f3070 refactor: Apply 5-Act narrative structure to Milestone 01
Implemented complete 5-Act flow:

ACT 1: THE CHALLENGE 🎯
- Opening panel with problem statement
- Data description
- Horizontal separator

ACT 2: THE SETUP 🏗️
- Architecture diagram
- Components breakdown
- Hyperparameters
- Horizontal separator

ACT 3: THE EXPERIMENT 🔬
- Before training baseline
- Training progress with live updates
- Completion message
- Horizontal separator

ACT 4: THE DIAGNOSIS 📊
- Results table (Training Outcome)
- Sample predictions with context
- Key insights bullet points
- Horizontal separator

ACT 5: THE REFLECTION 🌟
- Celebration panel with internal separators
- What you accomplished (✓ bullets)
- Why this matters (historical/technical)
- Key insight + limitation
- What's next (preview)

Visual improvements:
- Horizontal rules (─) between acts
- Better emoji usage (📊 🏗️ 🔬 📌 💡 🔍)
- Internal separators (━) in final panel
- Consistent [dim] hints throughout
2025-09-30 16:40:06 -04:00
Vijay Janapa Reddi
05fcee5b9a fix: Add missing box import and remove duplicate prints in Milestone 02
- Added 'from rich import box' import
- Removed duplicate Step 1 prints from generate_xor_data()
- Created MILESTONE_NARRATIVE_FLOW.md with 5-Act structure

New structure creates clear narrative flow:
- Act 1: The Challenge (problem + data)
- Act 2: The Setup (architecture + hyperparams)
- Act 3: The Experiment (training)
- Act 4: The Diagnosis (results + insights)
- Act 5: The Reflection (accomplishment + meaning)

Visual separators between acts for clarity.
2025-09-30 16:37:57 -04:00
Vijay Janapa Reddi
cd3a326067 feat: Apply consistent structure template to Milestones 01 & 02
Updated both milestones to match the standard 7-part template:

1. OPENING - Historical Context
   - Cyan box.DOUBLE panel
   - Year + context in title
   - What they'll build

2. ARCHITECTURE - Visual Understanding
   - ASCII diagram
   - Component breakdown
   - Parameter count

3. STEPS - Numbered Training Process
   - Consistent [bold yellow]Step N:[/bold yellow] format
   - Clear progression
   - Status updates

4. RESULTS TABLE - Before/After Comparison
   - Title: 🎯 Training Results
   - box.ROUNDED style
   - Consistent colors (yellow/green/magenta)

5. SAMPLE PREDICTIONS - Real Outputs
   - 10 sample predictions
   - ✓/✗ with color coding

6. CELEBRATION - Victory!
   - Already standardized
   - Green box.DOUBLE panel

Benefits:
- Students now experience consistent flow
- Clear progression across milestones
- Familiar structure reduces cognitive load
- 'Wow, I'm improving!' experience

Milestone 03 already matches this template!
2025-09-30 16:34:27 -04:00
Vijay Janapa Reddi
42171d8820 feat: Add batch size experiment to Milestone 03 + Create milestone structure guide
Added batch size comparison experiment:
- Optional experiment after main training
- Compares batch sizes: 16, 64, 256
- Shows DataLoader impact on training
- Demonstrates throughput vs update frequency trade-off
- Beautiful comparison table and insights panel

Created MILESTONE_STRUCTURE_GUIDE.md:
- Defines consistent structure across all milestones
- 7-part template: Opening → Architecture → Steps → Results → Predictions → Celebration
- Consistent colors, emojis, box styles
- Ensures students experience progression and familiarity
- Template for future milestones

This creates a cohesive learning journey!
2025-09-30 16:31:21 -04:00
Vijay Janapa Reddi
0c322e5c37 feat: Standardize milestone ending panels across all milestones
All milestones now have consistent celebratory summary panels:
- Same box.DOUBLE style with green border
- '🎉 Success! ...' format
- '💡 What YOU Just Accomplished' section
- Historical context and significance
- '📌 Note' with technical insight
- Preview of next milestone

Updates:
- Milestone 01: Single comprehensive panel
- Milestone 02: Combined into one panel
- Milestone 03: Already had perfect format (template)

This creates a consistent, celebratory learning experience!
2025-09-30 16:28:48 -04:00
Vijay Janapa Reddi
828c3d9081 feat: Add CrossEntropyLoss autograd support + Milestone 03 MLP on digits
Key Changes:
- Implemented CrossEntropyBackward for gradient computation
- Integrated CrossEntropyLoss into enable_autograd() patching
- Created comprehensive loss gradient test suite
- Milestone 03: MLP digits classifier (77.5% accuracy)
- Shipped tiny 8x8 digits dataset (67KB) for instant demos
- Updated DataLoader module with ASCII visualizations

Tests:
- All 3 losses (MSE, BCE, CrossEntropy) now have gradient flow
- MLP successfully learns digit classification (6.9% → 77.5%)
- Integration tests pass

Technical:
- CrossEntropyBackward: softmax - one_hot gradient
- Numerically stable via log-softmax
- Works with raw class labels (no one-hot needed)
2025-09-30 16:22:09 -04:00
Vijay Janapa Reddi
5d6f17aa27 Fix DataLoader integration tests to work before export
Added fallback import logic:
- Try importing from tinytorch package first
- Fall back to dev modules if not exported yet
- Works both before and after 'tito export 08_dataloader'

All 3 integration tests pass:
 Training workflow integration
 Shuffle consistency across epochs
 Memory efficiency verification
2025-09-30 16:08:21 -04:00
Vijay Janapa Reddi
3830e4bfc3 Finalize Module 08 and add integration tests
Added integration tests for DataLoader:
- test_dataloader_integration.py in tests/integration/
  - Training workflow integration
  - Shuffle consistency across epochs
  - Memory efficiency verification

Updated Module 08:
- Added note about optional performance analysis
- Clarified that analysis functions can be run manually
- Clean flow: text → code → tests

Updated datasets/tiny/README.md:
- Minor formatting fixes

Module 08 is now complete and ready to export:
 Dataset abstraction
 TensorDataset implementation
 DataLoader with batching/shuffling
 ASCII visualizations for understanding
 Unit tests (in module)
 Integration tests (in tests/)
 Performance analysis tools (optional)

Next: Export with 'bin/tito export 08_dataloader'
2025-09-30 16:07:55 -04:00
Vijay Janapa Reddi
683615d04f Clean up Module 08: Remove unconditional function calls
Fixed issue where performance analysis functions were called every time
the module was imported, instead of only when needed.

Changes:
- Commented out analyze_dataloader_performance() bare call
- Commented out analyze_memory_usage() bare call
- Removed redundant test_training_integration() comment

These functions are still defined and can be called manually for
performance insights, but won't run on every import.

The test_module() function still calls all necessary tests when
the module is run as __main__.

Result: Module imports cleanly without running expensive performance
benchmarks unless explicitly requested.
2025-09-30 15:26:00 -04:00
Vijay Janapa Reddi
b6f4a0bee6 Add ASCII visualizations to Module 08 for understanding image data
Added educational ASCII art showing:

1. **Actual pixel values** - What 8×8 digit images look like as numbers
   - Shows digits 5, 3, and 8 with real pixel values (0-16 range)
   - Helps students understand images are just 2D arrays

2. **Visual representation** - How humans see the digits
   - ASCII art showing recognizable digit shapes
   - Connects abstract numbers to concrete patterns

3. **Shape transformations** - How DataLoader batches data
   - Individual: (8, 8) → Batched: (32, 8, 8)
   - Shows what the model actually receives

4. **Complete example** - Loading and using tiny digits dataset
   - Real code showing datasets/tiny/digits_8x8.npz usage
   - Demonstrates the full DataLoader workflow

Benefits:
 Students visualize what image data IS
 Understand DataLoader's batching transformation
 See connection between numbers and visual patterns
 Ready to work with real datasets in milestones

This makes the abstract concept of 'image tensors' concrete and visual.
2025-09-30 15:22:30 -04:00
Vijay Janapa Reddi
38b089b52f Simplify Module 08: Focus on DataLoader mechanics, not dataset downloads
Removed synthetic download functions (download_mnist, download_cifar10):
- These were placeholder stubs generating random noise
- Conflicted with 'Real Data, Real Systems' philosophy
- Added scope creep (dataset management vs data loading)

Module 08 now focuses purely on:
 Dataset abstraction (interface design)
 TensorDataset implementation (in-memory wrapper)
 DataLoader mechanics (batching, shuffling, iteration)

Real datasets handled in examples/milestones:
- datasets/tiny/digits_8x8.npz ships with repo (instant)
- Milestone 03: MNIST download + training
- Milestone 04: CIFAR-10 download + CNN training

Separation of concerns:
- Module 08: Learn DataLoader abstraction (synthetic test data)
- Examples: Apply DataLoader to real data (actual datasets)

This follows PyTorch's pattern:
- torch.utils.data.DataLoader (abstraction)
- torchvision.datasets (actual data)

Tests still pass 100% with simplified synthetic data.
2025-09-30 15:10:08 -04:00
Vijay Janapa Reddi
92b70c3646 Add tiny datasets infrastructure with 8×8 digits
Created datasets/tiny/ for shipping small datasets with TinyTorch:

New Structure:
- datasets/tiny/digits_8x8.npz (67KB, 1,797 samples)
  - 8×8 handwritten digits from UCI/sklearn
  - Normalized to [0-1], ready for immediate use
  - Perfect for DataLoader learning (Module 08)

- datasets/tiny/README.md
  - Full documentation and usage examples
  - Philosophy: tiny (learn) → full (practice) → custom (master)

- datasets/tiny/create_digits_8x8.py
  - Extraction script showing how dataset was created
  - Reproducible from sklearn.datasets.load_digits()

Updated .gitignore:
- Ignore datasets/* (downloaded large files)
- Allow datasets/tiny/ (shipped small files)
- Allow datasets/README.md and download scripts
- Selectively ignore .npz files (not in tiny/)

Benefits:
 Zero download friction for Module 08
 Offline-friendly (planes, classrooms, slow networks)
 Real handwritten digits (not synthetic noise)
 Git-friendly size (67KB vs 10MB MNIST)
 Same shape/format students will use for CNNs

Progression:
- Module 08: Learn DataLoader with 8×8 digits
- Milestone 03: Train on full 28×28 MNIST
- Milestone 04: Scale to CIFAR-10
2025-09-30 15:05:34 -04:00
Vijay Janapa Reddi
228b4f25ea Move test_xor_original_1986.py to tests/integration/ 2025-09-30 14:16:58 -04:00
Vijay Janapa Reddi
82fd89d5b3 Remove unnecessary matplotlib import from losses module
Issue: xor_crisis.py was failing with ImportError on matplotlib architecture mismatch
Root cause: losses_dev.py imported matplotlib.pyplot but never used it

Fix:
-  Removed unused imports: matplotlib.pyplot, time
-  Re-exported module 04_losses to update tinytorch package
-  Verified both milestone 02 scripts now run successfully

The matplotlib import was causing failures on M2 Macs where matplotlib
was installed for wrong architecture (x86_64 vs arm64). Since it was
never used, removing it eliminates the dependency entirely.

Tested:
-  milestones/02_xor_crisis_1969/xor_crisis.py (49% accuracy - expected failure)
-  milestones/02_xor_crisis_1969/xor_solved.py (100% accuracy - perfect!)
2025-09-30 14:16:42 -04:00
Vijay Janapa Reddi
5066d91877 Clean up milestone 02 to match milestone 01 structure
Milestone 02 Structure (matches milestone 01):
- README.md: Comprehensive guide with historical context
- xor_crisis.py: Part 1 - demonstrates single-layer failure (executable)
- xor_solved.py: Part 2 - demonstrates multi-layer success (executable)

Cleanup:
-  Removed old perceptron_xor_fails.py
-  Moved test files to tests/integration/
  - test_xor_simple.py
  - test_xor_thorough.py
  - test_xor_original_1986.py (verifies 2-2-1 architecture works!)
-  Updated README with clear instructions
-  Made scripts executable

Milestone 02 now has the same polish and structure as milestone 01:
- Clear file naming (crisis vs solved)
- Beautiful rich output
- Historical context
- Pedagogically structured
2025-09-30 14:14:37 -04:00
Vijay Janapa Reddi
82e0ed55e7 Add XOR verification tests - confirm 100% accuracy achievable
Tests prove multi-layer networks work perfectly:
- test_xor_simple.py: Quick test (100% in 500 epochs)
- test_xor_thorough.py: Comprehensive test with multiple LRs

Results with optimal hyperparameters:
 100% accuracy on all 4 XOR cases
 Loss: 0.0015 (near perfect)
 Perfect predictions: (0,0)→0.005, (0,1)→1.000, (1,0)→1.000, (1,1)→0.000

This confirms:
- Multi-layer backprop works correctly
- ReLU gradients flow properly
- Hidden layers learn non-linear decision boundaries
- Autograd system is solid!

The milestone scripts show 75% because they use conservative
hyperparameters for pedagogical reasons (to show learning process).
These tests prove the architecture can achieve perfection.
2025-09-30 14:11:25 -04:00
Vijay Janapa Reddi
d032e4278b Add ReLUBackward and complete XOR milestone scripts
New Features:
- Add ReLUBackward for proper ReLU gradient computation
- Patch ReLU.forward() in enable_autograd() for gradient tracking
- Create polished XOR milestone scripts matching perceptron style

XOR Milestone Scripts (milestones/02_xor_crisis_1969/):
- xor_crisis.py: Shows single-layer perceptron FAILING (~50% accuracy)
- xor_solved.py: Shows multi-layer network SUCCEEDING (75%+ accuracy)
- Beautiful rich output with tables, panels, historical context
- Pedagogically structured like the perceptron milestone

Results:
 Single-layer: Stuck at ~50% (proves the crisis)
 Multi-layer: 75% accuracy (proves hidden layers work!)
 ReLU gradients flow correctly through network
 All 4 core activations now support autograd:
   - Sigmoid ✓, ReLU ✓, Tanh ✓ (future), GELU ✓ (future)

Historical Significance:
This recreates the exact problem that killed AI for 17 years
and demonstrates the solution that started the modern era!
2025-09-30 14:10:11 -04:00
Vijay Janapa Reddi
9a23f544fd Solve XOR problem - multi-layer networks work!
Add test_xor_simple.py - validates multi-layer gradient flow
- 100% accuracy on XOR (the 1969 'impossible' problem)
- Hidden layer (2→4) + ReLU + output (4→1) architecture
- Gradients flow correctly through 2 layers
- Loss decreases smoothly during training

This proves:
 Multi-layer networks work
 Backprop works through hidden layers
 ReLU activation works in training
 The 1969 AI Winter problem is solved!

Historical significance: Minsky proved single-layer perceptrons
couldn't solve XOR. Multi-layer networks (what we built) can!
2025-09-30 14:05:13 -04:00
Vijay Janapa Reddi
9129935d5b Add MSEBackward and organize comprehensive test suite
New Features:
- Add MSEBackward gradient computation for regression tasks
- Patch MSELoss in enable_autograd() for gradient tracking
- All 3 loss functions now support autograd: MSE, BCE, CrossEntropy

Test Suite Organization:
- Reorganize tests/ into focused directories
- Create tests/integration/ for cross-module tests
- Create tests/05_autograd/ for autograd edge cases
- Create tests/debugging/ for common student pitfalls
- Add comprehensive tests/README.md explaining test philosophy

Integration Tests:
- Move test_gradient_flow.py to integration/
- 20 comprehensive gradient flow tests
- Tests cover: tensors, layers, activations, losses, optimizers
- Tests validate: basic ops, chain rule, broadcasting, training loops
- 19/20 tests passing (MSE now fixed!)

Results:
 Perceptron learns: 50% → 93% accuracy
 Clean test organization guides future development
 Tests catch the exact bugs that broke training

Pedagogical Value:
- Test organization teaches testing best practices
- Gradient flow tests show what integration testing catches
- Sets foundation for debugging/diagnostic tests
2025-09-30 13:57:40 -04:00
Vijay Janapa Reddi
dc61a1b041 Clean up gradient broadcasting logic - more pedagogical
Refactored gradient accumulation to use clearer two-step approach:
1. Remove extra leading dimensions (batch dims)
2. Sum over dimensions that were size-1 (broadcast dims)

Benefits:
- Clearer intent: while loop for variable dims, for loop for fixed dims
- Better comments with concrete examples
- Easier for students to understand broadcasting in backprop
- Matches how you'd explain it verbally

Same functionality, cleaner code.
2025-09-30 13:53:05 -04:00
Vijay Janapa Reddi
49ea4d6839 Fix gradient propagation: enable autograd and patch activations/losses
CRITICAL FIX: Gradients now flow through entire training stack!

Changes:
1. Enable autograd in __init__.py - patches Tensor operations on import
2. Extend enable_autograd() to patch Sigmoid and BCE forward methods
3. Fix gradient accumulation to handle broadcasting (bias gradients)
4. Fix optimizer.step() - param.grad is numpy array, not Tensor.data
5. Add debug_gradients.py for systematic gradient flow testing

Architecture:
- Clean patching pattern - all gradient tracking in enable_autograd()
- Activations/losses remain simple (Module 02/04)
- Autograd (Module 05) upgrades them with gradient tracking
- Pedagogically sound: separation of concerns

Results:
 All 6 debug tests pass
 Perceptron learns: 50% → 93% accuracy
 Loss decreases: 0.79 → 0.36
 Weights update correctly through SGD
2025-09-30 13:51:30 -04:00
Vijay Janapa Reddi
af1c313d16 Reset package and export modules 01-07 only (skip broken spatial module) 2025-09-30 13:41:00 -04:00
Vijay Janapa Reddi
5184fa350b Update autograd module with latest changes 2025-09-30 13:40:51 -04:00
Vijay Janapa Reddi
d1439a0db1 Fix imports: Replace dev-style imports with proper package imports in modules 06-07 2025-09-30 13:40:38 -04:00
Vijay Janapa Reddi
eeb308a691 WIP: Manual edits to tinytorch (WRONG APPROACH - needs revert)
WARNING: I incorrectly edited files in tinytorch/ directly:
- tinytorch/core/autograd.py - added enable_autograd() manually
- tinytorch/core/activations.py - tried to add gradient tracking
- tinytorch/core/losses.py - restored from git

CORRECT APPROACH:
1. Make ALL changes in modules/source/XX_*/YY_dev.py
2. Add #| export directives for classes to export
3. Run: tito export XX_module
4. NEVER edit tinytorch/ files directly

Next steps:
- Revert tinytorch/ manual edits
- Add proper exports to source modules
- Export cleanly
2025-09-30 13:31:31 -04:00
Vijay Janapa Reddi
7fbd72deae Use clean top-level imports from tinytorch
- Updated tinytorch/__init__.py to export all common components at top level
- Changed milestone imports from 'tinytorch.core.*' to 'tinytorch'
- Students now use: from tinytorch import Tensor, Linear, Sigmoid, SGD
- Cleaner API that respects module boundaries
- Added enable_autograd() that enhances operations without modifying source modules

STILL TODO: Fix gradient flow - training not learning yet
2025-09-30 13:29:22 -04:00
Vijay Janapa Reddi
0015a8cab1 WIP: Add SigmoidBackward and BCEBackward classes to autograd
Added:
- SigmoidBackward class to modules/source/05_autograd/autograd_dev.py with #| export
- BCEBackward class to modules/source/05_autograd/autograd_dev.py with #| export
- Both classes exported to tinytorch/core/autograd.py
- Updated Sigmoid activation to track gradients using SigmoidBackward
- Updated BCE loss to track gradients using BCEBackward

ISSUE: Training still not learning - gradients not flowing properly
- Loss stays constant at 0.7911
- Weights don't update
- Sigmoid.forward() code looks correct but a.requires_grad stays False
- Need to investigate why gradient tracking isn't working through activations
2025-09-30 13:23:56 -04:00
Vijay Janapa Reddi
99a39ea1f8 Add milestone training examples and fix optimizers
- Created perceptron_trained.py milestone with full training loop
- Restored tinytorch/core/optimizers.py with Optimizer, SGD, Adam, AdamW classes
- Fixed imports to use tinytorch.core.* instead of tensor_dev
- Fixed tinytorch/core/losses.py with all loss functions
- Fixed tinytorch/core/training.py imports

ISSUE: Training loop runs but doesn't learn (gradients not flowing)
- Loss stays constant at 0.7911
- Weights don't update
- Likely autograd (Module 05) backward() not fully implemented
- Need to fix Tensor.backward() and gradient computation
2025-09-30 13:07:53 -04:00
Vijay Janapa Reddi
103a172b0d Fix: Add __call__ methods to exported package files
Manually added __call__ methods to tinytorch/core/ exported files:
- activations.py: ReLU, Tanh, GELU, Softmax
- layers.py: Dropout

These were added to source files earlier but nbdev_export is blocked by
an indentation error in one of the notebooks. Manually applying fixes
to the exported package allows tests to pass while we fix the export issue.

Test improvements:
- 02_activations: 20% → 92% (+72%!) 🎉
- 03_layers: 41% → 46% (+5%)
- 04_losses: 44% → 48% (+4%)
- Overall: 50.5% → 61.7% (+11%)

Still need to:
1. Fix nbdev_export indentation error
2. Investigate 06_optimizers (0% pass rate)
3. Add __call__ to loss classes when export is fixed
2025-09-30 12:49:31 -04:00
Vijay Janapa Reddi
e060f002b0 Add comprehensive test runner for training milestone (modules 01-07)
Created run_training_milestone_tests.py to systematically test all modules
needed for the training milestone:
- 01_tensor, 02_activations, 03_layers, 04_losses
- 05_autograd, 06_optimizers, 07_training

Features:
- Runs all module tests in sequence
- Parses results and provides summary table
- Shows pass rates and overall readiness
- Identifies which modules need attention
- Uses Rich library for beautiful output

Current results: 50.5% passing (95/188 tests)
Expected after re-export: ~85% (need to update tinytorch package with __call__ methods)

Usage:
  cd tests && python run_training_milestone_tests.py
2025-09-30 12:43:51 -04:00
Vijay Janapa Reddi
302cbea5ff Add exported package files and cleanup
This commit includes:
- Exported tinytorch package files from nbdev (autograd, losses, optimizers, training, etc.)
- Updated activations.py and layers.py with __call__ methods
- New module exports: attention, spatial, tokenization, transformer, etc.
- Removed old _modidx.py file
- Cleanup of duplicate milestone directories

These are the generated package files that correspond to the source modules
we've been developing. Students will import from these when using TinyTorch.
2025-09-30 12:38:56 -04:00
Vijay Janapa Reddi
76da686ce0 Update loss function examples to use PyTorch-style callable API
Updated docstring examples to use cleaner callable syntax:
- loss_fn(predictions, targets) instead of loss_fn.forward(predictions, targets)

Applied to:
- MSELoss
- CrossEntropyLoss
- BinaryCrossEntropyLoss

Demonstrates proper usage with __call__ methods for cleaner, more Pythonic code.
2025-09-30 12:36:27 -04:00
Vijay Janapa Reddi
fd6f377b77 Update activation examples to use PyTorch-style callable API
Updated docstring examples to use cleaner callable syntax:
- sigmoid(x) instead of sigmoid.forward(x)
- relu(x) instead of relu.forward(x)
- tanh(x) instead of tanh.forward(x)
- gelu(x) instead of gelu.forward(x)
- softmax(x) instead of softmax.forward(x)

This demonstrates the proper usage pattern with the __call__ methods
we just added, making examples more Pythonic and PyTorch-compatible.
2025-09-30 12:36:00 -04:00
Vijay Janapa Reddi
17cb8049c6 Add __call__ methods to enable PyTorch-style API
Enable cleaner API usage by adding __call__ methods to all activation,
layer, and loss classes. This allows students to write:
  - relu(x) instead of relu.forward(x)
  - layer(x) instead of layer.forward(x)
  - loss_fn(pred, target) instead of loss_fn.forward(pred, target)

Changes:
- Module 02 (Activations): Add __call__ to ReLU, Tanh, GELU, Softmax
  * Sigmoid already had __call__
- Module 03 (Layers): Add __call__ to Dropout
  * Linear already had __call__
- Module 04 (Losses): Add __call__ to MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss

This matches PyTorch's API convention where model(x) calls model.__call__(x)
which internally calls model.forward(x). Makes code more Pythonic and
intuitive for students familiar with PyTorch.

Expected impact: Test pass rates should improve significantly as tests
expect PyTorch-style callable API.
2025-09-30 12:33:45 -04:00
Vijay Janapa Reddi
231bd4344e Rename test directories to match source module names exactly
- module_01 → 01_tensor
- module_02 → 02_activations
- module_03 → 03_layers
- module_04 → 04_losses
- module_05 → 05_autograd
- module_06 → 06_optimizers
- module_07 → 07_training
- module_08 → 08_dataloader
- module_09 → 09_spatial
- module_10 → 10_tokenization
- module_11 → 11_embeddings
- module_12 → 12_attention
- module_13 → 13_transformers
- module_14 → 14_kvcaching
- module_15 → 15_profiling

This prevents misalignment between source and test directories.
Tests now mirror the exact structure of modules/source/.
2025-09-30 12:24:48 -04:00
Vijay Janapa Reddi
2c5d89ede7 Reorganize test directories to align with source modules
- Delete tests/module_01/ (Setup tests - no longer needed)
- Rename all test directories: module_02→01, module_03→02, etc.
- Update all internal references to match new numbering
- Tests now align perfectly with source modules:
  * module_01 = Tensor (01_tensor)
  * module_02 = Activations (02_activations)
  * module_03 = Layers (03_layers)
  * etc.

All tests import from tinytorch.* package, not from modules/source/ directly.
Test results: module_01: 31/34 pass, module_02: 5/25 pass, module_03: 15/37 pass
2025-09-30 12:23:15 -04:00
Vijay Janapa Reddi
32aabfa78c Refactor Milestone 1: Clean forward pass with Rich CLI
- Reorganized milestone structure to historical progression (01-06)
- Created single forward_pass.py with student code clearly at top
- Added Rich CLI visualizations: data scatter, network diagram, decision boundary
- Show decision boundary using / or \ based on slope
- No random seed - students see variability in random weights
- Annotated all code with which modules were used (Modules 01-03)
- Added introductory panel explaining what to expect
- Updated DEFINITIVE_MODULE_PLAN.md with corrected milestone structure
2025-09-30 12:03:19 -04:00
Vijay Janapa Reddi
de3b837bee Fix nbdev export system across all 20 modules
PROBLEM:
- nbdev requires #| export directive on EACH cell to export when using # %% markers
- Cell markers inside class definitions split classes across multiple cells
- Only partial classes were being exported to tinytorch package
- Missing matmul, arithmetic operations, and activation classes in exports

SOLUTION:
1. Removed # %% cell markers INSIDE class definitions (kept classes as single units)
2. Added #| export to imports cell at top of each module
3. Added #| export before each exportable class definition in all 20 modules
4. Added __call__ method to Sigmoid for functional usage
5. Fixed numpy import (moved to module level from __init__)

MODULES FIXED:
- 01_tensor: Tensor class with all operations (matmul, arithmetic, shape ops)
- 02_activations: Sigmoid, ReLU, Tanh, GELU, Softmax classes
- 03_layers: Linear, Dropout classes
- 04_losses: MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss classes
- 05_autograd: Function, AddBackward, MulBackward, MatmulBackward, SumBackward
- 06_optimizers: Optimizer, SGD, Adam, AdamW classes
- 07_training: CosineSchedule, Trainer classes
- 08_dataloader: Dataset, TensorDataset, DataLoader classes
- 09_spatial: Conv2d, MaxPool2d, AvgPool2d, SimpleCNN classes
- 10-20: All exportable classes in remaining modules

TESTING:
- Test functions use 'if __name__ == "__main__"' guards
- Tests run in notebooks but NOT on import
- Rosenblatt Perceptron milestone working perfectly

RESULT:
 All 20 modules export correctly
 Perceptron (1957) milestone functional
 Clean separation: development (modules/source) vs package (tinytorch)
2025-09-30 11:21:04 -04:00