Added integration tests for DataLoader:
- test_dataloader_integration.py in tests/integration/
- Training workflow integration
- Shuffle consistency across epochs
- Memory efficiency verification
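
The shuffle check listed above can be illustrated with a minimal sketch. It assumes "consistency" means that every epoch visits each sample exactly once while the order changes between epochs. `SketchLoader` and `check_shuffle_consistency` are hypothetical stand-ins written for this example; they are not TinyTorch's DataLoader or the contents of test_dataloader_integration.py.

```python
import random


class SketchLoader:
    """Hypothetical stand-in for a DataLoader (not TinyTorch's actual class)."""

    def __init__(self, data, batch_size, shuffle=False, seed=None):
        self.data = list(data)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)

    def __iter__(self):
        indices = list(range(len(self.data)))
        if self.shuffle:
            # A fresh permutation is drawn every time an epoch starts.
            self._rng.shuffle(indices)
        for start in range(0, len(indices), self.batch_size):
            chunk = indices[start:start + self.batch_size]
            yield [self.data[i] for i in chunk]


def epoch_samples(loader):
    """Flatten one full pass over the loader into a list of samples."""
    return [sample for batch in loader for sample in batch]


def check_shuffle_consistency(loader, expected):
    """Assert both epochs cover every sample once; return True if order differs."""
    first = epoch_samples(loader)
    second = epoch_samples(loader)
    assert sorted(first) == sorted(expected)
    assert sorted(second) == sorted(expected)
    return first != second
```

A real integration test would run the same two assertions against the module's own DataLoader instead of the stand-in.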
Updated Module 08:
- Added note about optional performance analysis
- Clarified that analysis functions can be run manually
- Kept a clean flow: text → code → tests
Updated datasets/tiny/README.md:
- Minor formatting fixes
Module 08 is now complete and ready to export:
✅ Dataset abstraction
✅ TensorDataset implementation
✅ DataLoader with batching/shuffling
✅ ASCII visualizations for understanding
✅ Unit tests (in module)
✅ Integration tests (in tests/)
✅ Performance analysis tools (optional)
Next: Export with 'bin/tito export 08_dataloader'
Created datasets/tiny/ for shipping small datasets with TinyTorch:
New Structure:
- datasets/tiny/digits_8x8.npz (67KB, 1,797 samples)
  - 8×8 handwritten digits from UCI/sklearn
  - Normalized to [0, 1], ready for immediate use
  - Well suited to learning DataLoader (Module 08)
- datasets/tiny/README.md
  - Full documentation and usage examples
  - Philosophy: tiny (learn) → full (practice) → custom (master)
- datasets/tiny/create_digits_8x8.py
  - Extraction script showing how the dataset was created
  - Reproducible from sklearn.datasets.load_digits()
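
The extraction described above might look roughly like the sketch below. It is not the contents of create_digits_8x8.py: the function names, the archive keys "images" and "labels", and the dtypes are assumptions made for illustration. The only facts it relies on are that sklearn.datasets.load_digits() returns 1,797 8×8 images with pixel values from 0 to 16.

```python
import numpy as np


def pack_digits(images, labels, out_path, max_value=16.0):
    """Scale pixel values to [0, 1] and write a compressed .npz archive."""
    images = np.asarray(images, dtype=np.float32) / max_value
    labels = np.asarray(labels, dtype=np.int64)
    # "images" and "labels" are assumed key names, not confirmed by the commit.
    np.savez_compressed(out_path, images=images, labels=labels)
    return images, labels


def build_from_sklearn(out_path="digits_8x8.npz"):
    """Fetch the digits bundled with scikit-learn and pack them."""
    # Imported lazily so pack_digits works without scikit-learn installed.
    from sklearn.datasets import load_digits

    digits = load_digits()  # 1,797 samples, 8x8 images, pixel values 0..16
    return pack_digits(digits.images, digits.target, out_path)
```

Reading the file back would then be a matter of `np.load(path)` and indexing by whichever keys the real script uses.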
Updated .gitignore:
- Ignore datasets/* (downloaded large files)
- Allow datasets/tiny/ (shipped small files)
- Allow datasets/README.md and download scripts
- Ignore .npz files everywhere except in datasets/tiny/
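
One way the rules above could be written is sketched below. This is a guess at the shape of the patterns, not the committed .gitignore; in particular, the `download_*.py` name pattern for download scripts is hypothetical.

```gitignore
# Ignore everything directly under datasets/ (downloaded large files)
datasets/*

# Re-include the shipped tiny datasets and the top-level README
!datasets/tiny/
!datasets/README.md

# Hypothetical naming pattern for download scripts
!datasets/download_*.py

# Ignore .npz archives everywhere, then re-include the ones in tiny/
*.npz
!datasets/tiny/*.npz
```

The order matters: a negated pattern only takes effect if it comes after the pattern that ignored the path, and git cannot re-include a file whose parent directory is itself excluded, which is why `datasets/*` is used rather than `datasets/`.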
Benefits:
✅ Zero download friction for Module 08
✅ Offline-friendly (planes, classrooms, slow networks)
✅ Real handwritten digits (not synthetic noise)
✅ Git-friendly size (67KB vs 10MB MNIST)
✅ Same shape/format students will use for CNNs
Progression:
- Module 08: Learn DataLoader with 8×8 digits
- Milestone 03: Train on full 28×28 MNIST
- Milestone 04: Scale to CIFAR-10