You were right - 150 samples was too small for decent accuracy.
This follows Andrej Karpathy's "~1000 samples" philosophy for educational datasets.
Results:
- Before (150 samples): 19% test accuracy (too small!)
- After (1000 samples): 79.5% test accuracy (decent!)
Changes:
- Increased training: 150 → 1000 samples (100 per digit class)
- Increased test: 47 → 200 samples (20 per digit class)
- Perfect class balance: 0.00 std deviation
- File size: 51 KB → 310 KB (still tiny for USB stick)
- Training time: ~3-5 sec → ~8-10 sec (still fast)
Updated:
- create_tinydigits.py: Load from sklearn, generate 1K samples
- train.pkl: 258 KB (1000 samples, perfectly balanced)
- test.pkl: 52 KB (200 samples, balanced)
- README.md: Updated all documentation with new sizes
- mlp_digits.py: Updated docstring to reflect 1K dataset
Dataset Philosophy:
"~1000 samples is the sweet spot for educational datasets"
- Small enough: Trains in seconds on CPU
- Large enough: Achieves decent accuracy (~80%)
- Balanced: Perfect stratification across all classes
- Reproducible: Fixed seed=42 for consistency
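The stratified sampling behind the "perfect class balance" can be sketched like this (illustrative only; `stratified_split` is a hypothetical helper, not the actual code in create_tinydigits.py):

```python
import numpy as np

def stratified_split(y, per_class_train, per_class_test, seed=42):
    """Return (train_idx, test_idx) with exactly per_class_* samples per class."""
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        # shuffle the indices of this class, then carve off train/test slices
        idx = rng.permutation(np.flatnonzero(y == cls))
        train_idx.extend(idx[:per_class_train])
        test_idx.extend(idx[per_class_train:per_class_train + per_class_test])
    return np.array(train_idx), np.array(test_idx)
```

Applied to sklearn's 1,797-sample digits set with 100 train and 20 test samples per class, this yields exactly the 1,000/200 split reported above, with zero class-count variance.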
Still a perfect fit for the TinyTorch-on-a-stick vision:
- 310 KB fits on any USB drive
- Works on RasPi0
- No downloads needed
- Offline-first education
Replaces the sklearn-sourced digits_8x8.npz with a TinyTorch-branded dataset.
Changes:
- Created datasets/tinydigits/ (~51KB total):
  - train.pkl: 150 samples (15 per digit class 0-9)
  - test.pkl: 47 samples (balanced across digits)
  - README.md: Full curation documentation
  - LICENSE: BSD 3-Clause with sklearn attribution
  - create_tinydigits.py: Reproducible generation script
- Updated milestones to use TinyDigits:
  - mlp_digits.py: Now loads from datasets/tinydigits/
  - cnn_digits.py: Now loads from datasets/tinydigits/
- Removed old data:
  - datasets/tiny/ (67KB sklearn duplicate)
  - milestones/03_1986_mlp/data/ (67KB old location)
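Loading the pickles from the milestone scripts only takes a few lines. This sketch assumes a dict layout with "images" and "labels" keys, which is a guess at the actual pkl structure:

```python
import pickle

def load_tinydigits(split="train", root="datasets/tinydigits"):
    """Load one split of TinyDigits; key names are an assumption."""
    with open(f"{root}/{split}.pkl", "rb") as f:
        data = pickle.load(f)
    return data["images"], data["labels"]
```

Usage would then be `X_train, y_train = load_tinydigits("train")` at the top of mlp_digits.py or cnn_digits.py.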
Dataset Strategy:
TinyTorch now ships with only 2 curated datasets:
1. TinyDigits (51KB) - 8×8 digits for MLP/CNN milestones
2. TinyTalks (140KB) - Q&A pairs for transformer milestone
Total: 191KB shipped data (perfect for RasPi0 deployment)
Rationale:
- Self-contained: No downloads, works offline
- Citable: TinyTorch educational infrastructure for white paper
- Portable: Tiny footprint enables edge device deployment
- Fast: <5 sec training enables instant student feedback
Updated .gitignore to allow TinyTorch curated datasets while
still blocking downloaded large datasets.
Added the TinyTalks dataset:
- 301 Q&A pairs across 5 progressive difficulty levels
- 17.5 KB total size, optimized for 3-5 minute training
- Includes train/val/test splits (70/15/15)
- Professional documentation (README, DATASHEET, CHANGELOG, SUMMARY)
- Validation and statistics scripts
- Licensed under CC BY 4.0
Dataset designed specifically for TinyTorch Module 13 (Transformers) to provide
immediate learning feedback for students training their first transformer model.
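A 70/15/15 split like the one shipped with the dataset can be produced as follows (a sketch under assumptions; the real split script and its seed may differ):

```python
import random

def three_way_split(items, seed=42):
    """Shuffle deterministically, then cut into 70% train / 15% val / rest test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```

For the 301 Q&A pairs this gives 210 train, 45 validation, and 46 test examples, with the rounding remainder going to the test split.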
Added integration tests for DataLoader:
- test_dataloader_integration.py in tests/integration/
- Training workflow integration
- Shuffle consistency across epochs
- Memory efficiency verification
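The shuffle-consistency check might look roughly like this; `FakeLoader` is a minimal stand-in for illustration, not TinyTorch's actual DataLoader:

```python
import random

class FakeLoader:
    """Minimal stand-in for a shuffling DataLoader (not the real class)."""
    def __init__(self, data, batch_size=2, seed=None):
        self.data = list(data)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def __iter__(self):
        order = self.data[:]
        self.rng.shuffle(order)  # a fresh shuffle every epoch
        for i in range(0, len(order), self.batch_size):
            yield order[i:i + self.batch_size]

def epoch_order(loader):
    """Record the order in which samples appear during one epoch."""
    return [x for batch in loader for x in batch]

loader = FakeLoader(range(100), seed=0)
# Two consecutive epochs should visit every sample, but in different orders.
assert epoch_order(loader) != epoch_order(loader)
```

The real integration test would assert the same two properties against the TinyTorch DataLoader: every sample appears exactly once per epoch, and the order changes between epochs.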
Updated Module 08:
- Added note about optional performance analysis
- Clarified that analysis functions can be run manually
- Clean flow: text → code → tests
Updated datasets/tiny/README.md:
- Minor formatting fixes
Module 08 is now complete and ready to export:
✅ Dataset abstraction
✅ TensorDataset implementation
✅ DataLoader with batching/shuffling
✅ ASCII visualizations for understanding
✅ Unit tests (in module)
✅ Integration tests (in tests/)
✅ Performance analysis tools (optional)
Next: Export with 'bin/tito export 08_dataloader'
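For reference, the abstractions in the checklist above boil down to something like the following minimal sketch (illustrative only; the real Module 08 classes have more features and error handling):

```python
import random

class TensorDataset:
    """Pair up parallel sequences (e.g. images and labels) by index."""
    def __init__(self, *tensors):
        assert all(len(t) == len(tensors[0]) for t in tensors)
        self.tensors = tensors

    def __len__(self):
        return len(self.tensors[0])

    def __getitem__(self, i):
        return tuple(t[i] for t in self.tensors)

class DataLoader:
    """Iterate over a dataset in (optionally shuffled) batches."""
    def __init__(self, dataset, batch_size=1, shuffle=False):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        order = list(range(len(self.dataset)))
        if self.shuffle:
            random.shuffle(order)
        for i in range(0, len(order), self.batch_size):
            batch = [self.dataset[j] for j in order[i:i + self.batch_size]]
            # transpose list-of-samples into per-field batches
            yield tuple(list(col) for col in zip(*batch))
```

A training loop then reads `for images, labels in DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True): ...`.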
Created datasets/tiny/ for shipping small datasets with TinyTorch:
New Structure:
- datasets/tiny/digits_8x8.npz (67KB, 1,797 samples)
  - 8×8 handwritten digits from UCI/sklearn
  - Normalized to [0, 1], ready for immediate use
  - Perfect for DataLoader learning (Module 08)
- datasets/tiny/README.md
  - Full documentation and usage examples
  - Philosophy: tiny (learn) → full (practice) → custom (master)
- datasets/tiny/create_digits_8x8.py
  - Extraction script showing how the dataset was created
  - Reproducible from sklearn.datasets.load_digits()
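The extraction script likely amounts to a few lines like these (the .npz key names are assumptions; the normalization to [0, 1] matches the description above):

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()  # 1,797 8x8 images, pixel values 0-16
images = (digits.images / 16.0).astype(np.float32)  # scale 0-16 -> [0, 1]
np.savez_compressed("digits_8x8.npz", images=images, labels=digits.target)
```

`savez_compressed` is what keeps the shipped file in the tens-of-kilobytes range rather than megabytes.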
Updated .gitignore:
- Ignore datasets/* (downloaded large files)
- Allow datasets/tiny/ (shipped small files)
- Allow datasets/README.md and download scripts
- Selectively ignore .npz files (not in tiny/)
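The resulting allow-list might look roughly like this (an approximation; check the actual .gitignore for the exact rules):

```gitignore
# Ignore downloaded datasets, but ship the tiny curated ones
datasets/*
!datasets/tiny/
!datasets/README.md
!datasets/download_*.py

# Block .npz archives everywhere except datasets/tiny/
*.npz
!datasets/tiny/*.npz
```

Note the ordering matters: the `!` re-include patterns must come after the broad ignore patterns they override.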
Benefits:
✅ Zero download friction for Module 08
✅ Offline-friendly (planes, classrooms, slow networks)
✅ Real handwritten digits (not synthetic noise)
✅ Git-friendly size (67KB vs 10MB MNIST)
✅ Same shape/format students will use for CNNs
Progression:
- Module 08: Learn DataLoader with 8×8 digits
- Milestone 03: Train on full 28×28 MNIST
- Milestone 04: Scale to CIFAR-10
- Created download_mnist.py script to fetch the Fashion-MNIST dataset
- Added a README explaining the dataset format and download process
- Fashion-MNIST is used as an accessible alternative to the original MNIST
- The same IDX format allows seamless use with the existing examples
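The download loop can be sketched as below. The mirror URL and filenames follow the Fashion-MNIST project's published conventions (and match original MNIST's filenames, which is what makes the swap seamless), but treat the exact paths as assumptions rather than the script's literal contents:

```python
import urllib.request
from pathlib import Path

# Official Fashion-MNIST mirror; files share original MNIST's IDX names.
BASE = "http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/"
FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]

def download_all(dest="datasets/fashion_mnist"):
    """Fetch each archive, skipping any that already exist locally."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        target = Path(dest) / name
        if not target.exists():
            urllib.request.urlretrieve(BASE + name, target)
```

Because the filenames and IDX layout match original MNIST exactly, any loader written for one dataset reads the other unchanged.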