Mirror of https://github.com/MLSysBook/TinyTorch.git (synced 2026-04-28 21:12:46 -05:00)
TinyTorch Datasets
This directory contains datasets for TinyTorch examples and training.
Directory Structure
datasets/
├── tiny/ ← Tiny datasets shipped with repo (~100KB each)
│ └── digits_8x8.npz (1,797 samples, 67KB)
├── mnist/ ← Full MNIST (downloaded, gitignored)
├── cifar10/ ← Full CIFAR-10 (downloaded, gitignored)
└── download_*.py ← Download scripts for large datasets
Quick Start
For learning (instant, offline):
# Use tiny shipped datasets
import numpy as np
data = np.load('datasets/tiny/digits_8x8.npz')
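This README does not list the array names stored in the archive, so it helps to inspect them before indexing into `data`. The following is a minimal sketch that assumes only that the file is a standard NumPy `.npz` archive; `summarize_npz` is a hypothetical helper, not part of TinyTorch.

```python
import numpy as np

def summarize_npz(path):
    """Return {array_name: (shape, dtype_name)} for every array in a .npz archive."""
    # np.load on a .npz file returns a lazy NpzFile; the context manager
    # closes the underlying zip file once we are done reading.
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }

# Example call (run from the repository root so the relative path resolves):
#   for name, (shape, dtype) in summarize_npz("datasets/tiny/digits_8x8.npz").items():
#       print(name, shape, dtype)
```

Once the names are known, individual arrays can be read with `data[name]`.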
For serious training (download once):
python datasets/download_mnist.py
MNIST Dataset
The mnist/ directory should contain the MNIST or Fashion-MNIST dataset files:
- train-images-idx3-ubyte.gz - Training images (60,000 samples)
- train-labels-idx1-ubyte.gz - Training labels
- t10k-images-idx3-ubyte.gz - Test images (10,000 samples)
- t10k-labels-idx1-ubyte.gz - Test labels
Downloading the Dataset
Run the provided download script:
cd datasets
python download_mnist.py
This downloads Fashion-MNIST, which uses the same file names and IDX format as MNIST but is more accessible.
Dataset Format
Both MNIST and Fashion-MNIST use the same IDX file format:
- Images: 28x28 grayscale pixels
- Labels: Integer values 0-9
- Gzipped for compression
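The format described above can be read with the standard library and NumPy. This is a minimal sketch, not the loader the TinyTorch examples use; `read_idx_gz` is a hypothetical name. It relies on the IDX layout: a 4-byte header (two zero bytes, a data-type code, a dimension count), then one 4-byte big-endian size per dimension, then the raw data. It handles only the unsigned-byte type (code 0x08) that MNIST and Fashion-MNIST use.

```python
import gzip
import struct
import numpy as np

def read_idx_gz(path):
    """Read a gzipped IDX file of unsigned bytes into a NumPy uint8 array."""
    with gzip.open(path, "rb") as f:
        # Header: 2 zero bytes, 1 data-type byte, 1 byte giving the number of dimensions.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError(f"{path}: not an unsigned-byte IDX file")
        # Each dimension size is a 4-byte big-endian unsigned integer.
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        # Everything after the header is the raw pixel or label data.
        data = np.frombuffer(f.read(), dtype=np.uint8)
    # reshape raises if the payload size does not match the declared shape.
    return data.reshape(shape)

# Example call, assuming the files listed above are present in datasets/mnist/:
#   images = read_idx_gz("datasets/mnist/train-images-idx3-ubyte.gz")
#   labels = read_idx_gz("datasets/mnist/train-labels-idx1-ubyte.gz")
```

Given the sample counts listed above, the training images should come back with shape (60000, 28, 28) and the training labels with shape (60000,).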
Fashion-MNIST classes:
- 0: T-shirt/top
- 1: Trouser
- 2: Pullover
- 3: Dress
- 4: Coat
- 5: Sandal
- 6: Shirt
- 7: Sneaker
- 8: Bag
- 9: Ankle boot
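When inspecting predictions it is often handy to turn integer labels into the names listed above. A minimal sketch: the list order follows the class table in this README, and `FASHION_MNIST_CLASSES` and `label_names` are hypothetical names chosen for illustration.

```python
# Index i holds the name of class i, in the order given in the list above.
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_names(labels):
    """Map an iterable of integer labels (0-9) to Fashion-MNIST class names."""
    # int() accepts both Python ints and NumPy integer scalars.
    return [FASHION_MNIST_CLASSES[int(label)] for label in labels]
```

For original MNIST digits no mapping is needed, since each label is the digit itself.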
The examples will work with either original MNIST digits or Fashion-MNIST items.