TinyTorch/tinytorch/data/loader.py
Vijay Janapa Reddi 763cdd2bf2 Implement Tensor slicing with progressive disclosure and fix embedding gradient flow
WHAT: Added Tensor.__getitem__ (slicing) following progressive disclosure principles

MODULE 01 (Tensor):
- Added __getitem__ method for basic slicing operations (sketched below)
- Clean implementation with NO gradient mentions (progressive disclosure)
- Supports NumPy-style indexing: x[0], x[:3], x[1:4], x[:, 1]
- Ensures scalar results are wrapped in arrays
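
A minimal sketch of the Module 01 side (assuming, as elsewhere in TinyTorch,
that a Tensor wraps a NumPy array in self.data; the exact internals may differ):

    def __getitem__(self, key):
        """NumPy-style slicing. No gradient logic at this layer."""
        # np.asarray wraps scalar results (e.g. x[0] on a 1-D tensor)
        # back into arrays so every slice is still a valid Tensor
        return Tensor(np.asarray(self.data[key]))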

MODULE 05 (Autograd):
- Added SliceBackward function for gradient computation (sketched below)
- Implements proper gradient scatter: the upstream gradient is written into
  the sliced positions of a zero tensor shaped like the input
- Added monkey-patching in enable_autograd() for __getitem__
- Follows same pattern as existing operations (add, mul, matmul)
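
The backward pass is the standard scatter pattern - a sketch, assuming the
same function-object conventions as the other ops (the real TinyTorch
signatures may differ):

    import numpy as np

    class SliceBackward:
        """Gradient of y = x[key]: scatter dy back into x's shape."""
        def __init__(self, input_tensor, key):
            self.input_shape = input_tensor.data.shape
            self.key = key

        def backward(self, grad_output):
            # Zeros everywhere except the sliced positions, which
            # receive the upstream gradient unchanged
            grad_input = np.zeros(self.input_shape, dtype=grad_output.dtype)
            grad_input[self.key] = grad_output
            return grad_input

enable_autograd() then swaps the plain __getitem__ for a version that records
a SliceBackward node, the same way add/mul/matmul are already patched in.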

MODULE 11 (Embeddings):
- Updated PositionalEncoding to use Tensor slicing instead of .data access
- Fixed multiple .data accesses that broke computation graphs
- Removed Tensor() wrapping that created gradient-disconnected leaves
- Uses proper Tensor operations to preserve gradient flow (see below)
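
Illustratively (hypothetical variable names; the real PositionalEncoding code
differs), the change is from the graph-breaking pattern to the graph-preserving
one:

    # BEFORE: .data access + re-wrapping creates a disconnected leaf
    pos = Tensor(self.pos_embedding.data[:seq_len])

    # AFTER: Tensor slicing keeps the result on the computation graph
    pos = self.pos_embedding[:seq_len]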

TESTING:
- All 6 component tests PASS (Embedding, Attention, FFN, Residual, Forward, Training)
- 19/19 parameters get gradients (was 18/19 before)
- Loss drops faster: 1.54→1.08 (vs 1.62→1.24 before)
- Model still not learning (0% accuracy); needs a fresh session to verify the monkey-patching

WHY THIS MATTERS:
- Tensor slicing is FUNDAMENTAL - needed by transformers for position embeddings
- Progressive disclosure maintains educational integrity
- Follows existing TinyTorch architecture patterns
- Enables position embeddings to potentially learn (pending verification)

DOCUMENTS CREATED:
- milestones/05_2017_transformer/TENSOR_SLICING_IMPLEMENTATION.md
- milestones/05_2017_transformer/STATUS.md
- milestones/05_2017_transformer/FIXES_SUMMARY.md
- milestones/05_2017_transformer/DEBUG_REVERSAL.md
- tests/milestones/test_reversal_debug.py (component tests)

ARCHITECTURAL PRINCIPLE:
Progressive disclosure is not just a nice-to-have; it's CRITICAL for educational systems.
Don't expose Module 05 concepts (gradients) in Module 01 (basic operations).
Monkey-patch when features are needed, not before.
2025-11-22 18:26:12 -05:00

263 lines
9.5 KiB
Python
Generated

# ╔══════════════════════════════════════════════════════════════════════════╗
# ║                          🚨 CRITICAL WARNING 🚨                          ║
# ║                       AUTOGENERATED! DO NOT EDIT!                        ║
# ║                                                                          ║
# ║   This file is AUTOMATICALLY GENERATED from source modules.              ║
# ║   ANY CHANGES MADE HERE WILL BE LOST when modules are re-exported!       ║
# ║                                                                          ║
# ║   ✅ TO EDIT:   modules/XX_loader/loader.py                              ║
# ║   ✅ TO EXPORT: Run 'tito module complete <module_name>'                 ║
# ║                                                                          ║
# ║   🛡️ STUDENT PROTECTION: This file contains optimized implementations.   ║
# ║      Editing it directly may break module functionality and training.   ║
# ║                                                                          ║
# ║   🎓 LEARNING TIP: Work in modules/ - that's where real development      ║
# ║      happens! The tinytorch/ directory is just the compiled output.      ║
# ╚══════════════════════════════════════════════════════════════════════════╝
# %% auto 0
__all__ = ['Dataset', 'TensorDataset', 'DataLoader']
# %% ../../modules/source/08_dataloader/dataloader_dev.ipynb 0
#| default_exp data.loader
#| export
# %% ../../modules/source/08_dataloader/dataloader_dev.ipynb 2
# Essential imports for data loading
import numpy as np
import random
from typing import Iterator, Tuple, List, Optional, Union
from abc import ABC, abstractmethod
# Import real Tensor class from tinytorch package
from ..core.tensor import Tensor
# %% ../../modules/source/08_dataloader/dataloader_dev.ipynb 4
class Dataset(ABC):
    """
    Abstract base class for all datasets.

    Provides the fundamental interface that all datasets must implement:
    - __len__(): Returns the total number of samples
    - __getitem__(idx): Returns the sample at the given index

    TODO: Implement the abstract Dataset base class

    APPROACH:
    1. Use ABC (Abstract Base Class) to define the interface
    2. Mark methods as @abstractmethod to force implementation
    3. Provide clear docstrings for subclasses

    EXAMPLE:
    >>> class MyDataset(Dataset):
    ...     def __len__(self): return 100
    ...     def __getitem__(self, idx): return idx
    >>> dataset = MyDataset()
    >>> print(len(dataset))  # 100
    >>> print(dataset[42])   # 42

    HINT: Abstract methods force subclasses to implement core functionality
    """
    ### BEGIN SOLUTION
    @abstractmethod
    def __len__(self) -> int:
        """
        Return the total number of samples in the dataset.

        This method must be implemented by all subclasses to enable
        len(dataset) calls and batch size calculations.
        """
        pass

    @abstractmethod
    def __getitem__(self, idx: int):
        """
        Return the sample at the given index.

        Args:
            idx: Index of the sample to retrieve (0 <= idx < len(dataset))

        Returns:
            The sample at index idx. Format depends on the dataset
            implementation: could be a (data, label) tuple, a single
            tensor, etc.
        """
        pass
    ### END SOLUTION
# %% ../../modules/source/08_dataloader/dataloader_dev.ipynb 7
class TensorDataset(Dataset):
    """
    Dataset wrapping tensors for supervised learning.

    Each sample is a tuple of tensors taken from the same index across all
    input tensors. All tensors must have the same size in their first dimension.

    TODO: Implement TensorDataset for tensor-based data

    APPROACH:
    1. Store all input tensors
    2. Validate that they have the same first dimension (number of samples)
    3. Return a tuple of tensor slices for each index

    EXAMPLE:
    >>> features = Tensor([[1, 2], [3, 4], [5, 6]])  # 3 samples, 2 features each
    >>> labels = Tensor([0, 1, 0])                   # 3 labels
    >>> dataset = TensorDataset(features, labels)
    >>> print(len(dataset))  # 3
    >>> print(dataset[1])    # (Tensor([3, 4]), Tensor(1))

    HINTS:
    - Use *tensors to accept a variable number of tensor arguments
    - Check all tensors have the same length in dimension 0
    - Return a tuple of tensor[idx] for all tensors
    """

    def __init__(self, *tensors):
        """
        Create a dataset from multiple tensors.

        Args:
            *tensors: Variable number of Tensor objects.
                All tensors must have the same size in their first dimension.
        """
        ### BEGIN SOLUTION
        assert len(tensors) > 0, "Must provide at least one tensor"
        # Store all tensors
        self.tensors = tensors
        # Validate all tensors have the same first dimension
        first_size = len(tensors[0].data)  # Size of first dimension
        for i, tensor in enumerate(tensors):
            if len(tensor.data) != first_size:
                raise ValueError(
                    f"All tensors must have same size in first dimension. "
                    f"Tensor 0: {first_size}, Tensor {i}: {len(tensor.data)}"
                )
        ### END SOLUTION

    def __len__(self) -> int:
        """Return the number of samples (size of the first dimension)."""
        ### BEGIN SOLUTION
        return len(self.tensors[0].data)
        ### END SOLUTION

    def __getitem__(self, idx: int) -> Tuple[Tensor, ...]:
        """
        Return a tuple of tensor slices at the given index.

        Args:
            idx: Sample index

        Returns:
            Tuple containing tensor[idx] for each input tensor
        """
        ### BEGIN SOLUTION
        if idx >= len(self) or idx < 0:
            raise IndexError(f"Index {idx} out of range for dataset of size {len(self)}")
        # Return a tuple of slices from all tensors
        return tuple(Tensor(tensor.data[idx]) for tensor in self.tensors)
        ### END SOLUTION
# %% ../../modules/source/08_dataloader/dataloader_dev.ipynb 10
class DataLoader:
    """
    Data loader with batching and shuffling support.

    Wraps a dataset to provide batched iteration with optional shuffling.
    Essential for efficient training with mini-batch gradient descent.

    TODO: Implement DataLoader with batching and shuffling

    APPROACH:
    1. Store dataset, batch_size, and shuffle settings
    2. Create an iterator that groups samples into batches
    3. Handle shuffling by randomizing indices
    4. Collate individual samples into batch tensors

    EXAMPLE:
    >>> dataset = TensorDataset(Tensor([[1, 2], [3, 4], [5, 6]]), Tensor([0, 1, 0]))
    >>> loader = DataLoader(dataset, batch_size=2, shuffle=True)
    >>> for batch in loader:
    ...     features_batch, labels_batch = batch
    ...     print(f"Features: {features_batch.shape}, Labels: {labels_batch.shape}")

    HINTS:
    - Use random.shuffle() for index shuffling
    - Group consecutive samples into batches
    - Stack individual tensors using np.stack()
    """

    def __init__(self, dataset: Dataset, batch_size: int, shuffle: bool = False):
        """
        Create a DataLoader for batched iteration.

        Args:
            dataset: Dataset to load from
            batch_size: Number of samples per batch
            shuffle: Whether to shuffle data each epoch
        """
        ### BEGIN SOLUTION
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        ### END SOLUTION

    def __len__(self) -> int:
        """Return the number of batches per epoch."""
        ### BEGIN SOLUTION
        # Ceiling division: a final partial batch still counts as one batch,
        # e.g. 7 samples with batch_size=3 -> (7 + 2) // 3 = 3 batches
        return (len(self.dataset) + self.batch_size - 1) // self.batch_size
        ### END SOLUTION

    def __iter__(self) -> Iterator:
        """Return an iterator over batches."""
        ### BEGIN SOLUTION
        # Create the list of sample indices
        indices = list(range(len(self.dataset)))
        # Shuffle if requested (re-randomized on every epoch)
        if self.shuffle:
            random.shuffle(indices)
        # Yield batches of consecutive (possibly shuffled) indices
        for i in range(0, len(indices), self.batch_size):
            batch_indices = indices[i:i + self.batch_size]
            batch = [self.dataset[idx] for idx in batch_indices]
            # Collate batch - convert list of tuples to tuple of tensors
            yield self._collate_batch(batch)
        ### END SOLUTION

    def _collate_batch(self, batch: List[Tuple[Tensor, ...]]) -> Tuple[Tensor, ...]:
        """
        Collate individual samples into batch tensors.

        Args:
            batch: List of sample tuples from the dataset

        Returns:
            Tuple of batched tensors
        """
        ### BEGIN SOLUTION
        if len(batch) == 0:
            return ()
        # Determine the number of tensors per sample
        num_tensors = len(batch[0])
        # Group tensors by position
        batched_tensors = []
        for tensor_idx in range(num_tensors):
            # Extract all tensors at this position
            tensor_list = [sample[tensor_idx].data for sample in batch]
            # Stack into a batch tensor
            batched_data = np.stack(tensor_list, axis=0)
            batched_tensors.append(Tensor(batched_data))
        return tuple(batched_tensors)
        ### END SOLUTION
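# %% Usage sketch (illustration only - not part of the exported module).
# Ties the pieces together; assumes Tensor wraps a NumPy array and exposes
# .data and .shape as used above.
if __name__ == "__main__":
    features = Tensor(np.arange(12).reshape(6, 2))  # 6 samples, 2 features
    labels = Tensor(np.array([0, 1, 0, 1, 0, 1]))   # 6 labels
    dataset = TensorDataset(features, labels)
    loader = DataLoader(dataset, batch_size=4, shuffle=True)
    print(len(loader))  # 2 batches: one of 4 samples, one of 2
    for features_batch, labels_batch in loader:
        print(features_batch.shape, labels_batch.shape)  # (4, 2) (4,) then (2, 2) (2,)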