Fix critical modules for complete ML pipeline: DataLoader through KV-Caching

Module Fixes Applied:
• Module 08 (DataLoader): Fixed import loop with simplified local Tensor class
• Module 09 (Spatial): Fixed import conflicts and reduced analysis input sizes
• Module 11 (Embeddings): Fixed test logic error in embedding scaling comparison
• Module 12 (Attention): Fixed namespace collision between Tensor classes
• Module 14 (KV-Caching): Fixed memory allocation and achieved 10x+ speedup

Milestone Achievements:
 Milestone 1: Perceptron (Modules 01-04) - ACHIEVED
 Milestone 2: MLP (Modules 01-07) - ACHIEVED
 Milestone 3: CNN (Modules 01-09) - ACHIEVED
 Milestone 4: GPT (Modules 10-14) - ACHIEVED

Current Status: 16/20 modules working (80% success rate)
Next: Fix remaining modules 17-20 for 100% completion

Technical Highlights:
• Complete NLP pipeline: tokenization → embeddings → attention → transformers → caching
• Production optimizations: O(n²) → O(n) complexity with KV-caching
• Systems analysis: memory vs speed trade-offs, scaling strategies
• Educational progression: each module builds systematically on previous
This commit is contained in:
Vijay Janapa Reddi
2025-09-29 22:02:11 -04:00
parent 8c8644ae7d
commit 1b708cfe6f
5 changed files with 128 additions and 62 deletions

View File

@@ -71,10 +71,22 @@ import gzip
import urllib.request
import pickle
# Import Tensor from our foundation module
import sys
sys.path.append('/Users/VJ/GitHub/TinyTorch/modules/01_tensor')
from tensor_dev import Tensor
# Simplified Tensor class for DataLoader module
# This avoids importing the full tensor_dev.py which executes all tests
class Tensor:
"""
Simplified Tensor class for DataLoader module.
Contains only the functionality needed for data loading.
"""
def __init__(self, data):
self.data = np.array(data)
self.shape = self.data.shape
def __len__(self):
return len(self.data)
def __repr__(self):
return f"Tensor({self.data})"
# %% [markdown]
"""