mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-06-02 17:16:34 -05:00

Vijay Janapa Reddi 38a5381bef Adds initial TinyTorch CLI and core structure

Introduces the foundational CLI structure and core components for the TinyTorch project.

This initial commit establishes the command-line interface (CLI) using `argparse` for training, evaluation, benchmarking, and system information. It also lays out the basic directory structure and essential modules, including tensor operations, autograd, neural network layers, optimizers, data loading, and MLOps components.

2025-07-09 00:23:19 -04:00

Tiny🔥Torch: Build a Machine Learning System from Scratch

TinyTorch is a pedagogical project designed to accompany the Machine Learning Systems textbook. Inspired by OS and compiler courses where students build entire systems from first principles, TinyTorch guides you through building a complete ML training and inference runtime — from autograd to data pipelines, optimizers to profilers — entirely from scratch.

This is not a PyTorch tutorial. In TinyTorch, you'll write the components that frameworks like PyTorch are built on.

🎯 What You'll Build

By the end of this project, you'll have implemented a fully functional ML system capable of:

Training neural networks (MLPs, CNNs) on real datasets (MNIST, CIFAR-10)
Automatic differentiation with a custom autograd engine
Memory-efficient data loading with custom DataLoader implementations
Multiple optimization algorithms (SGD, Adam, RMSprop)
Performance profiling and bottleneck identification
Model compression through pruning and quantization
Custom compute kernels for matrix operations
Production monitoring with MLOps infrastructure
Reproducible experiments with checkpointing and logging

End Goal: Train a CNN on CIFAR-10 achieving >85% accuracy using only your implementation.
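
To make one item from the list above concrete, here is a minimal sketch of a batching data loader. The class name `SimpleDataLoader`, its parameters, and the toy dataset are assumptions made for this illustration; TinyTorch's own DataLoader may look quite different.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (features, label)


class SimpleDataLoader:
    """Hypothetical mini-batch loader; not TinyTorch's real DataLoader."""

    def __init__(self, dataset: Sequence[Sample], batch_size: int,
                 shuffle: bool = False, seed: int = 0) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        # Number of batches, counting a final partial batch.
        return (len(self.dataset) + self.batch_size - 1) // self.batch_size

    def __iter__(self) -> Iterator[Tuple[List[List[float]], List[int]]]:
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            self._rng.shuffle(indices)
        # Yield lazily so only one batch is assembled at a time.
        for start in range(0, len(indices), self.batch_size):
            chunk = indices[start:start + self.batch_size]
            features = [self.dataset[i][0] for i in chunk]
            labels = [self.dataset[i][1] for i in chunk]
            yield features, labels


# Toy dataset: five samples with two features each.
toy = [([float(i), float(i) * 2], i % 2) for i in range(5)]
loader = SimpleDataLoader(toy, batch_size=2)
batches = list(loader)
```

Because `__iter__` is a generator, the loader never holds more than one batch of copied samples, which is the basic idea behind memory-efficient loading.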

🧠 Project Goals & Learning Objectives

Core Learning Objectives

Systems Understanding: Learn how modern ML systems are constructed, not just how to use them
Full-Stack ML Infrastructure: Build core components from tensor operations to training orchestration
Performance Engineering: Understand computational and memory bottlenecks in ML workloads
Software Architecture: Design modular, extensible systems with clean abstractions
Infrastructure Thinking: Make design decisions that impact performance, reproducibility, and maintainability

Technical Skills Gained

Low-level ML Implementation: Tensor operations, gradient computation, optimization algorithms
Memory Management: Efficient data structures, gradient accumulation, batch processing
Performance Optimization: Profiling, kernel optimization, memory access patterns
System Design: Modular architecture, clean APIs, extensible frameworks
Testing & Validation: Numerical stability, gradient checking, performance regression testing
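
As an example of the low-level implementation work listed above, the update rules for SGD with momentum and for Adam fit in a few lines. This is a sketch over flat lists of floats; the class names, default hyperparameters, and the toy objective are assumptions for illustration, not TinyTorch's optimizer API.

```python
import math
from typing import List


class SGD:
    """SGD with optional momentum over a flat list of parameters."""

    def __init__(self, lr: float = 0.1, momentum: float = 0.0) -> None:
        self.lr = lr
        self.momentum = momentum
        self.velocity: List[float] = []

    def step(self, params: List[float], grads: List[float]) -> None:
        if not self.velocity:
            self.velocity = [0.0] * len(params)
        for i, g in enumerate(grads):
            self.velocity[i] = self.momentum * self.velocity[i] + g
            params[i] -= self.lr * self.velocity[i]


class Adam:
    """Adam with bias-corrected first and second moment estimates."""

    def __init__(self, lr: float = 0.001, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0
        self.m: List[float] = []
        self.v: List[float] = []

    def step(self, params: List[float], grads: List[float]) -> None:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def minimize(optimizer, steps: int) -> float:
    """Minimize the toy objective f(w) = (w - 3)^2 starting from w = 0."""
    params = [0.0]
    for _ in range(steps):
        grads = [2.0 * (params[0] - 3.0)]
        optimizer.step(params, grads)
    return params[0]


w_sgd = minimize(SGD(lr=0.1), steps=100)
w_adam = minimize(Adam(lr=0.1), steps=500)
```

Both optimizers should drive `w` toward the minimizer at 3; they share a `step(params, grads)` interface so a training loop can swap one for the other.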

🏗️ System Architecture

Core Components Overview

┌─────────────────────────────────────────────────────────────┐
│                    TinyTorch System                         │
├─────────────────────────────────────────────────────────────┤
│  CLI Interface (bin/tito.py)                               │
├─────────────────────────────────────────────────────────────┤
│  Training Orchestration (trainer.py)                       │
├─────────────────────────────────────────────────────────────┤
│  Model Definition     │  Data Pipeline    │  Optimization   │
│  (modules.py)         │  (dataloader.py)  │  (optimizer.py) │
├─────────────────────────────────────────────────────────────┤
│  Automatic Differentiation Engine (autograd)               │
├─────────────────────────────────────────────────────────────┤
│  Tensor Operations & Storage (tensor.py)                   │
├─────────────────────────────────────────────────────────────┤
│  Profiling & MLOps (profiler.py, mlops.py)                 │
└─────────────────────────────────────────────────────────────┘
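
To illustrate how the autograd layer in the diagram sits on top of the value storage beneath it, here is a minimal scalar reverse-mode sketch. The `Value` class and its methods are hypothetical names for this example only; TinyTorch's Tensor and autograd modules operate on arrays and may be structured differently.

```python
from typing import Callable, List, Set, Tuple


class Value:
    """A scalar node in a computation graph (illustrative only)."""

    def __init__(self, data: float, parents: Tuple["Value", ...] = ()) -> None:
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward: Callable[[], None] = lambda: None

    def __add__(self, other: "Value") -> "Value":
        out = Value(self.data + other.data, (self, other))

        def _backward() -> None:
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other: "Value") -> "Value":
        out = Value(self.data * other.data, (self, other))

        def _backward() -> None:
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def relu(self) -> "Value":
        out = Value(max(0.0, self.data), (self,))

        def _backward() -> None:
            self.grad += (1.0 if self.data > 0 else 0.0) * out.grad

        out._backward = _backward
        return out

    def backward(self) -> None:
        # Topologically sort the graph, then apply the chain rule in reverse.
        order: List["Value"] = []
        visited: Set["Value"] = set()

        def visit(node: "Value") -> None:
            if node not in visited:
                visited.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, w, b = Value(2.0), Value(-3.0), Value(7.0)
y = (x * w + b).relu()
y.backward()
```

Gradients are accumulated with `+=` rather than assigned, so a value that feeds into several operations receives the sum of all its contributions.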

Design Principles

Modularity: Each component has a single responsibility and clean interfaces
Composability: Components can be mixed and matched for different use cases
Performance: Designed for efficiency without sacrificing readability
Extensibility: Easy to add new layers, optimizers, and functionality
Debuggability: Built-in profiling and logging for understanding system behavior
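
The modularity and composability principles above can be sketched with a small layer interface. The names `Module`, `Linear`, `ReLU`, and `Sequential` follow PyTorch conventions but are assumptions here, written in pure Python for illustration; they are not taken from TinyTorch's `modules.py`.

```python
from typing import List

Vector = List[float]


class Module:
    """Base interface: every layer exposes forward() and parameters()."""

    def forward(self, x: Vector) -> Vector:
        raise NotImplementedError

    def parameters(self) -> List[float]:
        return []

    def __call__(self, x: Vector) -> Vector:
        return self.forward(x)


class Linear(Module):
    """Computes y = W x + b with explicitly supplied weights."""

    def __init__(self, weight: List[Vector], bias: Vector) -> None:
        self.weight = weight
        self.bias = bias

    def forward(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

    def parameters(self) -> List[float]:
        return [w for row in self.weight for w in row] + list(self.bias)


class ReLU(Module):
    def forward(self, x: Vector) -> Vector:
        return [max(0.0, v) for v in x]


class Sequential(Module):
    """Chains layers; it only relies on the Module interface."""

    def __init__(self, *layers: Module) -> None:
        self.layers = list(layers)

    def forward(self, x: Vector) -> Vector:
        for layer in self.layers:
            x = layer(x)
        return x

    def parameters(self) -> List[float]:
        return [p for layer in self.layers for p in layer.parameters()]


model = Sequential(
    Linear([[1.0, -1.0], [0.5, 0.5]], [0.0, -2.0]),
    ReLU(),
    Linear([[2.0, 3.0]], [1.0]),
)
output = model([3.0, 1.0])
```

Because `Sequential` depends only on `forward` and `parameters`, a new layer type can be added without touching the container or the optimizer that consumes the parameters.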

📚 Curriculum Integration & Roadmap

TinyTorch aligns with Chapters 1–13 of the Machine Learning Systems textbook. Each project builds progressively toward a complete ML infrastructure.

Course Progression

Order	Project	Chapter	Component	Learning Focus	Deliverable
0	`setup`	-	Setup & Onboarding	Environment setup, tool familiarity	Working dev environment + basic commands
1	`tensor`	1-2	Core Tensor System	Tensor operations, memory management	Working Tensor class with basic ops
2	`mlp`	3	Multi-Layer Perceptron	Forward/backward pass, gradient computation	Train simple MLP on MNIST
3	`cnn`	4	Convolutional Networks	Convolution, pooling operations	Conv2D and MaxPool implementations
4	`autograd`	5	Automatic Differentiation	Computational graphs, autodiff	Complete autograd engine
5	`data`	6	Data Pipeline	Efficient data loading, batching	Custom DataLoader with transformations
6	`training`	7-8	Training Loop	Optimization algorithms, metrics	SGD, Adam optimizers + training harness
7	`config`	9	Configuration System	Experiment management, logging	YAML configs + structured logging
8	`profiling`	10	Performance Tools	Performance measurement, debugging	Memory/compute profiler with visualizations
9	`compression`	11	Model Compression	Pruning, quantization techniques	Compress model while maintaining accuracy
10	`kernels`	12	Custom Kernels	Low-level optimization, vectorization	Optimized matrix multiplication kernels
11	`benchmarking`	13	Performance Evaluation	Performance testing, comparison	Comprehensive benchmarking suite
12	`mlops`	14	MLOps & Monitoring	Data drift detection, continuous updates	Production monitoring and auto-retraining system

Note: Project directories use descriptive names (e.g., projects/tensor/, projects/autograd/) for clarity. The order above represents the recommended learning sequence and can be adjusted as the curriculum evolves.

Milestone Targets

Week 1: Environment setup (setup) and basic command familiarity
Week 3: Core tensor operations (tensor) working
Week 5: Train MLP on MNIST (mlp) achieving >95% accuracy
Week 8: Train CNN on CIFAR-10 (cnn) achieving >80% accuracy
Week 10: Complete autograd engine (autograd) with gradient checking
Week 12: Optimized system with profiling tools (profiling)
Final: Complete system with MLOps monitoring (mlops)

📦 Course Repository Structure

TinyTorch/
├── bin/                           # Command-line interfaces
│   └── tito.py                    # Main TinyTorch CLI (tito)
├── tinytorch/                     # Core ML system package
│   ├── core/                      # Core ML components
│   │   ├── __init__.py
│   │   ├── tensor.py              # Tensor class with autograd support
│   │   ├── autograd.py            # Automatic differentiation engine
│   │   ├── modules.py             # Neural network layers and models
│   │   ├── functional.py          # Core operations (conv2d, relu, etc.)
│   │   ├── dataloader.py          # Data loading and preprocessing
│   │   ├── optimizer.py           # Optimization algorithms
│   │   ├── trainer.py             # Training loop orchestration
│   │   ├── profiler.py            # Performance measurement tools
│   │   ├── benchmark.py           # Benchmarking and evaluation
│   │   ├── mlops.py               # MLOps and production monitoring
│   │   └── utils.py               # Utility functions
│   ├── configs/                   # Configuration files
│   │   ├── default.yaml           # Default training configuration
│   │   ├── models/                # Model-specific configs
│   │   └── datasets/              # Dataset-specific configs
│   └── datasets/                  # Dataset implementations
│       ├── __init__.py
│       ├── mnist.py
│       ├── cifar10.py
│       └── transforms.py
├── projects/                      # Component-specific projects
│   ├── setup/                    # Environment setup & onboarding
│   ├── tensor/                   # Core tensor implementation
│   ├── mlp/                      # Multi-layer perceptron
│   ├── cnn/                      # Convolutional neural networks
│   ├── config/                   # Configuration system
│   ├── data/                     # Data pipeline & loading
│   ├── autograd/                 # Automatic differentiation
│   ├── training/                 # Training loop & optimization
│   ├── profiling/                # Performance profiling tools
│   ├── compression/              # Model compression techniques
│   ├── kernels/                  # Custom compute kernels
│   ├── benchmarking/             # Performance benchmarking
│   └── mlops/                    # MLOps & production monitoring
├── docs/                         # Course documentation
│   ├── tutorials/                # Step-by-step tutorials
│   ├── api/                      # API documentation
│   └── lectures/                 # Lecture materials
├── notebooks/                    # Jupyter tutorials and demos
├── examples/                     # Working examples and demos
│   ├── train_mnist_mlp.py
│   ├── train_cifar_cnn.py
│   └── benchmark_ops.py
├── tests/                        # Comprehensive test suite
│   ├── test_tensor.py
│   ├── test_autograd.py
│   ├── test_modules.py
│   └── test_training.py
├── grading/                      # Course grading materials
│   ├── rubrics/                  # Assignment rubrics
│   ├── autograders/              # Automated grading scripts
│   └── solutions/               # Reference solutions
├── resources/                    # Course resources
│   ├── datasets/                 # Course datasets
│   ├── pretrained/              # Pre-trained models
│   └── references/              # Reference materials
├── logs/                         # Training logs and artifacts
│   └── runs/
├── checkpoints/                  # Model checkpoints
├── requirements.txt              # Python dependencies
└── README.md                     # This file

🚀 Getting Started

Prerequisites

Python 3.8+ (type hints and modern features required)
NumPy (numerical computations)
Optional: Numba (JIT compilation for performance)
Development: pytest, black, mypy (for testing and code quality)

Quick Start

# Clone and setup
git clone <repository-url>
cd TinyTorch
pip install -r requirements.txt

# Run your first training
python bin/tito.py train --config tinytorch/configs/default.yaml

# Check system status
python bin/tito.py info --show-architecture
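
The commit description says the CLI is built with `argparse`. A minimal sketch of how the two commands above could be wired up follows. Only the `train --config` and `info --show-architecture` spellings come from this README; the function names, help strings, and return messages are assumptions, and the real `bin/tito.py` may differ.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tito", description="TinyTorch CLI (sketch)")
    subparsers = parser.add_subparsers(dest="command", required=True)

    train = subparsers.add_parser("train", help="Train a model")
    train.add_argument("--config", required=True, help="Path to a YAML config file")

    info = subparsers.add_parser("info", help="Show system information")
    info.add_argument("--show-architecture", action="store_true",
                      help="Also print the architecture overview")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Parse arguments and return a description of what would run."""
    args = build_parser().parse_args(argv)
    if args.command == "train":
        return f"would train with {args.config}"
    detail = " (with architecture)" if args.show_architecture else ""
    return "would print system info" + detail


message = main(["train", "--config", "tinytorch/configs/default.yaml"])
```

Passing `argv` explicitly keeps the parser testable without touching `sys.argv`. The evaluation and benchmarking commands mentioned in the commit description would be added as further subparsers in the same way.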

For Instructors

# Set up the complete course environment
pip install -r requirements.txt

# Generate course materials
python bin/tito.py generate-projects
python bin/tito.py setup-autograders

# View course progress
python bin/tito.py course-status

For Students

# Start with Project 0: Setup (commands assume you begin in the repository root)
cd projects/setup/
python ../../bin/tito.py project init
cd ../..

# Then move through the sequence (each path is relative to the repository root)
cd projects/tensor/     # Project 1: Core tensors
cd projects/mlp/        # Project 2: Multi-layer perceptron
cd projects/autograd/   # Project 4: Automatic differentiation

# Run tests and submit (from the repository root)
python bin/tito.py test --project tensor
python bin/tito.py submit --project tensor

🎯 Implementation Guidelines

Code Quality Standards

Type Hints: All public APIs must have complete type annotations
Documentation: Docstrings for all classes and public methods
Testing: >90% code coverage with unit and integration tests
Performance: Profile-guided optimization with benchmarking
Style: Black code formatting, consistent naming conventions

API Design Principles

Familiar Interface: Similar to PyTorch where it makes sense (for learning transfer)
Explicit Over Implicit: Clear parameter names and behavior
Composable: Small, focused components that work together
Debuggable: Rich error messages and debugging hooks

Performance Targets

MNIST MLP: <5 seconds per epoch on a modern laptop
CIFAR-10 CNN: <30 seconds per epoch on a modern laptop
Memory Usage: <2GB RAM for standard training runs
Numerical Stability: Gradient checking passes for all operations
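
One way to check a run against time and memory targets like these is to wrap an epoch in a small measurement helper. The sketch below uses only the standard library; `measure`, `within_budget`, and `fake_epoch` are hypothetical names, and the workload is a stand-in rather than a real training epoch.

```python
import time
import tracemalloc
from typing import Callable, Tuple


def measure(fn: Callable[[], object]) -> Tuple[float, int]:
    """Return (elapsed seconds, peak traced bytes) for one call of fn."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        fn()
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return elapsed, peak


def within_budget(elapsed: float, peak: int,
                  max_seconds: float, max_bytes: int) -> bool:
    return elapsed < max_seconds and peak < max_bytes


def fake_epoch() -> float:
    # Stand-in workload: allocate a list of floats and reduce it.
    values = [float(i) for i in range(100_000)]
    return sum(values)


elapsed, peak = measure(fake_epoch)
ok = within_budget(elapsed, peak, max_seconds=5.0, max_bytes=2 * 1024 ** 3)
```

Note that `tracemalloc` only sees allocations made through Python's allocator and adds overhead of its own, so timing-sensitive measurements are better taken in a separate run without tracing.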

🧪 Testing & Validation Strategy

Test Categories

Unit Tests: Individual component functionality
Integration Tests: Component interaction and data flow
Numerical Tests: Gradient checking and mathematical correctness
Performance Tests: Regression testing for speed and memory
End-to-End Tests: Complete training runs with known results

Validation Methodology

Gradient Checking: Numerical verification of all autodiff operations
Reference Comparisons: Output validation against NumPy/PyTorch (where applicable)
Convergence Testing: Training curves must match expected behavior
Ablation Studies: Systematic testing of individual components
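
Gradient checking, the first item above, compares an analytic gradient with a central finite-difference estimate. The following is a minimal sketch; the function names, tolerances, and test function are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence


def numerical_gradient(f: Callable[[List[float]], float],
                       x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Estimate df/dx_i with central differences: (f(x+e) - f(x-e)) / (2e)."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def gradients_match(analytic: Sequence[float], numeric: Sequence[float],
                    rtol: float = 1e-4, atol: float = 1e-6) -> bool:
    if len(analytic) != len(numeric):
        return False
    return all(math.isclose(a, n, rel_tol=rtol, abs_tol=atol)
               for a, n in zip(analytic, numeric))


def f(x: List[float]) -> float:
    return x[0] ** 2 * x[1] + math.sin(x[1])


def grad_f(x: List[float]) -> List[float]:
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]


def buggy_grad_f(x: List[float]) -> List[float]:
    # Deliberately drops the cos term to show what a failed check looks like.
    return [2 * x[0] * x[1], x[0] ** 2]


point = [1.5, -0.5]
numeric = numerical_gradient(f, point)
correct_passes = gradients_match(grad_f(point), numeric)
buggy_passes = gradients_match(buggy_grad_f(point), numeric)
```

Central differences have truncation error proportional to `eps` squared, so they tolerate a tighter comparison than one-sided differences at the same step size.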

💡 Educational Philosophy

"You don't really understand a system until you've built it."

Learning Through Building

TinyTorch emphasizes active construction over passive consumption. Students don't just learn about autograd—they implement it. They don't just use optimizers—they write them from scratch.

Systems Thinking

By building a complete system, students understand:

Abstraction Boundaries: What belongs where in the system hierarchy
Performance Trade-offs: How design decisions impact speed and memory
Debugging Strategies: How to trace problems through complex systems
Integration Challenges: How components interact and depend on each other

Real-World Relevance

Every component in TinyTorch has a direct analog in production ML systems. The skills learned here transfer directly to understanding and contributing to frameworks like PyTorch, TensorFlow, and JAX.

🔧 Advanced Features & Extensions

Chapter 13: MLOps Deep Dive

Core MLOps components (to be implemented in Chapter 13):

Data Drift Detection: Statistical tests for distribution shifts in input features
Model Performance Monitoring: Track accuracy, latency, and throughput in production
Automatic Retraining Triggers: When to retrain based on performance degradation
A/B Testing Framework: Compare model versions safely in production
Model Registry: Version control and metadata tracking for deployed models
Alert Systems: Notifications for model failures or performance drops
Rollback Mechanisms: Safe deployment and quick rollback strategies

Production Integration:

REST API Serving: Deploy models as web services
Batch Inference Pipelines: Large-scale offline predictions
Feature Store Integration: Consistent feature engineering across training/serving
Monitoring Dashboards: Real-time system health visualization
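
As an illustration of the data drift detection component, the sketch below applies a two-sample Kolmogorov-Smirnov test to a single feature, using the standard large-sample approximation for the critical value. The choice of the KS test, the function names, and the sample data are assumptions; the README does not say which statistical tests TinyTorch will use.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    largest = 0.0
    # The supremum is attained at one of the observed data points.
    for v in ref + cur:
        cdf_ref = bisect.bisect_right(ref, v) / len(ref)
        cdf_cur = bisect.bisect_right(cur, v) / len(cur)
        largest = max(largest, abs(cdf_ref - cdf_cur))
    return largest


def drift_detected(reference: Sequence[float], current: Sequence[float],
                   alpha: float = 0.05) -> bool:
    """Flag drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


training_feature = [float(i) for i in range(100)]
similar_batch = [v + 0.5 for v in training_feature]   # nearly the same distribution
shifted_batch = [v + 50.0 for v in training_feature]  # mean shifted by half the range

stat_similar = ks_statistic(training_feature, similar_batch)
stat_shifted = ks_statistic(training_feature, shifted_batch)
```

A production monitor would run a check like this per feature on a schedule and feed the result into the retraining triggers and alerts listed above.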

Optional Advanced Components

Mixed Precision Training: FP16/FP32 mixed precision implementation
Distributed Training: Multi-GPU and multi-node training support
Dynamic Graphs: Support for variable computation graphs
Custom Operators: Framework for implementing new operations
JIT Compilation: Integration with Numba or custom compilation

Research Extensions

Novel Optimizers: Implement cutting-edge optimization algorithms
Architecture Search: Automated neural architecture search
Compression Techniques: Advanced pruning and quantization methods
Hardware Acceleration: GPU kernels and specialized hardware support
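
The advanced compression methods mentioned above build on two basic operations: magnitude pruning and uniform quantization. Here is a minimal sketch of both on a flat list of weights; the function names and the symmetric per-tensor int8 scheme are assumptions for illustration, not a description of TinyTorch's compression project.

```python
from typing import List, Sequence, Tuple


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    return [q * scale for q in quantized]


weights = [0.8, -0.05, 0.3, -1.27, 0.01, 0.6]
pruned = magnitude_prune(weights, sparsity=0.5)
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

With rounding to the nearest integer, the reconstruction error of each weight is bounded by half the scale, which is the quantity to compare against the accuracy lost after compression.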

📊 Success Metrics

Technical Milestones

Train MLP on MNIST achieving >95% accuracy
Implement working autograd engine with gradient checking
Train CNN on CIFAR-10 achieving >85% accuracy
Profile and optimize for 2x performance improvement
Complete all core project implementations

Learning Outcomes Assessment

Code Reviews: Peer and instructor evaluation of implementations
Design Document: Architecture decisions and trade-off analysis
Performance Analysis: Profiling report and optimization strategy
Presentation: Explain system design and key insights learned

🤝 Course Management

For Instructors

Project Management:

Each chapter has a structured project in projects/<name>/ (e.g., projects/tensor/)
Rubrics and grading criteria in grading/rubrics/
Automated testing and grading in grading/autograders/
Reference solutions in grading/solutions/

Progress Tracking:

Student progress dashboards
Automated testing and feedback
Performance benchmarking
Code quality metrics

For Students

Development Process:

Start Each Project: Read the project description in projects/<name>/README.md (e.g., projects/tensor/README.md)
Implement Features: Follow step-by-step guided implementation
Run Tests: Use automated tests to validate implementation
Submit Work: Automated submission and grading system
Get Feedback: Detailed feedback on implementation and performance

Getting Help

Documentation: Comprehensive docs in docs/
Tutorials: Step-by-step tutorials in notebooks/
Office Hours: Regular sessions for questions and debugging
Peer Discussion: Collaborative learning encouraged
Issue Tracking: GitHub issues for bugs and feature requests

📬 License and Attribution

TinyTorch is part of the Machine Learning Systems course and textbook by Vijay Janapa Reddi et al. It is inspired by systems-style pedagogical projects such as xv6 (OS), PintOS (OS), and the cs231n assignments (ML).

License: MIT
Citation: Please cite the Machine Learning Systems textbook when using this educational material.

🔗 Additional Resources

Textbook: Machine Learning Systems (Chapters 1-13)
Course Website: [Course URL]
Video Lectures: Complementary video content for each chapter
External Reading: Curated list of papers and blog posts
Community Forum: Discussion and Q&A platform
Office Hours: [Schedule and locations]
