diff --git a/README.md b/README.md
index d05e4354..17e5abb0 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,9 @@
 ![GitHub Stars](https://img.shields.io/github/stars/MLSysBook/TinyTorch?style=social)
 ![Contributors](https://img.shields.io/github/contributors/MLSysBook/TinyTorch)
 
-> 🚧 **Work in Progress** - Actively developing TinyTorch for Spring 2025! All 20 core modules (01-20) are implemented but still being debugged and tested. Core foundation modules (01-09) are stable. Transformer and optimization modules (10-20) are functional but undergoing refinement. Join us in building the future of ML systems education.
+> 📢 **December 2024 Release** - TinyTorch is ready for community review! All 20 modules (Tensor → Transformers → Optimization → Capstone) are implemented with complete solutions. **Seeking feedback on pedagogy, implementation quality, and learning progression.** Student version tooling exists but is untested. This release focuses on validating the educational content before classroom deployment.
+>
+> 🎯 **For Reviewers**: Read the [📚 Jupyter Book](https://mlsysbook.github.io/TinyTorch/) to evaluate pedagogy. Clone the repo to run implementations. See [STUDENT_VERSION_TOOLING.md](STUDENT_VERSION_TOOLING.md) for classroom deployment plans.
 
 ## 📖 Table of Contents
 
 - [Why TinyTorch?](#why-tinytorch)
@@ -71,8 +73,8 @@ TinyTorch/
 │   │   ├── 11_embeddings/     # Module 11: Token & positional embeddings
 │   │   ├── 12_attention/      # Module 12: Multi-head attention
 │   │   ├── 13_transformers/   # Module 13: Complete transformer blocks
-│   │   ├── 14_kvcaching/      # Module 14: KV-cache optimization
-│   │   ├── 15_profiling/      # Module 15: Performance analysis
+│   │   ├── 14_profiling/      # Module 14: Performance analysis
+│   │   ├── 15_memoization/    # Module 15: KV-cache/memoization
 │   │   ├── 16_acceleration/   # Module 16: Hardware optimization
 │   │   ├── 17_quantization/   # Module 17: Model compression
 │   │   ├── 18_compression/    # Module 18: Pruning & distillation
@@ -178,7 +180,7 @@ Build transformers that generate text
 | 11 | Embeddings | Token embeddings + positional encoding | **Embedding tables** (vocab × dim parameters), lookup performance |
 | 12 | Attention | Multi-head attention mechanisms | **O(N²) scaling**, memory bottlenecks, attention optimization |
 | 13 | Transformers | Complete transformer blocks | **Layer scaling**, memory requirements, architectural trade-offs |
-| 14 | KV-Caching | Inference optimization for transformers | **Memory vs compute trade-offs**, cache management, generation efficiency |
+| 14 | Profiling | Performance analysis + bottleneck detection | **Memory profiling**, FLOP counting, **Amdahl's Law**, performance measurement |
 
 **Milestone Achievement**: TinyGPT language generation with optimized inference
 
@@ -189,7 +191,7 @@ Profile, optimize, and benchmark ML systems
 | Module | Topic | What You Build | ML Systems Learning |
 |--------|-------|----------------|-------------------|
-| 15 | Profiling | Performance analysis + bottleneck detection | **Memory profiling**, FLOP counting, **Amdahl's Law**, performance measurement |
+| 15 | Memoization | Computational reuse via KV-caching | **Memory vs compute trade-offs**, cache management, generation efficiency |
 | 16 | Acceleration | Hardware optimization + cache-friendly algorithms | **Cache hierarchies**, memory access patterns, **vectorization vs loops** |
 | 17 | Quantization | Model compression + precision reduction | **Precision trade-offs** (FP32→INT8), memory reduction, accuracy preservation |
 | 18 | Compression | Pruning + knowledge distillation | **Sparsity patterns**, parameter reduction, **compression ratios** |
@@ -242,8 +244,8 @@ tito checkpoint timeline
 - **01-02**: Foundation (Tensor, Activations)
 - **03-07**: Core Networks (Layers, Losses, Autograd, Optimizers, Training)
 - **08-09**: Computer Vision (DataLoader, Spatial ops - unlocks CIFAR-10 @ 75%+)
-- **10-14**: Language Models (Tokenization, Embeddings, Attention, Transformers, KV-Caching)
-- **15-19**: System Optimization (Profiling, Acceleration, Quantization, Compression, Benchmarking)
+- **10-13**: Language Models (Tokenization, Embeddings, Attention, Transformers)
+- **14-19**: System Optimization (Profiling, Memoization, Acceleration, Quantization, Compression, Benchmarking)
 - **20**: Capstone (Complete end-to-end ML systems)
 
 Each module asks: **"Can I build this capability from scratch?"** with hands-on validation.
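The memoization idea this PR moves to Module 15 (reuse cached K/V projections instead of recomputing them for every generated token) can be sketched in plain NumPy. This is an illustrative sketch only; the names `attend`, `Wq`, `Wk`, `Wv` are hypothetical and are not TinyTorch's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                       # head dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    """Attention for a single query over all cached keys/values."""
    scores = q @ K.T / np.sqrt(d)           # (1, t)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over the prefix
    return weights @ V                      # (1, d)

# Decode 5 tokens one at a time, appending one K/V row per step
# instead of re-projecting the entire prefix each time.
xs = rng.standard_normal((5, d))
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
for x in xs:
    x = x[None, :]                          # (1, d)
    K_cache = np.vstack([K_cache, x @ Wk])  # O(1) new rows per token
    V_cache = np.vstack([V_cache, x @ Wv])
    cached_out = attend(x @ Wq, K_cache, V_cache)

# Reference: recompute K and V for the whole prefix from scratch.
full_out = attend(xs[-1:] @ Wq, xs @ Wk, xs @ Wv)
assert np.allclose(cached_out, full_out)    # same result, less compute
```

The cache trades memory (two `(t, d)` buffers per layer) for compute, which is exactly the trade-off listed in the Module 15 row above.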
@@ -389,8 +391,8 @@ pytest tests/
 - ✅ **20 modules implemented** (01 Tensor → 20 Capstone) - all code exists
 - ✅ **6 historical milestones** (1957 Perceptron → 2024 Systems Age)
 - ✅ **Foundation modules stable** (01-09): Tensor through Spatial operations
-- 🚧 **Transformer modules functional** (10-14): Tokenization through KV-Caching - undergoing testing
-- 🚧 **Optimization modules functional** (15-20): Profiling through Capstone - undergoing testing
+- 🚧 **Transformer modules functional** (10-13): Tokenization through Transformers - undergoing testing
+- 🚧 **Optimization modules functional** (14-20): Profiling through Capstone - undergoing testing
 - ✅ **KISS principle design** for clear, maintainable code
 - ✅ **Essential-only features**: Focus on what's used in production ML systems
 - 🎯 **Target: Spring 2025** - Active debugging and refinement in progress
@@ -437,6 +439,44 @@ tito benchmark submit --event cnn_marathon
 
 📊 **View Leaderboard**: [TinyMLPerf Competition](https://mlsysbook.github.io/TinyTorch/leaderboard.html) | Future: `tinytorch.org/leaderboard`
 
+## Academic Integrity & Solutions Philosophy
+
+### Why Solutions Are Public
+
+TinyTorch releases complete implementations publicly to support:
+
+- **Transparent peer review** of educational materials
+- **Instructor evaluation** before course adoption
+- **Open-source community** contribution and improvement
+- **Real-world learning** from production-quality code
+
+### For Students: Learning > Copying
+
+**TinyTorch's pedagogy makes copying solutions ineffective:**
+
+1. **Progressive Complexity**: Module 05 (Autograd) requires deep understanding of Modules 01-04. You cannot fake building automatic differentiation by copying code you don't understand.
+
+2. **Integration Requirements**: Each module builds on previous work. Superficial copying breaks down as complexity compounds.
+
+3. **Systems Thinking**: The learning goal is understanding memory management, computational graphs, and performance trade-offs, not just getting tests to pass.
+
+4. **Self-Correcting**: Students who copy without understanding fail subsequent modules. The system naturally identifies shallow work.
+
+### For Instructors: Pedagogy Over Secrecy
+
+Modern ML education accepts that solutions are findable (Chegg, Course Hero, Discord). Defense comes through:
+
+- ✅ **Progressive module dependencies** (can't fake understanding)
+- ✅ **Changed parameters/datasets** each semester
+- ✅ **Competitive benchmarking** (reveals true optimization skill)
+- ✅ **Honor codes** (trust students to learn honestly)
+- ✅ **Focus on journey** (building > having built)
+
+See [STUDENT_VERSION_TOOLING.md](STUDENT_VERSION_TOOLING.md) for classroom deployment strategies.
+
+### Honor Code
+
+> "I understand that TinyTorch solutions are public for educational transparency. I commit to building my own understanding by struggling with implementations, not copying code. I recognize that copying teaches nothing and that subsequent modules will expose shallow understanding. I choose to learn."
 
 ## Contributing
 
 We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
diff --git a/book/chapters/13-transformers.md b/book/chapters/13-transformers.md
index 0f7189a1..0eb045e3 100644
--- a/book/chapters/13-transformers.md
+++ b/book/chapters/13-transformers.md
@@ -465,7 +465,7 @@ This module implements patterns from:
 
 ## What's Next?
 
-In **Module 14: KV Caching** (Performance Tier), you'll optimize transformers for production:
+In **Module 15: Memoization** (Optimization Tier), you'll optimize transformers for production:
 
 - Cache key and value matrices to avoid recomputation
 - Reduce inference latency by 10-100× for long sequences
diff --git a/book/chapters/15-memoization.md b/book/chapters/15-memoization.md
index 575d59f9..833c071e 100644
--- a/book/chapters/15-memoization.md
+++ b/book/chapters/15-memoization.md
@@ -432,7 +432,7 @@ This module implements patterns from:
 
 ## What's Next?
 
-In **Module 15: Profiling**, you'll measure where time goes in your transformer:
+In **Module 14: Profiling**, you measured where time goes in your transformer. Now you'll fix the bottleneck:
 
 - Profile attention, feedforward, and embedding operations
 - Identify computational bottlenecks beyond caching
diff --git a/docs/cifar10-training-guide.md b/docs/cifar10-training-guide.md
index cb052846..0e144dea 100644
--- a/docs/cifar10-training-guide.md
+++ b/docs/cifar10-training-guide.md
@@ -6,9 +6,9 @@ This guide walks you through training a CNN on CIFAR-10 using your TinyTorch imp
 
 ## Prerequisites
 
 Complete these modules first:
 - ✅ Module 08: DataLoader (for CIFAR-10 loading)
-- ✅ Module 11: Training (for model checkpointing)
-- ✅ Module 06: Spatial (for CNN layers)
-- ✅ Module 10: Optimizers (for Adam optimizer)
+- ✅ Module 07: Training (for model checkpointing)
+- ✅ Module 09: Convolutional Networks (for CNN layers)
+- ✅ Module 06: Optimizers (for Adam optimizer)
 
 ## Step 1: Load CIFAR-10 Data
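For reviewers checking the guide's Step 1, the shapes and preprocessing it should produce can be sketched with synthetic arrays standing in for the real CIFAR-10 download. The `batches` helper below is hypothetical, not the TinyTorch DataLoader API:

```python
import numpy as np

# Synthetic stand-in for CIFAR-10: NCHW float images, 10 classes.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 3, 32, 32)).astype(np.float32)
labels = rng.integers(0, 10, size=100)

# Typical preprocessing: scale to [0, 1], then standardize per channel.
images /= 255.0
mean = images.mean(axis=(0, 2, 3), keepdims=True)
std = images.std(axis=(0, 2, 3), keepdims=True)
images = (images - mean) / std

def batches(x, y, batch_size=32, seed=0):
    """Shuffle once, then yield fixed-size minibatches."""
    idx = np.random.default_rng(seed).permutation(len(x))
    for start in range(0, len(x) - batch_size + 1, batch_size):
        sel = idx[start:start + batch_size]
        yield x[sel], y[sel]

xb, yb = next(batches(images, labels))
assert xb.shape == (32, 3, 32, 32) and yb.shape == (32,)
```

The real guide's loader should emit the same `(batch, channel, height, width)` layout, since the Module 09 CNN layers listed in the prerequisites consume NCHW tensors.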