docs: update module references in README and guides

- Update README.md module structure (14→Profiling, 15→Memoization)
- Fix tier descriptions (10-13 Architecture, 14-19 Optimization)
- Update Module 13 next steps to reference Module 15
- Fix Module 15 prerequisite reference to Module 14
- Correct cifar10-training-guide module numbers
Author: Vijay Janapa Reddi
Date: 2025-11-09 12:42:27 -05:00
parent 5eaffe8501
commit 757d50b717
4 changed files with 54 additions and 14 deletions


@@ -14,7 +14,9 @@
![GitHub Stars](https://img.shields.io/github/stars/MLSysBook/TinyTorch?style=social)
![Contributors](https://img.shields.io/github/contributors/MLSysBook/TinyTorch)
-> 🚧 **Work in Progress** - Actively developing TinyTorch for Spring 2025! All 20 core modules (01-20) are implemented but still being debugged and tested. Core foundation modules (01-09) are stable. Transformer and optimization modules (10-20) are functional but undergoing refinement. Join us in building the future of ML systems education.
+> 📢 **December 2024 Release** - TinyTorch is ready for community review! All 20 modules (Tensor → Transformers → Optimization → Capstone) are implemented with complete solutions. **Seeking feedback on pedagogy, implementation quality, and learning progression.** Student version tooling exists but is untested. This release focuses on validating the educational content before classroom deployment.
+>
+> 🎯 **For Reviewers**: Read the [📚 Jupyter Book](https://mlsysbook.github.io/TinyTorch/) to evaluate pedagogy. Clone the repo to run implementations. See [STUDENT_VERSION_TOOLING.md](STUDENT_VERSION_TOOLING.md) for classroom deployment plans.
## 📖 Table of Contents
- [Why TinyTorch?](#why-tinytorch)
@@ -71,8 +73,8 @@ TinyTorch/
│ │ ├── 11_embeddings/ # Module 11: Token & positional embeddings
│ │ ├── 12_attention/ # Module 12: Multi-head attention
│ │ ├── 13_transformers/ # Module 13: Complete transformer blocks
-│ │ ├── 14_kvcaching/ # Module 14: KV-cache optimization
-│ │ ├── 15_profiling/ # Module 15: Performance analysis
+│ │ ├── 14_profiling/ # Module 14: Performance analysis
+│ │ ├── 15_memoization/ # Module 15: KV-cache/memoization
│ │ ├── 16_acceleration/ # Module 16: Hardware optimization
│ │ ├── 17_quantization/ # Module 17: Model compression
│ │ ├── 18_compression/ # Module 18: Pruning & distillation
@@ -178,7 +180,7 @@ Build transformers that generate text
| 11 | Embeddings | Token embeddings + positional encoding | **Embedding tables** (vocab × dim parameters), lookup performance |
| 12 | Attention | Multi-head attention mechanisms | **O(N²) scaling**, memory bottlenecks, attention optimization |
| 13 | Transformers | Complete transformer blocks | **Layer scaling**, memory requirements, architectural trade-offs |
-| 14 | KV-Caching | Inference optimization for transformers | **Memory vs compute trade-offs**, cache management, generation efficiency |
+| 14 | Profiling | Performance analysis + bottleneck detection | **Memory profiling**, FLOP counting, **Amdahl's Law**, performance measurement |
**Milestone Achievement**: TinyGPT language generation with optimized inference
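The **O(N²) scaling** called out in the attention row above can be seen in a minimal NumPy sketch. This is illustrative only, not the TinyTorch implementation; the function and variable names here are hypothetical:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention; the (N, N) score matrix is the O(N^2) cost."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # (N, N): quadratic in sequence length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

N, d = 128, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
# The output keeps shape (N, d); the hidden cost is the N x N score matrix,
# whose memory quadruples every time the sequence length doubles.
```

Doubling `N` quadruples both the score-matrix memory and the matmul FLOPs, which is exactly the bottleneck the later optimization modules target.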
@@ -189,7 +191,7 @@ Profile, optimize, and benchmark ML systems
| Module | Topic | What You Build | ML Systems Learning |
|--------|-------|----------------|-------------------|
-| 15 | Profiling | Performance analysis + bottleneck detection | **Memory profiling**, FLOP counting, **Amdahl's Law**, performance measurement |
+| 15 | Memoization | Computational reuse via KV-caching | **Memory vs compute trade-offs**, cache management, generation efficiency |
| 16 | Acceleration | Hardware optimization + cache-friendly algorithms | **Cache hierarchies**, memory access patterns, **vectorization vs loops** |
| 17 | Quantization | Model compression + precision reduction | **Precision trade-offs** (FP32→INT8), memory reduction, accuracy preservation |
| 18 | Compression | Pruning + knowledge distillation | **Sparsity patterns**, parameter reduction, **compression ratios** |
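The **FP32→INT8** trade-off in the quantization row above can be illustrated with generic symmetric per-tensor quantization. This is a sketch of the technique, not TinyTorch's actual API:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map max |x| onto the INT8 range [-127, 127]."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an FP32 approximation of the original tensor."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal(4096).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
# Storage drops 4x (1 byte vs 4 bytes per weight); the reconstruction error
# is bounded by the quantization step size `scale`.
```

Symmetric per-tensor scaling is the simplest scheme; production systems often use per-channel scales or asymmetric zero-points to preserve more accuracy.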
@@ -242,8 +244,8 @@ tito checkpoint timeline
- **01-02**: Foundation (Tensor, Activations)
- **03-07**: Core Networks (Layers, Losses, Autograd, Optimizers, Training)
- **08-09**: Computer Vision (DataLoader, Spatial ops - unlocks CIFAR-10 @ 75%+)
-- **10-14**: Language Models (Tokenization, Embeddings, Attention, Transformers, KV-Caching)
-- **15-19**: System Optimization (Profiling, Acceleration, Quantization, Compression, Benchmarking)
+- **10-13**: Language Models (Tokenization, Embeddings, Attention, Transformers)
+- **14-19**: System Optimization (Profiling, Memoization, Quantization, Compression, Acceleration, Benchmarking)
- **20**: Capstone (Complete end-to-end ML systems)
Each module asks: **"Can I build this capability from scratch?"** with hands-on validation.
@@ -389,8 +391,8 @@ pytest tests/
- **20 modules implemented** (01 Tensor → 20 Capstone) - all code exists
- **6 historical milestones** (1957 Perceptron → 2024 Systems Age)
- **Foundation modules stable** (01-09): Tensor through Spatial operations
-- 🚧 **Transformer modules functional** (10-14): Tokenization through KV-Caching - undergoing testing
-- 🚧 **Optimization modules functional** (15-20): Profiling through Capstone - undergoing testing
+- 🚧 **Transformer modules functional** (10-13): Tokenization through Transformers - undergoing testing
+- 🚧 **Optimization modules functional** (14-20): Profiling through Capstone - undergoing testing
- **KISS principle design** for clear, maintainable code
- **Essential-only features**: Focus on what's used in production ML systems
- 🎯 **Target: Spring 2025** - Active debugging and refinement in progress
@@ -437,6 +439,44 @@ tito benchmark submit --event cnn_marathon
📊 **View Leaderboard**: [TinyMLPerf Competition](https://mlsysbook.github.io/TinyTorch/leaderboard.html) | Future: `tinytorch.org/leaderboard`
+## Academic Integrity & Solutions Philosophy
+### Why Solutions Are Public
+TinyTorch releases complete implementations publicly to support:
+- **Transparent peer review** of educational materials
+- **Instructor evaluation** before course adoption
+- **Open-source community** contribution and improvement
+- **Real-world learning** from production-quality code
+### For Students: Learning > Copying
+**TinyTorch's pedagogy makes copying solutions ineffective:**
+1. **Progressive Complexity**: Module 05 (Autograd) requires deep understanding of Modules 01-04. You cannot fake building automatic differentiation by copying code you don't understand.
+2. **Integration Requirements**: Each module builds on previous work. Superficial copying breaks down as complexity compounds.
+3. **Systems Thinking**: The learning goal is understanding memory management, computational graphs, and performance trade-offs—not just getting tests to pass.
+4. **Self-Correcting**: Students who copy without understanding fail subsequent modules. The system naturally identifies shallow work.
+### For Instructors: Pedagogy Over Secrecy
+Modern ML education accepts that solutions are findable (Chegg, Course Hero, Discord). Defense comes through:
+**✅ Progressive module dependencies** (can't fake understanding)
+**✅ Changed parameters/datasets** each semester
+**✅ Competitive benchmarking** (reveals true optimization skill)
+**✅ Honor codes** (trust students to learn honestly)
+**✅ Focus on journey** (building > having built)
+See [STUDENT_VERSION_TOOLING.md](STUDENT_VERSION_TOOLING.md) for classroom deployment strategies.
+### Honor Code
+> "I understand that TinyTorch solutions are public for educational transparency. I commit to building my own understanding by struggling with implementations, not copying code. I recognize that copying teaches nothing and that subsequent modules will expose shallow understanding. I choose to learn."
## Contributing
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.


@@ -465,7 +465,7 @@ This module implements patterns from:
## What's Next?
-In **Module 14: KV Caching** (Performance Tier), you'll optimize transformers for production:
+In **Module 15: Memoization** (Optimization Tier), you'll optimize transformers for production:
- Cache key and value matrices to avoid recomputation
- Reduce inference latency by 10-100× for long sequences
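The reuse described in the bullets above can be sketched generically: each decode step appends one new key/value row instead of recomputing projections for every past token. The class name and shapes here are hypothetical, not the module's actual API:

```python
import numpy as np

class KVCache:
    """Append-only per-head cache of key/value rows across decode steps."""
    def __init__(self, d_head):
        self.keys = np.empty((0, d_head), dtype=np.float32)
        self.values = np.empty((0, d_head), dtype=np.float32)

    def append(self, k, v):
        # One new row per generated token; past K/V are never recomputed,
        # trading extra memory for less compute per step.
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])

cache = KVCache(d_head=8)
rng = np.random.default_rng(0)
for _ in range(5):
    cache.append(rng.standard_normal(8).astype(np.float32),
                 rng.standard_normal(8).astype(np.float32))
# After 5 decode steps the cache holds 5 rows; per-step work grew linearly
# with sequence length instead of quadratically.
```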


@@ -432,7 +432,7 @@ This module implements patterns from:
## What's Next?
-In **Module 15: Profiling**, you'll measure where time goes in your transformer:
+In **Module 14: Profiling**, you measured where time goes in your transformer. Now you'll fix the bottleneck:
- Profile attention, feedforward, and embedding operations
- Identify computational bottlenecks beyond caching
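The kind of measurement described in the bullets above can be sketched with only the standard library (a minimal timing harness; the module's real profiler API may differ, and the labels here are stand-ins):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record wall-clock seconds for a labeled code region into `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

results = {}
with timed("feedforward", results):
    total = sum(i * i for i in range(200_000))   # stand-in workload
with timed("attention", results):
    total += sum(i * i for i in range(50_000))   # smaller stand-in workload
# Amdahl's Law: optimize the region that dominates total time first.
bottleneck = max(results, key=results.get)
```

Measuring before optimizing is the whole point: speeding up a region that takes 5% of the runtime can never buy more than a 5% overall improvement.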


@@ -6,9 +6,9 @@ This guide walks you through training a CNN on CIFAR-10 using your TinyTorch imp
## Prerequisites
Complete these modules first:
- ✅ Module 08: DataLoader (for CIFAR-10 loading)
-- ✅ Module 11: Training (for model checkpointing)
-- ✅ Module 06: Spatial (for CNN layers)
-- ✅ Module 10: Optimizers (for Adam optimizer)
+- ✅ Module 07: Training (for model checkpointing)
+- ✅ Module 09: Convolutional Networks (for CNN layers)
+- ✅ Module 06: Optimizers (for Adam optimizer)
## Step 1: Load CIFAR-10 Data