From aa4e97636745f266b4afd81ea11c7c2ec5785c6b Mon Sep 17 00:00:00 2001 From: Vijay Janapa Reddi Date: Sun, 25 Jan 2026 13:48:39 -0500 Subject: [PATCH 1/4] Update tagline for AI engineers --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index 7a9682537..9b201bfb0 100644 --- a/README.md +++ b/README.md @@ -508,6 +508,5 @@ Thanks goes to these wonderful people who have contributed to making this resour **[⭐ Star us on GitHub](https://github.com/harvard-edge/cs249r_book#support-this-work) • [✉️ Subscribe](https://buttondown.email/mlsysbook) • [💬 Join discussions](https://github.com/harvard-edge/cs249r_book/discussions) • [🌐 Visit mlsysbook.ai](https://mlsysbook.ai)** -**Made with ❤️ for AI learners worldwide** - +**Made with ❤️ for AI engineers in the making** From e6fb6b631dd814b1cdddc6f909c32687823e090c Mon Sep 17 00:00:00 2001 From: Vijay Janapa Reddi Date: Sun, 25 Jan 2026 13:23:17 -0500 Subject: [PATCH 2/4] Removes unused commands from CLI docs Updates the CLI documentation to reflect the current set of implemented commands. The 'dev' command no longer includes 'preflight' and 'validate' subcommands. Removes these commands from the valid commands list. 
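The check this patch adjusts can be illustrated with a minimal sketch. Only the `dev` subcommand list is taken from the diff below; the helper function and its name are hypothetical stand-ins for what `validate_cli_docs.py` does:

```python
from typing import Dict, List

# Hypothetical sketch of the validation this script performs: every
# (command, subcommand) pair mentioned in the CLI docs must exist in
# the VALID_COMMANDS registry. Only the 'dev' entry is taken from
# this patch; the helper itself is illustrative.
VALID_COMMANDS: Dict[str, List[str]] = {
    "dev": ["test", "export"],  # 'preflight' and 'validate' removed
}


def is_documented_command_valid(command: str, subcommand: str) -> bool:
    """True iff the documented pair maps to an implemented command."""
    return subcommand in VALID_COMMANDS.get(command, [])


print(is_documented_command_valid("dev", "export"))     # True
print(is_documented_command_valid("dev", "preflight"))  # False
```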
--- tinytorch/tools/dev/validate_cli_docs.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tinytorch/tools/dev/validate_cli_docs.py b/tinytorch/tools/dev/validate_cli_docs.py index 5d894d4fe..4c9c260cc 100644 --- a/tinytorch/tools/dev/validate_cli_docs.py +++ b/tinytorch/tools/dev/validate_cli_docs.py @@ -36,7 +36,7 @@ VALID_COMMANDS: Dict[str, List[str]] = { "logo": [], # No subcommands "system": ["info", "health", "jupyter", "update", "logo"], "module": ["start", "view", "resume", "complete", "test", "reset", "status", "list"], - "dev": ["preflight", "export", "validate"], + "dev": ["test", "export"], "src": ["export", "test"], "package": ["reset", "nbdev"], "nbgrader": ["init", "generate", "release", "collect", "autograde", "feedback", "status", "analytics", "report"], From 2e0fcfc3e39ab2425b323975f011735f638b9192 Mon Sep 17 00:00:00 2001 From: Vijay Janapa Reddi Date: Sun, 25 Jan 2026 15:03:50 -0500 Subject: [PATCH 3/4] fix(paper): align paper with actual TinyTorch implementation - Rewrite progressive disclosure section (Section 4) to accurately describe how Module 01 Tensor is clean and Module 06 adds gradient features via monkey-patching (not dormant features from start) - Update code listings to match actual implementation - Update figure from dormant-active to foundation-enhanced - Remove TA_GUIDE.md references (file does not exist) - Fix export directive count from 13 modules to all 20 modules - Update GitHub repo URL to monorepo path (cs249r_book/tinytorch) --- tinytorch/paper/paper.tex | 96 ++++++++++++++++++--------------------- 1 file changed, 44 insertions(+), 52 deletions(-) diff --git a/tinytorch/paper/paper.tex b/tinytorch/paper/paper.tex index d768445fe..e3214003d 100644 --- a/tinytorch/paper/paper.tex +++ b/tinytorch/paper/paper.tex @@ -635,49 +635,45 @@ Performance targets differ from published state-of-the-art due to pure-Python co \section{Progressive Disclosure} \label{sec:progressive} -This section details how TinyTorch 
implements progressive disclosure: a pattern that manages cognitive load by revealing \texttt{Tensor} capabilities gradually through monkey-patching while maintaining a unified mental model. Unlike approaches that introduce separate classes or defer features entirely, students work with a single \texttt{Tensor} class throughout the curriculum. +This section details how TinyTorch implements progressive disclosure: a pattern that manages cognitive load by enhancing \texttt{Tensor} capabilities gradually through monkey-patching while maintaining a unified mental model. Unlike approaches that introduce separate classes or require students to learn new APIs mid-curriculum, students work with a single \texttt{Tensor} class throughout---its capabilities expand transparently as modules progress. \subsection{Pattern Implementation} -TinyTorch's \texttt{Tensor} class includes gradient-related attributes from Module 01, but they remain dormant until Module 06 activates them through monkey-patching (\Cref{lst:dormant-tensor,lst:activation}). \Cref{fig:progressive-timeline} visualizes this activation timeline across the curriculum. +TinyTorch's Module 01 \texttt{Tensor} class focuses exclusively on core tensor operations: data storage, arithmetic, matrix multiplication, and shape manipulation (\Cref{lst:foundation-tensor}). No gradient-related attributes exist yet---students learn tensor fundamentals without cognitive overhead from features they won't use for five more modules. In Module 06, the \texttt{enable\_autograd()} function dynamically enhances \texttt{Tensor} with gradient tracking capabilities through monkey-patching (\Cref{lst:activation}). \Cref{fig:progressive-timeline} visualizes this enhancement timeline across the curriculum. 
-\begin{lstlisting}[caption={\textbf{Dormant Gradient Infrastructure.} Module 01 Tensor includes \texttt{.backward()}, \texttt{.grad}, and \texttt{.requires\_grad}---visible but inactive until Module 06 activation.},label=lst:dormant-tensor,float=t] +\begin{lstlisting}[caption={\textbf{Foundation Tensor.} Module 01 Tensor focuses on core operations. No gradient infrastructure exists yet---students learn tensor fundamentals first.},label=lst:foundation-tensor,float=t] # Module 01: Foundation Tensor class Tensor: - def __init__(self, data, requires_grad=False): + def __init__(self, data): self.data = np.array(data, dtype=np.float32) self.shape = self.data.shape - # Gradient features - dormant - self.requires_grad = requires_grad - self.grad = None - self._backward = None + self.size = self.data.size + self.dtype = self.data.dtype + # No gradient features - pure data container - def backward(self, gradient=None): - """No-op until Module 06""" - pass + def memory_footprint(self): + """Systems thinking from day one""" + return self.data.nbytes def __mul__(self, other): return Tensor(self.data * other.data) \end{lstlisting} -\begin{lstlisting}[caption={\textbf{Autograd Activation.} Module 06 monkey-patches Tensor to enable gradient computation.},label=lst:activation,float=t] +\begin{lstlisting}[caption={\textbf{Autograd Enhancement.} Module 06 monkey-patches Tensor to add gradient tracking. 
The original \texttt{\_\_init\_\_} is wrapped to accept \texttt{requires\_grad}, and operations are enhanced to build computation graphs.},label=lst:activation,float=t] def enable_autograd(): - """Monkey-patch Tensor with gradients""" - def backward(self, gradient=None): - if gradient is None: - gradient = np.ones_like(self.data) - if self.grad is None: - self.grad = gradient - else: - self.grad += gradient - if self._backward is not None: - self._backward(gradient) + """Enhance Tensor with gradient tracking""" + _original_init = Tensor.__init__ - # Monkey-patch: replace methods - Tensor.backward = backward - print("Autograd activated!") + def gradient_aware_init(self, data, requires_grad=False): + _original_init(self, data) + self.requires_grad = requires_grad + self.grad = None -# Module 06 usage + Tensor.__init__ = gradient_aware_init + # Also patch __add__, __mul__, etc. to track gradients + print("Autograd enabled!") + +# Module 06 auto-enables on import enable_autograd() x = Tensor([3.0], requires_grad=True) y = x * x # y = 9.0 @@ -690,8 +686,8 @@ print(x.grad) # [6.0] - dy/dx = 2x \begin{tikzpicture}[ scale=0.9, every node/.style={font=\scriptsize}, - dormant/.style={rectangle, draw=gray!70, fill=gray!20, text=gray!70, minimum width=2.0cm, minimum height=0.5cm, anchor=east}, - active/.style={rectangle, draw=orange!80, fill=orange!30, text=black, font=\scriptsize\bfseries, minimum width=2.0cm, minimum height=0.5cm, anchor=west} + foundation/.style={rectangle, draw=blue!70, fill=blue!20, text=black, minimum width=2.0cm, minimum height=0.5cm, anchor=east}, + enhanced/.style={rectangle, draw=orange!80, fill=orange!30, text=black, font=\scriptsize\bfseries, minimum width=2.0cm, minimum height=0.5cm, anchor=west} ] % Timeline axis @@ -703,56 +699,52 @@ print(x.grad) # [6.0] - dy/dx = 2x \node[below, font=\tiny] at (\x, -0.3) {\texttt{M\label}}; } -% Module 06 activation boundary - thicker and highlighted +% Module 06 enhancement boundary - thicker and highlighted 
\draw[red!60, very thick] (6, 0) -- (6, 5.5); -\node[above, font=\scriptsize\bfseries, red!70] at (6, 5.7) {ACTIVATE}; +\node[above, font=\scriptsize\bfseries, red!70] at (6, 5.7) {ENHANCE}; -% Feature layers - dormant boxes end AT M05, active boxes start AT M05 -% Layer 1: Core features (always active - span both sides) -\node[active, minimum width=4.0cm, anchor=center] at (3.5, 1.0) {\texttt{.data}, \texttt{.shape}}; +% Feature layers +% Layer 1: Core features (always active - span entire timeline) +\node[foundation, minimum width=5.5cm, anchor=west] at (0.5, 1.0) {\texttt{.data}, \texttt{.shape}, \texttt{.memory\_footprint()}}; \node[left, font=\tiny] at (0.2, 1.0) {Core}; -% Layer 2: Gradient features - boxes meet exactly at x=6 (Module 06 line) -% .requires_grad -\node[dormant] at (6, 2.2) {\texttt{.requires\_grad}}; -\node[active] at (6, 2.2) {\texttt{.requires\_grad}}; +% Layer 2: Gradient features - only appear after Module 06 +\node[enhanced, minimum width=7.5cm] at (6, 2.2) {\texttt{.requires\_grad}}; \node[left, font=\tiny] at (0.2, 2.2) {Gradient}; % .grad -\node[dormant] at (6, 3.1) {\texttt{.grad}}; -\node[active] at (6, 3.1) {\texttt{.grad}}; +\node[enhanced, minimum width=7.5cm] at (6, 3.1) {\texttt{.grad}}; % .backward() -\node[dormant] at (6, 4.0) {\texttt{.backward()}}; -\node[active] at (6, 4.0) {\texttt{.backward()}}; +\node[enhanced, minimum width=7.5cm] at (6, 4.0) {\texttt{.backward()}}; % Annotations - positioned at top \node[align=center, font=\tiny, text width=4.5cm] at (3, 6.5) { \textbf{Modules 01--05:}\\ - Features visible but dormant\\ - \texttt{.backward()} is no-op + Clean Tensor class\\ + Focus on fundamentals }; \node[align=center, font=\tiny, text width=4.5cm] at (10, 6.5) { \textbf{Modules 06--20:}\\ - Autograd fully active\\ + Autograd enhances Tensor\\ Gradients flow automatically }; % Legend -\node[dormant, minimum width=1.0cm, minimum height=0.4cm, anchor=center] at (2.5, -1.2) {Dormant}; -\node[active, minimum width=1.0cm, 
minimum height=0.4cm, anchor=center] at (5.5, -1.2) {Active}; +\node[foundation, minimum width=1.2cm, minimum height=0.4cm, anchor=center] at (2.5, -1.2) {Foundation}; +\node[enhanced, minimum width=1.2cm, minimum height=0.4cm, anchor=center] at (5.5, -1.2) {Enhanced}; \end{tikzpicture} -\caption{\textbf{Progressive Disclosure.} Runtime feature activation manages cognitive load. From Module 01, students see the complete Tensor API including gradient methods (\texttt{.backward()}, \texttt{.grad}, \texttt{.requires\_grad}), but these features remain dormant (gray, dashed). In Module 06, runtime enhancement activates full autograd functionality (orange, solid) without breaking earlier code. Three learning benefits: (1) students learn the complete API early, avoiding interface surprise later; (2) Module 01 code continues working unchanged when autograd activates (forward compatibility); (3) visible but inactive features create curiosity-driven questions motivating curriculum progression.} +\caption{\textbf{Progressive Enhancement.} Runtime capability expansion manages cognitive load. Modules 01--05 use a clean Tensor class focused on fundamentals (blue): data storage, arithmetic, matrix operations, and memory profiling. In Module 06, \texttt{enable\_autograd()} enhances Tensor with gradient tracking (orange): \texttt{requires\_grad}, \texttt{.grad}, and \texttt{.backward()} are dynamically added via monkey-patching. Three learning benefits: (1) students master fundamentals without distraction from unused features; (2) Module 01--05 code continues working unchanged after enhancement (backward compatibility); (3) the enhancement moment in Module 06 becomes a concrete ``aha'' experience as familiar tensors gain new capabilities.} \label{fig:progressive-timeline} \end{figure*} \subsection{Pedagogical Justification} -Progressive disclosure is grounded in cognitive load theory~\citep{sweller1988cognitive} and threshold concept pedagogy~\citep{meyer2003threshold}. 
The cognitive load hypothesis (early API familiarity reduces future load when features activate) competes with potential split-attention effects from visible but dormant features. Autograd represents a threshold concept---transformative and troublesome---made visible early (dormant) but activatable when students are cognitively ready. Empirical measurement planned for Fall 2025 (\Cref{sec:future-work}) will quantify the net cognitive load impact. +Progressive enhancement is grounded in cognitive load theory~\citep{sweller1988cognitive} and threshold concept pedagogy~\citep{meyer2003threshold}. By deferring gradient infrastructure until Module 06, students focus entirely on tensor fundamentals without the cognitive overhead of unused attributes. Autograd represents a threshold concept---transformative and troublesome---introduced only when students have mastered the prerequisites (forward pass, loss computation) needed to appreciate it. The enhancement moment creates a concrete learning experience: tensors students already understand suddenly gain powerful new capabilities. -\textbf{Implementation Choice: Monkey-Patching vs. Inheritance.} Alternative designs include inheritance (\texttt{TensorV1}/\texttt{TensorV2}) or composition. We chose monkey-patching because it mirrors PyTorch 0.4's Variable-Tensor merger via runtime consolidation. The software engineering trade-off (global state modification) is explicitly discussed in Module 06's reflection questions. +\textbf{Implementation Choice: Monkey-Patching vs. Inheritance.} Alternative designs include inheritance (\texttt{TensorV1}/\texttt{TensorV2}) or composition. We chose monkey-patching because it maintains a single \texttt{Tensor} class throughout the curriculum, mirrors PyTorch 0.4's Variable-Tensor merger via runtime consolidation, and creates a memorable ``activation moment'' when capabilities appear. 
The software engineering trade-off (global state modification) is explicitly discussed in Module 06's reflection questions. \subsection{Production Framework Alignment} @@ -990,12 +982,12 @@ from tinytorch.nn import Transformer, Embedding, Attention This design bridges educational and professional contexts. Students aren't ``solving exercises''---they're building a framework they could ship. The package structure reinforces systems thinking: understanding how \texttt{torch.nn.Conv2d} relates to \texttt{torch.Tensor} requires grasping module organization, not just individual algorithms. More importantly, students experience the satisfaction of watching their framework grow from a single \texttt{Tensor} class to a complete system capable of training transformers: each module completion adds new capabilities they can immediately use. -TinyTorch implements a literate programming workflow where source files serve dual purposes: executable Python code and educational documentation. Export happens via nbdev~\citep{howard2020fastai} directives (\texttt{\#| export}) embedded in module source files, enabling automatic package generation via \texttt{nbdev\_export}. TinyTorch modules are developed as Python source files using Jupytext percent format (\texttt{src/*/*.py}), with Jupyter notebooks generated for student distribution. The build system maintains single source of truth: developers edit \texttt{src/*/*.py} literate programming files containing both code and documentation, nbdev exports marked functions to \texttt{tinytorch/*} package structure (gitignored as generated artifact), and Jupytext converts source files to student-facing \texttt{.ipynb} notebooks. Thirteen modules (01, 05--09, 11--12, 15--19) currently use \texttt{\#| export} directives for automatic package generation, enabling students to import from \texttt{tinytorch.core}, \texttt{tinytorch.nn}, \texttt{tinytorch.optim}, and \texttt{tinytorch.profiling} as they complete modules. 
This resolves the tension between version-controllable development (Python files enable proper diffs, merges, and code review) and notebook-based learning (students work in familiar Jupyter environments). Educators building similar curricula can adopt this pattern: maintain source-of-truth in version-controlled Python files while delivering interactive notebooks to students. +TinyTorch implements a literate programming workflow where source files serve dual purposes: executable Python code and educational documentation. Export happens via nbdev~\citep{howard2020fastai} directives (\texttt{\#| export}) embedded in module source files, enabling automatic package generation via \texttt{nbdev\_export}. TinyTorch modules are developed as Python source files using Jupytext percent format (\texttt{src/*/*.py}), with Jupyter notebooks generated for student distribution. The build system maintains single source of truth: developers edit \texttt{src/*/*.py} literate programming files containing both code and documentation, nbdev exports marked functions to \texttt{tinytorch/*} package structure (gitignored as generated artifact), and Jupytext converts source files to student-facing \texttt{.ipynb} notebooks. All 20 modules use \texttt{\#| export} directives for automatic package generation, enabling students to import from \texttt{tinytorch.core} and \texttt{tinytorch.perf} as they complete modules. This resolves the tension between version-controllable development (Python files enable proper diffs, merges, and code review) and notebook-based learning (students work in familiar Jupyter environments). Educators building similar curricula can adopt this pattern: maintain source-of-truth in version-controlled Python files while delivering interactive notebooks to students. 
\subsection{Open Source Infrastructure} \label{subsec:opensource} -TinyTorch is released as open source to enable community adoption and evolution.\footnote{Code released under MIT License, curriculum materials under Creative Commons Attribution-ShareAlike 4.0 (CC-BY-SA). Repository: \url{https://github.com/harvard-edge/TinyTorch}} The repository includes instructor resources: \texttt{CONTRIBUTING.md} (guidelines for bug reports and curriculum improvements), \texttt{INSTRUCTOR.md} (30-minute setup guide, grading rubrics, common student errors), and \texttt{TA\_GUIDE.md} (teaching assistant preparation and debugging strategies). +TinyTorch is released as open source to enable community adoption and evolution.\footnote{Code released under MIT License, curriculum materials under Creative Commons Attribution-ShareAlike 4.0 (CC-BY-SA). Repository: \url{https://github.com/harvard-edge/cs249r_book/tree/main/tinytorch}} The repository includes instructor resources: \texttt{CONTRIBUTING.md} (guidelines for bug reports and curriculum improvements) and \texttt{INSTRUCTOR.md} (setup guide, grading rubrics, common student errors, and TA preparation strategies). \textbf{Maintenance Commitment}: The author commits to bug fixes and dependency updates through 2027, community pull request review within 2 weeks, and annual releases incorporating educator feedback. Community governance transition (2026--2027) will establish an educator advisory board and document succession planning to ensure long-term sustainability beyond single-author maintenance. @@ -1006,7 +998,7 @@ TinyTorch is released as open source to enable community adoption and evolution. Effective deployment requires structured TA support beyond instructor guidance. 
-\textbf{TA Preparation}: TAs should develop deep familiarity with critical modules where students commonly struggle—Modules 06 (Autograd), 09 (CNNs), and 13 (Transformers)—by completing these modules themselves and intentionally introducing bugs to understand common error patterns. The repository provides \texttt{TA\_GUIDE.md} documenting frequent student errors (gradient shape mismatches, disconnected computational graphs, broadcasting failures) and debugging strategies. +\textbf{TA Preparation}: TAs should develop deep familiarity with critical modules where students commonly struggle—Modules 06 (Autograd), 09 (CNNs), and 13 (Transformers)—by completing these modules themselves and intentionally introducing bugs to understand common error patterns. The \texttt{INSTRUCTOR.md} file documents frequent student errors (gradient shape mismatches, disconnected computational graphs, broadcasting failures) and debugging strategies. \textbf{Office Hour Demand Patterns}: Student help requests are expected to cluster around conceptually challenging modules, with autograd (Module 06) likely generating higher office hour demand than foundation modules. Instructors should anticipate demand spikes by scheduling additional TA capacity during critical modules, providing pre-recorded debugging walkthroughs, and establishing async support channels (discussion forums with guaranteed response times). 
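For reviewers, the progressive-enhancement pattern this patch documents can be run end to end as a self-contained sketch. It follows the listings in the diff above (clean foundation `Tensor`, then `enable_autograd()` monkey-patching `__init__` and `__mul__`), but it is an illustrative reconstruction, not the actual TinyTorch source:

```python
import numpy as np

# Sketch of the pattern described in this patch: a clean foundation
# Tensor (Modules 01-05) later gains gradient tracking via
# monkey-patching (Module 06). Illustrative only.

class Tensor:
    def __init__(self, data):
        self.data = np.array(data, dtype=np.float32)
        self.shape = self.data.shape

    def memory_footprint(self):
        """Bytes occupied by the underlying buffer."""
        return self.data.nbytes

    def __mul__(self, other):
        return Tensor(self.data * other.data)


def enable_autograd():
    """Wrap __init__ and __mul__ in place so Module 01-05 code keeps
    working while new code can opt into gradient tracking."""
    _orig_init = Tensor.__init__
    _orig_mul = Tensor.__mul__

    def init(self, data, requires_grad=False):
        _orig_init(self, data)
        self.requires_grad = requires_grad
        self.grad = None
        self._backward = None

    def mul(self, other):
        out = _orig_mul(self, other)
        out.requires_grad = self.requires_grad or other.requires_grad

        def _backward(grad):
            # d(a*b)/da = b.data and d(a*b)/db = a.data; accumulate
            for operand, local in ((self, other.data), (other, self.data)):
                if operand.requires_grad:
                    g = grad * local
                    operand.grad = g if operand.grad is None else operand.grad + g

        out._backward = _backward
        return out

    def backward(self, gradient=None):
        if gradient is None:
            gradient = np.ones_like(self.data)
        if self._backward is not None:
            self._backward(gradient)

    # Monkey-patch: the single Tensor class gains new capabilities
    Tensor.__init__ = init
    Tensor.__mul__ = mul
    Tensor.backward = backward


enable_autograd()
x = Tensor([3.0], requires_grad=True)
y = x * x            # y = 9.0
y.backward()
print(x.grad)        # [6.] -- dy/dx = 2x
```

Note that `Tensor([1.0, 2.0])` still constructs fine after enhancement (`requires_grad` defaults to `False`), which is the backward-compatibility property the figure caption claims.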
From d9a48dd3c9d121275ade2f83a20982db07521c5f Mon Sep 17 00:00:00 2001 From: Vijay Janapa Reddi Date: Sun, 25 Jan 2026 15:04:03 -0500 Subject: [PATCH 4/4] feat: add slide deck cards to modules 08 and 20 - Update 08_training and 20_capstone with 2x2 card layout - Add slide deck download links to GitHub release - Standardize card order and colors across all modules --- tinytorch/src/08_training/ABOUT.md | 21 ++++++++++++++------- tinytorch/src/20_capstone/ABOUT.md | 21 ++++++++++++++------- 2 files changed, 28 insertions(+), 14 deletions(-) diff --git a/tinytorch/src/08_training/ABOUT.md b/tinytorch/src/08_training/ABOUT.md index 923355807..aee9b250d 100644 --- a/tinytorch/src/08_training/ABOUT.md +++ b/tinytorch/src/08_training/ABOUT.md @@ -9,9 +9,18 @@ By completing Modules 01-07, you've built all the fundamental components: tensor ::: `````{only} html -````{grid} 1 2 3 3 +````{grid} 1 2 2 2 :gutter: 3 +```{grid-item-card} 🎧 Audio Overview + +Listen to an AI-generated overview. + + +``` + ```{grid-item-card} 🚀 Launch Binder Run interactively in your browser. @@ -23,16 +32,14 @@ Run interactively in your browser. Browse the source code on GitHub. -View on GitHub → +View on GitHub → ``` -```{grid-item-card} 🎧 Audio Overview +```{grid-item-card} 📊 Slide Deck -Listen to an AI-generated overview. +Download the lecture PDF. - +Download PDF → ``` ```` diff --git a/tinytorch/src/20_capstone/ABOUT.md b/tinytorch/src/20_capstone/ABOUT.md index a8085db22..d004150ca 100644 --- a/tinytorch/src/20_capstone/ABOUT.md +++ b/tinytorch/src/20_capstone/ABOUT.md @@ -14,9 +14,18 @@ The core benchmarking functionality (Parts 1-4) works with just Modules 01-13 an ::: `````{only} html -````{grid} 1 2 3 3 +````{grid} 1 2 2 2 :gutter: 3 +```{grid-item-card} 🎧 Audio Overview + +Listen to an AI-generated overview. + + +``` + ```{grid-item-card} 🚀 Launch Binder Run interactively in your browser. @@ -28,16 +37,14 @@ Run interactively in your browser. Browse the source code on GitHub. 
-View on GitHub → +View on GitHub → ``` -```{grid-item-card} 🎧 Audio Overview +```{grid-item-card} 📊 Slide Deck -Listen to an AI-generated overview. +Download the lecture PDF. - +Download PDF → ``` ````