Address student feedback on abstract and intro
1. Clarify progressive disclosure in abstract:
- Changed from "activates dormant tensor features through monkey-patching"
- To "gradually reveals complexity: tensor gradient features exist from
Module 01 but activate in Module 05, managing cognitive load"
2. Add variety to 'why' examples in intro:
- Changed second Adam example to Conv2d 109x parameter efficiency
- Intro now covers: Adam optimizer state, attention O(N²), KV caching,
and Conv2d efficiency (four distinct examples)
The 2x vs 4x Adam figures were actually consistent (2x optimizer state,
4x total training memory) but appeared confusing when repeated. Now varied.
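For reference, the arithmetic behind those two figures (standard fp32 accounting, sketched here rather than quoted from the paper): with P parameters, Adam's momentum and variance buffers add about 2P extra floats of optimizer state (the 2x), and parameters + gradients + both buffers come to about 4P, roughly 4x the parameter memory during training before counting activations (the 4x).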
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
@@ -182,7 +182,7 @@
% Abstract - REVISED: Curriculum design focus
\begin{abstract}
-Machine learning education typically teaches framework usage without exposing internals, leaving students unable to debug gradient flows, profile memory bottlenecks, or understand optimization tradeoffs. TinyTorch addresses this gap through a build-from-scratch curriculum where students implement PyTorch's core components—tensors, autograd, optimizers, and neural networks—to gain framework transparency. We present the design and implementation of three pedagogical design patterns for teaching ML as systems engineering. \textbf{Progressive disclosure} activates dormant tensor features across modules through monkey-patching, modeling how frameworks evolve from separate abstractions to unified interfaces. \textbf{Systems-first curriculum} embeds memory profiling and complexity analysis from the start rather than treating them as advanced topics. \textbf{Historical milestone validation} recreates nearly 70 years of ML breakthroughs (1958 Perceptron through modern transformers) using exclusively student-implemented code to validate correctness. These patterns are grounded in established learning theory (situated cognition, cognitive load theory, cognitive apprenticeship) but represent testable design hypotheses whose learning outcomes require empirical validation. The 20-module curriculum (estimated 60--80 hours) provides complete open-source infrastructure for institutional adoption or self-paced learning at \texttt{tinytorch.ai}.
+Machine learning education typically teaches framework usage without exposing internals, leaving students unable to debug gradient flows, profile memory bottlenecks, or understand optimization tradeoffs. TinyTorch addresses this gap through a build-from-scratch curriculum where students implement PyTorch's core components—tensors, autograd, optimizers, and neural networks—to gain framework transparency. We present the design and implementation of three pedagogical design patterns for teaching ML as systems engineering. \textbf{Progressive disclosure} gradually reveals complexity: tensor gradient features exist from Module 01 but activate in Module 05, managing cognitive load while maintaining a unified mental model. \textbf{Systems-first curriculum} embeds memory profiling and complexity analysis from the start rather than treating them as advanced topics. \textbf{Historical milestone validation} recreates nearly 70 years of ML breakthroughs (1958 Perceptron through modern transformers) using exclusively student-implemented code to validate correctness. These patterns are grounded in established learning theory (situated cognition, cognitive load theory, cognitive apprenticeship) but represent testable design hypotheses whose learning outcomes require empirical validation. The 20-module curriculum (estimated 60--80 hours) provides complete open-source infrastructure for institutional adoption or self-paced learning at \texttt{tinytorch.ai}.
\end{abstract}
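To make the progressive-disclosure wording concrete, here is a minimal illustrative sketch (assumed names, not TinyTorch's actual code) of a gradient feature that exists on a Tensor class from the first module but is only activated later by monkey-patching a method onto that same class:

```python
import numpy as np

# Early module: Tensor ships with gradient-related fields that stay dormant.
class Tensor:
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=np.float32)
        self.requires_grad = requires_grad   # present from the start, unused so far
        self.grad = None

    def __mul__(self, other):
        return Tensor(self.data * other.data)

# Later module: autograd "activates" the dormant feature on the same class.
def backward(self, grad=None):
    # Minimal placeholder: seed this tensor's gradient with ones (or a supplied grad).
    self.grad = np.ones_like(self.data) if grad is None else grad

Tensor.backward = backward               # monkey-patch: new capability, same class

t = Tensor([1.0, 2.0], requires_grad=True)
t.backward()
print(t.grad)                            # [1. 1.]
```

The design intent both abstract versions describe is that students keep one Tensor mental model throughout; the later module extends the class they already use instead of introducing a parallel abstraction.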
@@ -196,7 +196,7 @@ Unlike algorithmic ML—where automated tools increasingly handle model architec
Current ML education creates this gap by separating algorithms from systems. Students learn to implement gradient descent without measuring memory consumption, build attention mechanisms without profiling $O(N^2)$ costs, and train models without understanding optimizer state overhead. Introductory courses use high-level APIs (PyTorch, Keras) that abstract away implementation details, while advanced electives teach systems concepts (memory management, performance optimization) in isolation from ML frameworks. This pedagogical divide produces graduates who can \emph{use} \texttt{loss.backward()} but cannot explain how computational graphs enable reverse-mode differentiation, or who understand transformers mathematically but miss that KV caching spends $O(N)$ memory to avoid $O(N^2)$ recomputation.
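As a rough illustration of the profiling this paragraph has in mind (sequence lengths and head dimension are assumed for the example), the quadratic cost of the attention score matrix versus the linear cost of a per-layer KV cache can be tabulated directly:

```python
# Back-of-the-envelope memory sketch, fp32, assumed shapes.
d = 64                                    # assumed head dimension
for n in (512, 2048, 8192):               # assumed sequence lengths
    scores_mb = n * n * 4 / 1e6           # O(N^2): one N x N attention score matrix
    kv_cache_mb = 2 * n * d * 4 / 1e6     # O(N): cached keys + values for one layer
    print(f"N={n:5d}  scores={scores_mb:8.1f} MB  kv_cache={kv_cache_mb:5.2f} MB")
```

The quadratic term dominates quickly, which is the observation the O(N^2) profiling exercise is meant to surface; the KV cache's linear footprint is the memory spent to avoid recomputing keys and values at every generation step.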
-We present TinyTorch, a 20-module curriculum where students build PyTorch's core components from scratch using only NumPy: tensors, automatic differentiation, optimizers, CNNs, transformers, and production optimization techniques. Students transition from framework \emph{users} to framework \emph{engineers} by implementing the internals that high-level APIs deliberately hide. As a hands-on companion to the \emph{Machine Learning Systems} textbook~\citep{reddi2024mlsysbook}, TinyTorch transforms tacit systems knowledge into explicit pedagogy—students don't just learn \emph{that} Adam requires 4$\times$ training memory, they \emph{implement} momentum and variance buffers and \emph{measure} the footprint directly through profiling code they wrote. \Cref{fig:code-comparison} contrasts this bottom-up approach with traditional top-down API usage.
+We present TinyTorch, a 20-module curriculum where students build PyTorch's core components from scratch using only NumPy: tensors, automatic differentiation, optimizers, CNNs, transformers, and production optimization techniques. Students transition from framework \emph{users} to framework \emph{engineers} by implementing the internals that high-level APIs deliberately hide. As a hands-on companion to the \emph{Machine Learning Systems} textbook~\citep{reddi2024mlsysbook}, TinyTorch transforms tacit systems knowledge into explicit pedagogy: students don't just learn \emph{that} Conv2d achieves 109$\times$ parameter efficiency over dense layers, they \emph{implement} sliding window convolution and \emph{measure} the difference directly through profiling code they wrote. \Cref{fig:code-comparison} contrasts this bottom-up approach with traditional top-down API usage.
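The exact 109x figure depends on the layer shapes used in the module; as an illustrative sketch under assumed shapes (not the course's actual configuration), the comparison students make looks roughly like this:

```python
# Parameter-count comparison under assumed shapes; the paper's exact 109x figure
# comes from the module's own layer sizes, this just shows the shape of the argument.
in_h, in_w, in_c = 32, 32, 3                   # assumed 32x32 RGB input
k = 3                                          # assumed 3x3 kernel
out_features = 64                              # assumed output units / channels

# Dense: every output unit sees the whole flattened image.
dense_params = (in_h * in_w * in_c + 1) * out_features    # weights + biases
# Conv2d: every output channel reuses one small kernel across all positions.
conv_params = (in_c * k * k + 1) * out_features           # weights + biases

print(f"Dense : {dense_params:,} params")      # 196,672
print(f"Conv2d: {conv_params:,} params")       # 1,792
print(f"Ratio : {dense_params / conv_params:.0f}x")       # ~110x under these assumptions
```

The intro's point is that students compute this ratio from convolution code they implemented themselves rather than being handed the number.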
\begin{figure*}[t]
\centering