Add tier flexibility explanation and fix critical repetitions

TIER FLEXIBILITY ENHANCEMENT:
Strengthened 'Selective implementation' paragraph to explicitly enumerate three
curriculum configurations and explain WHY they matter:

1. Foundation only (M01-07): Introductory ML systems courses, capstone projects
   - Focus on framework internals (tensors, autograd, training loops)

2. Foundation + Architecture (M01-13): Comprehensive ML systems courses
   - Extend to modern deep learning (CNNs, transformers)

3. Optimization focus (M14-19 only): Production ML, edge deployment, TinyML
   - Import pre-built tinytorch.nn/optim, implement only optimization techniques
   - Addresses key limitation: quantization students shouldn't rebuild autograd

Added pedagogical justification:
- Systems-heavy courses build Foundation→Architecture
- Optimization-focused courses skip to production concerns with pre-built deps
- Enables matching curriculum scope to course objectives within semester constraints

CRITICAL REPETITION FIXES (per research coordinator review):

1. Introduction line 307 (systems-first): Removed detailed explanation, added
   forward reference to Section 4 to avoid pre-stating content

2. Introduction line 307 (progressive disclosure): Simplified to brief mention
   with forward reference, removed detailed mechanics

3. Contribution #2 (progressive disclosure): Condensed description, removed
   redundant 'cognitive load challenge' phrase already covered in line 307

These changes follow pattern: Introduction = brief preview + forward reference,
Dedicated sections = full treatment. Eliminates repetition while maintaining flow.

Research coordinator identified 11 repetition categories; addressed 3 critical ones.
Others are either intentional (Adam optimizer, 1958-2024 span as thematic elements)
or acceptable (table vs detailed comparison for MiniTorch).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Vijay Janapa Reddi
2025-11-19 11:37:32 -05:00
parent 94f792a2d6
commit ccbbda270f


@@ -304,7 +304,7 @@ class Adam:
\label{fig:code-comparison}
\end{figure*}
-The curriculum addresses three fundamental questions. First, can students learn systems thinking \emph{alongside} ML fundamentals rather than in separate electives? TinyTorch demonstrates that memory profiling, computational complexity analysis, and performance reasoning integrate naturally when students build components from scratch—Module 01 introduces tensor memory footprints before matrix operations, making systems awareness foundational rather than advanced. Second, how do we manage cognitive load when teaching both algorithms and implementation? Progressive disclosure (\Cref{sec:progressive}) solves this through runtime feature activation: gradient tracking exists but stays dormant in Modules 01-04, activating only when Module 05 introduces automatic differentiation. Third, can bottom-up implementation compete with top-down API usage for learning efficiency? Historical milestone validation provides evidence: students recreate 70 years of ML breakthroughs (1958 Perceptron $\rightarrow$ 2024 optimized transformers) using exclusively their own code, demonstrating that implementations work on real tasks.
+The curriculum addresses three fundamental questions. First, can students learn systems thinking \emph{alongside} ML fundamentals rather than in separate electives? TinyTorch embeds memory profiling and complexity analysis from Module 01 onwards (\Cref{sec:systems-first} details this systems-first integration). Second, how do we manage cognitive load when teaching both algorithms and implementation? Progressive disclosure (\Cref{sec:progressive}) manages this through runtime feature activation, allowing gradual complexity revelation. Third, can bottom-up implementation compete with top-down API usage for learning efficiency? Historical milestone validation (\Cref{subsec:milestones}) provides evidence: students recreate 70 years of ML breakthroughs using exclusively their own code.
The curriculum follows the compiler course model~\citep{aho2006compilers}: students build a complete system module-by-module, experiencing how components integrate through direct implementation. \Cref{fig:module-flow} illustrates the dependency structure—tensors (Module 01) enable activations (02) and layers (03), which feed into autograd (05), which powers optimizers (06) and training (07). This incremental construction mirrors how compiler courses connect lexical analysis to parsing to code generation, creating systems thinking through component integration. Each completed module becomes immediately usable: after Module 03, students can build neural networks; after Module 05, automatic differentiation enables training; after Module 13, transformers support language modeling.
@@ -395,7 +395,7 @@ This paper makes three primary contributions:
\begin{enumerate}
\item \textbf{Systems-First Curriculum Architecture}: A 20-module learning path integrating memory profiling, computational complexity, and performance analysis from Module 01 onwards, replacing traditional algorithm-systems separation. Students discover systems constraints through direct measurement (Adam's 2$\times$ optimizer state overhead, Conv2d's 109$\times$ parameter efficiency, KV caching's $O(n^2) \rightarrow O(n)$ transformation) rather than abstract instruction (\Cref{sec:curriculum,sec:systems}). This architecture directly addresses the workforce gap by making tacit systems knowledge explicit through hands-on implementation. Grounded in situated cognition~\citep{lave1991situated} and constructionism~\citep{papert1980mindstorms}, with systems thinking pedagogy informed by established frameworks~\citep{meadows2008thinking}.
-\item \textbf{Progressive Disclosure Pattern}: To make systems-first learning tractable, we introduce a pedagogical technique using monkey-patching (runtime method replacement) to reveal \texttt{Tensor} complexity gradually while maintaining a unified mental model. Dormant gradient features exist from Module 01 but activate in Module 05, enabling forward-compatible code and teaching how frameworks like PyTorch evolved (Variable/Tensor merger) (\Cref{sec:progressive}). This pattern is designed to address the cognitive load challenge inherent in teaching both algorithms and systems simultaneously. Grounded in cognitive load theory~\citep{sweller1988cognitive} and cognitive apprenticeship~\citep{collins1989cognitive}.
+\item \textbf{Progressive Disclosure Pattern}: To make systems-first learning tractable, we introduce a pedagogical technique using monkey-patching (runtime method replacement) to reveal \texttt{Tensor} complexity gradually while maintaining a unified mental model (\Cref{sec:progressive}). This enables forward-compatible code (Module 01 implementations don't break when autograd activates) and teaches framework evolution (PyTorch's Variable/Tensor merger). Grounded in cognitive load theory~\citep{sweller1988cognitive} and cognitive apprenticeship~\citep{collins1989cognitive}.
\item \textbf{Open Educational Infrastructure}: Both innovations are validated through a complete open-source curriculum with NBGrader assessment infrastructure~\citep{blank2019nbgrader}, three integration models (self-paced learning, institutional courses, team onboarding), historical milestone validation (1958 Perceptron through 2024 optimized transformers), and PyTorch-inspired package architecture. This infrastructure enables community adoption, curricular adaptation, and empirical research into ML systems pedagogy effectiveness (\Cref{sec:curriculum,sec:deployment,sec:discussion}).
\end{enumerate}
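The monkey-patching mechanism named in contribution #2 can be sketched in miniature. This is a hypothetical illustration, not TinyTorch's actual API: the `Tensor` class, `enable_autograd` function, and `_backward_parents` attribute are invented here to show how dormant gradient machinery can be activated by runtime method replacement without breaking earlier call sites.

```python
# Minimal sketch of progressive disclosure via monkey-patching.
# Names (Tensor, enable_autograd, _backward_parents) are illustrative,
# not TinyTorch's real interface.
import numpy as np

class Tensor:
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=np.float32)
        self.requires_grad = requires_grad  # dormant flag in early modules
        self.grad = None

    def __add__(self, other):
        # "Module 01" semantics: plain arithmetic, no graph bookkeeping.
        return Tensor(self.data + other.data)

def enable_autograd():
    """'Module 05' step: replace Tensor.__add__ at runtime so code written
    against the Module 01 API gains gradient tracking unchanged."""
    plain_add = Tensor.__add__  # keep the original behavior

    def tracked_add(self, other):
        out = plain_add(self, other)
        out.requires_grad = self.requires_grad or other.requires_grad
        if out.requires_grad:
            # Record parents for a later backward pass (graph elided here).
            out._backward_parents = (self, other)
        return out

    Tensor.__add__ = tracked_add  # the monkey-patch

a, b = Tensor([1.0], requires_grad=True), Tensor([2.0])
c = a + b          # before activation: no gradient tracking
enable_autograd()
d = a + b          # after activation: same call site, gradients tracked
```

The same expression `a + b` behaves differently before and after activation, which is the pattern's point: students' Module 01 code remains forward-compatible when autograd switches on.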
@@ -1025,7 +1025,7 @@ Beyond the deployment models described in \Cref{subsec:integration}, TinyTorch's
\textbf{Tier-based partitioning enables distributed cognitive load}: The three-tier structure (Foundation, Architecture, Optimization) creates natural stopping points where students achieve mastery before advancing. Completing Foundation (Modules 01--07) develops core systems understanding (tensors, autograd, training loops) sufficient for simple network training. Architecture tier (Modules 08--13) builds on this foundation to teach modern deep learning without re-teaching basics. This vertical partitioning follows cognitive load theory~\citep{sweller1988cognitive}: students consolidate foundational knowledge before encountering architectural complexity, preventing the overwhelm that occurs when teaching CNNs and transformers simultaneously with basic autograd mechanics. Multi-semester deployments exploit this structure by aligning tiers with academic terms, enabling depth over breadth.
-\textbf{Selective implementation leverages package accumulation}: TinyTorch's progressive package structure (\Cref{subsec:package}) enables "build what you're learning, import what you need" pedagogy. Students focusing on specific topics---optimization techniques, attention mechanisms, CNN architectures---can import pre-built modules covering prerequisites while implementing target concepts. This addresses a key limitation of monolithic projects: students interested in quantization shouldn't need to re-implement autograd first. Selective implementation maintains hands-on learning for target concepts while respecting time constraints through strategic dependency use. The pedagogical tradeoff is clear: complete implementation builds comprehensive understanding, while selective implementation enables targeted depth within limited timeframes.
+\textbf{Selective implementation leverages package accumulation}: TinyTorch's progressive package structure (\Cref{subsec:package}) enables "build what you're learning, import what you need" pedagogy, supporting three distinct curriculum configurations: (1) \textbf{Foundation only} (Modules 01--07): Build core systems (tensors, autograd, training loops) from scratch---ideal for introductory ML systems courses or undergraduate capstone projects focusing on framework internals. (2) \textbf{Foundation + Architecture} (Modules 01--13): Extend to modern deep learning by implementing CNNs and transformers---suitable for comprehensive ML systems courses or graduate-level deep learning seminars. (3) \textbf{Optimization focus} (Modules 14--19 only): Import pre-built \texttt{tinytorch.nn} and \texttt{tinytorch.optim} packages, implement only optimization techniques (quantization, pruning, distillation)---targets production ML courses, edge deployment seminars, or TinyML workshops where students already understand framework basics but need systems optimization depth. This addresses a key limitation of monolithic projects: students interested in quantization shouldn't need to re-implement autograd first. These configurations enable instructors to match curriculum scope to course objectives---systems-heavy courses build from Foundation through Architecture, while optimization-focused courses skip to production concerns using pre-built dependencies. The pedagogical tradeoff is intentional: complete implementation builds comprehensive understanding, while selective implementation enables targeted depth within semester constraints.
\textbf{Hybrid integration bridges application and internals}: TinyTorch modules can augment PyTorch-first courses by revealing framework internals alongside application. Students training CNNs with PyTorch might implement TinyTorch's convolution module to understand what \texttt{torch.nn.Conv2d} does internally, then return to PyTorch for projects. This parallel construction addresses a fundamental pedagogical tension: application-first courses teach ML usage quickly but risk treating frameworks as black boxes; implementation-first courses teach internals deeply but delay practical application. Hybrid approaches enable both: students learn PyTorch for projects while building TinyTorch for understanding. Critical integration points include Module 05 (autograd, demystifying \texttt{loss.backward()}), Module 09 (convolution, explaining kernel complexity), and Module 12 (attention, revealing transformer internals).
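The "optimization focus" configuration (implement only the optimization technique, import everything else) can be sketched with one such technique. This is a hedged illustration: `quantize_int8`, `dequantize`, and the NumPy stand-in for imported weights are hypothetical names, not TinyTorch's actual \texttt{tinytorch.nn} API; the sketch shows symmetric per-tensor int8 post-training quantization, the kind of Module 14--19 exercise described above.

```python
# Sketch of an optimization-focus exercise: student implements only the
# quantization step; the weight matrix stands in for a layer that would
# be imported from a pre-built package. Names are illustrative.
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values and scale."""
    return q.astype(np.float32) * scale

# Stand-in for a pre-built layer's weights (imported, not implemented).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)

q, s = quantize_int8(w)
error = np.abs(dequantize(q, s) - w).max()
# int8 storage is 4x smaller than float32; rounding error is bounded
# by half the quantization step (0.5 * s).
```

The design choice mirrors the tradeoff stated above: the student gets hands-on depth in the target technique (quantization's scale/clip/round mechanics) without first rebuilding autograd.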