Remove Design Insights subsection from Discussion

After review, we determined that the Design Insights section was repetitive and didn't
add genuine value beyond what's already covered in:
- Section 2: Related Work (positioning and comparison)
- Sections 3-5: Pedagogical patterns (progressive disclosure, systems-first, etc.)
- Section 7: Deployment models

Discussion section now consists solely of:
- Limitations and Scope Boundaries (organized by category)

This cleaner structure avoids repetition and keeps the Discussion focused on
acknowledging scope boundaries through trade-off framing.

Paper compiles successfully (23 pages, down from 24).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Vijay Janapa Reddi
2025-11-19 09:52:43 -05:00
parent e80a6abb73
commit fb9c3131b9
2 changed files with 0 additions and 18 deletions



@@ -1003,24 +1003,6 @@ TinyTorch embraces productive failure \citep{kapur2008productive}---learning thr
\section{Discussion and Limitations}
\label{sec:discussion}
Building TinyTorch revealed insights about teaching ML systems from first principles. This section opens by reflecting on key design lessons learned, then acknowledges scope boundaries honestly through trade-off framing that connects limitations to pedagogical rationale.
\subsection{Design Insights}
\label{subsec:design-insights}
Implementing TinyTorch's 20-module curriculum yielded four key pedagogical insights about teaching framework internals through hands-on construction.
\noindent\textbf{Progressive Disclosure Manages Complexity Through Gradual Feature Activation.} The monkey-patching approach---where Tensor.backward() exists but remains dormant in Modules 01--04, activating only when Module 05 introduces computational graphs---proved effective for cognitive load management. Students work with a single unified Tensor class throughout the curriculum rather than replacing implementations mid-semester. Early modules benefit from this simplicity: Module 01 students profile memory without considering gradient tracking overhead; Module 03 students build layers without autograd complexity. When Module 05 activates backward passes, the existing code continues working while gaining new capabilities. This gradual revelation mirrors how production frameworks evolved historically---PyTorch added features over years; TinyTorch compresses that evolution into weeks. The key insight: \textbf{dormant features cost nothing cognitively until activated}, enabling complex final systems built from simple foundations.
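To make the monkey-patching mechanism concrete, the following minimal sketch (not TinyTorch's actual implementation; the \texttt{enable\_autograd} name and the gradient placeholder are illustrative assumptions) shows how a dormant \texttt{backward()} can be swapped in place once autograd is introduced, without changing any calling code:
\begin{verbatim}
class Tensor:
    def __init__(self, data):
        self.data = data
        self.grad = None

    def backward(self):
        # Dormant in early modules: present on the class but inert
        # until autograd is activated.
        raise NotImplementedError("autograd not enabled yet")

def enable_autograd():
    """Illustrative stand-in for Module 05 activating the dormant method."""
    def backward(self):
        # Placeholder for the real computational-graph traversal; code that
        # already calls t.backward() keeps working, now with gradients.
        self.grad = 1.0
    Tensor.backward = backward  # monkey-patch the same class in place
\end{verbatim}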
\noindent\textbf{Systems-First Integration Prevents ``Algorithms Without Costs'' Learning.} Starting Module 01 with memory profiling (\texttt{tensor.nbytes}) before introducing operations established that every ML component has measurable systems costs. Students internalize ``measure first'' methodology: before implementing convolution (Module 09), they calculate expected memory footprint (output channels $\times$ kernel size $\times$ input channels); before training transformers (Module 13), they predict attention's $O(N^2)$ memory growth. This approach prevents common student misconceptions like ``larger models are always better'' (ignoring memory constraints) or ``Adam is always superior to SGD'' (ignoring 4$\times$ memory multiplier). The key insight: \textbf{systems awareness as foundational concept, not advanced topic}, changes how students approach all subsequent ML engineering decisions. They graduate asking ``Can this fit in memory?'' before ``Does this achieve 0.1\% better accuracy?''
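A back-of-envelope version of this ``measure first'' habit, sketched with NumPy under assumed shapes (float32 values, a single $32 \times 32$ RGB input) rather than TinyTorch's own \texttt{Tensor} class:
\begin{verbatim}
import numpy as np

# Module 01 habit: measure actual tensor memory before optimizing anything.
x = np.zeros((1, 3, 32, 32), dtype=np.float32)
print(x.nbytes)                    # 12288 bytes for one 32x32 RGB input

# Module 09 habit: predict conv weight memory before implementing it.
out_ch, k, in_ch = 32, 3, 3
print(out_ch * k * k * in_ch * 4)  # 3456 bytes of float32 weights

# Module 13 habit: attention scores grow O(N^2) with sequence length N.
for n in (128, 512, 2048):
    print(n, n * n * 4)            # bytes for one NxN float32 score matrix
\end{verbatim}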
\noindent\textbf{Historical Milestones Validate Correctness While Teaching ML Evolution.} Recreating the 1958 Perceptron, 1986 XOR solution (two-layer networks), 2012 CNN revolution (AlexNet architecture), and 2017 Transformer breakthrough provided more than engagement---these milestones served as \textbf{pedagogical checkpoints with historical grounding}. Students understand why each innovation mattered: Perceptron's limitations motivated multilayer networks, CNNs' parameter efficiency (896 parameters vs. 98,336 for equivalent dense layer) enabled image processing, attention's parallelizability improved over sequential RNNs. Correctness validation comes from reproducing published results: if your autograd implementation trains XOR successfully, it likely works correctly; if your attention implementation matches Transformer paper benchmarks, you've understood the architecture. The key insight: \textbf{historical progression provides both motivation and validation}, making abstract implementations concrete through reproducing breakthrough results.
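The quoted parameter counts follow from simple arithmetic; the sketch below reproduces them under an assumed $32 \times 32 \times 3$ input and 32 output channels/units (shapes chosen to match the quoted figures, not stated in the text):
\begin{verbatim}
# Conv layer: 32 filters, each 3x3 over 3 input channels, plus a bias.
in_ch, k, out_ch = 3, 3, 32
conv_params = out_ch * (k * k * in_ch + 1)   # weights + bias per filter
print(conv_params)                           # 896

# Equivalent dense layer on the flattened 32x32x3 image.
flat_in = 32 * 32 * 3
dense_params = flat_in * out_ch + out_ch     # weights + biases
print(dense_params)                          # 98336
\end{verbatim}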
\noindent\textbf{Build-Use-Reflect Cycle Enables Immediate Application and Debugging.} Each module's three-part structure---build implementation, use it immediately in integration tests, reflect on systems implications---proved critical for learning retention. Module 05 students don't just implement backpropagation; they immediately use it to train Module 03's networks, then profile memory growth as model depth increases. This rapid feedback loop exposes implementation bugs quickly (``My gradients explode in deep networks---I must have a scaling issue'') while reinforcing systems thinking (``Deeper networks require more activation memory for backward pass''). The key insight: \textbf{implementation becomes meaningful through immediate use}, not through isolated coding exercises. Students see their Tensor class power real training loops, their optimizers converge real models, their transformers generate real text---turning abstract code into working systems.
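A compressed build--use--reflect loop, sketched in NumPy rather than the actual TinyTorch modules: build a small two-layer network, use it immediately to fit XOR with hand-derived gradients, then reflect by measuring the activation memory the backward pass required:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

# Build: parameters of a small 2-8-1 network (Module 03 style).
W1 = rng.normal(0, 1, (2, 8)).astype(np.float32); b1 = np.zeros(8, np.float32)
W2 = rng.normal(0, 1, (8, 1)).astype(np.float32); b2 = np.zeros(1, np.float32)

# Use: train it immediately with hand-derived gradients
# (the bookkeeping that Module 05's autograd automates).
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)            # hidden activations, kept for backward
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    d_pre = (out - y) / len(X)          # sigmoid + cross-entropy gradient
    d_h = (d_pre @ W2.T) * (1.0 - h ** 2)
    W2 -= 0.5 * (h.T @ d_pre); b2 -= 0.5 * d_pre.sum(0)
    W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(0)

# Reflect: activations held for the backward pass are a concrete memory cost.
print("XOR predictions:", out.round().ravel())
print("activation bytes held for backward:", h.nbytes + out.nbytes)
\end{verbatim}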
\subsection{Limitations and Scope Boundaries}
\label{subsec:limitations}
TinyTorch prioritizes framework internals understanding over production completeness, creating three categories of limitations that reflect deliberate pedagogical trade-offs rather than technical barriers.
\noindent\textbf{Production Systems Beyond Scope.} TinyTorch teaches framework internals as foundation for advanced topics, not as replacement. The CPU-only design omits GPU programming (CUDA kernels, tensor cores, mixed precision), distributed training (data/model parallelism, gradient synchronization), and production deployment (model serving, compilation, MLOps tooling). These topics require substantial complexity---parallel programming, hardware knowledge, deployment infrastructure---that would shift focus from framework understanding. Complete ML engineer preparation requires TinyTorch (internals foundation) followed by PyTorch tutorials (GPU acceleration), distributed training courses (multi-node systems), and production experience. The CPU-only scope offers three pedagogical benefits: \textbf{accessibility} (works on modest hardware in regions with limited cloud access), \textbf{reproducibility} (consistent performance across institutions), and \textbf{pedagogical focus} (internals learning not confounded with hardware optimization).