Replaced placeholder numbers with actual measurements from benchmarking
script. Numbers show TinyTorch's pure Python implementations are
100-10,000× slower than PyTorch, demonstrating the pedagogical value
of experiencing performance reality.
Real benchmark results:
- MatMul (1K×1K): 1.0s vs 0.9ms = 1,090× slower
- Conv2d (CIFAR batch): 97s vs 10ms = 10,017× slower
- Softmax (10K elem): 6ms vs 0.05ms = 134× slower
Methodology:
- MatMul: Double-loop with numpy dot for inner loop
- Conv2d: Pure 7-nested-loop implementation as shown in paper
- Softmax: Pure Python loops for max, exp, sum, normalize
Created benchmark_quick.py script that measures actual performance
using implementations that match what students write in the curriculum.
Conv2d uses single-image timing extrapolated to full batch for speed.
Updated paper text to reference actual measured values (97s vs 10ms)
instead of placeholders, strengthening the experiencing performance
reality pedagogical argument.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>