- Add FLOPs counting and throughput to baseline profile
- Use Benchmark class from Module 19 for standardized measurements
- Show detailed latency stats: mean, std, min/max, P95 (see the sketch after this list)
- Fix missing statistics import in benchmark.py
- Use correct BenchmarkResult attribute names
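A minimal sketch of the latency statistics the benchmark reports, assuming a simple
perf_counter timing loop (the actual Benchmark class in Module 19 may measure differently):

    import statistics
    import time

    def latency_stats(fn, warmup=3, iters=50):
        """Run fn repeatedly; return mean/std/min/max/P95 latency in ms."""
        for _ in range(warmup):
            fn()                                   # discard warm-up runs
        samples = []
        for _ in range(iters):
            start = time.perf_counter()
            fn()
            samples.append((time.perf_counter() - start) * 1000.0)
        samples.sort()
        p95 = samples[min(int(0.95 * len(samples)), len(samples) - 1)]
        return {
            "mean_ms": statistics.mean(samples),
            "std_ms": statistics.stdev(samples),   # needs the statistics import
            "min_ms": samples[0],
            "max_ms": samples[-1],
            "p95_ms": p95,
        }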
- Showcase Modules 14, 15, 16, 19 working together
- Fix import names: ProfilerComplete->Profiler, QuantizationComplete->Quantizer, CompressionComplete->Compressor
- Add missing Embedding import to transformer.py
- Update optimization olympics table to show baseline acc, new acc, and delta with +/- signs
- Milestones 01, 02, 05, 06 all working
- Add subscribe-modal.js with elegant popup form
- Update top bar: fire-themed dark design (56px), orange accent
- Subscribe button triggers modal instead of navigating away
- Modal shows MLSysBook + TinyTorch branding connection
- Form submits to mlsysbook newsletter with tinytorch-website tag
- Orange Subscribe button matches TinyTorch fire theme
- Responsive design with dark mode support
- Added create_causal_mask() helper function to src/13_transformers
- Updated tinytorch/__init__.py to import from core.transformer
- Deleted stale tinytorch/models/transformer.py (now in core/)
- Updated TinyTalks to use the new import path
The create_causal_mask function is essential for autoregressive
generation - it ensures each position only attends to past tokens.
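A minimal sketch of what the helper can look like, using NumPy; the actual signature
and mask convention in src/13_transformers may differ:

    import numpy as np

    def create_causal_mask(seq_len):
        """Lower-triangular 0/1 mask: position i may attend to positions <= i."""
        return np.tril(np.ones((seq_len, seq_len), dtype=np.float32))

    # Illustrative use inside attention:
    #   scores = scores + (1.0 - mask) * -1e9   # block attention to future tokens
    #   weights = softmax(scores, axis=-1)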
Key fixes:
- Added causal mask so model can only attend to past tokens
- This matches training (teacher forcing) with generation (autoregressive)
- Used simpler words with distinct patterns for reliable completion
The .data access issue was a red herring: the real problem was that without causal
masking the model sees future tokens during training but not during generation.
The causal mask fixes this.
Identified critical issue: Tensor indexing/slicing breaks gradient graph.
Root cause:
- Tensor.__getitem__ creates a new Tensor with no backward connection
- The Tensor(x.data...) pattern produces a fresh Tensor disconnected from the graph
- This is why attention_proof works (it reshapes but never slices)
Diagnostic tests reveal:
- Individual components (embedding, attention) pass gradient tests
- Full forward-backward fails when using .data access
- Loss doesn't decrease due to broken gradient chain
TODO: Fix in src/01_tensor:
- Make __getitem__ maintain the computation graph (see the sketch below)
- Add warning when .data is used in grad-breaking context
- Consider adding .detach() method for explicit disconnection
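A self-contained sketch of a graph-aware slice, assuming a micrograd-style Tensor with
_prev/_backward fields; the real Module 01 Tensor may organize its graph differently:

    import numpy as np

    class Tensor:
        """Tiny stand-in for the Module 01 Tensor; only what the slicing example needs."""

        def __init__(self, data, requires_grad=False):
            self.data = np.asarray(data, dtype=np.float32)
            self.grad = None
            self.requires_grad = requires_grad
            self._prev = set()
            self._backward = lambda: None

        def __getitem__(self, index):
            # Key idea: the sliced Tensor stays connected to its parent so
            # gradients can flow back into the selected positions.
            out = Tensor(self.data[index], requires_grad=self.requires_grad)
            out._prev = {self}

            def _backward():
                if self.grad is None:
                    self.grad = np.zeros_like(self.data)
                np.add.at(self.grad, index, out.grad)  # scatter-add into the slice

            out._backward = _backward
            return out

    # Illustration: gradients reach the parent through the slice
    x = Tensor(np.arange(6.0).reshape(2, 3), requires_grad=True)
    y = x[0]                       # graph-aware slice
    y.grad = np.ones_like(y.data)  # pretend upstream gradient
    y._backward()
    print(x.grad)                  # [[1. 1. 1.], [0. 0. 0.]]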
MLPerf Milestone 06 now has two parts:
- 01_optimization_olympics.py: Profiling + Quantization + Pruning on MLP
- 02_generation_speedup.py: KV Caching for 10× faster Transformer
Milestone system changes:
- Support 'scripts' array for multi-part milestones (sketch after this list)
- Run all parts sequentially with progress tracking
- Show all parts in milestone info and banner
- Success message lists all completed parts
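An illustrative milestone entry and runner loop; the key names and helper are
assumptions, not the actual milestone system code:

    # Hypothetical milestone definition using the new 'scripts' array
    MILESTONE_06 = {
        "id": "06_mlperf",
        "title": "MLPerf: Optimization Olympics",
        "scripts": [
            "01_optimization_olympics.py",   # profiling + quantization + pruning
            "02_generation_speedup.py",      # KV caching for faster generation
        ],
    }

    def run_milestone(milestone, run_script):
        """Run every part in order with simple progress tracking."""
        completed = []
        total = len(milestone["scripts"])
        for i, script in enumerate(milestone["scripts"], start=1):
            print(f"[{i}/{total}] Running {script} ...")
            run_script(script)               # caller supplies how a script is executed
            completed.append(script)
        print("Completed parts: " + ", ".join(completed))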
Removed placeholder scripts:
- 01_baseline_profile.py (redundant)
- 02_compression.py (merged into 01)
- 03_generation_opts.py (replaced by 02)
- Networks library is specific to Milestone 06 (optimization focus)
- Milestones 01-05 keep their 'YOUR Module X' inline experience
- Updated header to clarify these are pre-built for optimization
- Created milestones/networks.py with reusable network definitions
- Perceptron (Milestone 01), DigitMLP (03), SimpleCNN (04), MinimalTransformer (05)
- MLPerf milestone now imports networks from previous milestones (see the sketch after this list)
- All networks tested and verified working
- Enables optimization of the same networks students built earlier
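Illustrative reuse from the MLPerf milestone; constructor arguments are omitted here
and may be required by the real definitions in milestones/networks.py:

    from milestones.networks import DigitMLP, SimpleCNN, MinimalTransformer

    candidates = {
        "DigitMLP (Milestone 03)": DigitMLP(),
        "SimpleCNN (Milestone 04)": SimpleCNN(),
        "MinimalTransformer (Milestone 05)": MinimalTransformer(),
    }
    for name, model in candidates.items():
        print(f"Optimizing {name}: the same network students built earlier")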
- Uses Profiler class from Module 14
- Uses QuantizationComplete from Module 15
- Uses CompressionComplete from Module 16
- Clearly shows 'YOUR implementation' for each step
- Builds on SimpleMLP from earlier milestones
- Shows how all modules work together (see the sketch below)
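A sketch of the three modules working together, written with the post-rename class
names (Profiler, Quantizer, Compressor); the method names are assumptions about the
Module 14/15/16 APIs, not the actual milestone script:

    from tinytorch import Profiler, Quantizer, Compressor

    def optimization_olympics(model, sample_batch):
        profiler = Profiler()                                    # YOUR Module 14
        print("Baseline:", profiler.profile(model, sample_batch))

        quantized = Quantizer().quantize(model)                  # YOUR Module 15
        print("Quantized:", profiler.profile(quantized, sample_batch))

        pruned = Compressor().prune(model, sparsity=0.5)         # YOUR Module 16
        print("Pruned:", profiler.profile(pruned, sample_batch))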
MLPerf changes:
- Show quantization and pruning individually (not combined)
- Added 'Challenge: Combine Both' as future competition
- Clearer output showing each technique's impact
Progress sync:
- Added _offer_progress_sync() to milestone completion
- Uses centralized SubmissionHandler (same as module completion; sketch after this list)
- Prompts user to sync achievement after milestone success
- Single endpoint for all progress updates
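A sketch of the milestone-side hook; the SubmissionHandler import path and submit()
signature are assumptions, the point is reusing the single shared endpoint:

    from tinytorch.submission import SubmissionHandler  # hypothetical import path

    def _offer_progress_sync(milestone_id: str) -> None:
        answer = input(f"Milestone {milestone_id} complete! Sync progress? [y/N] ")
        if answer.strip().lower() != "y":
            return
        SubmissionHandler().submit(kind="milestone", item_id=milestone_id)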
- Enhanced attention proof to use A-Z letters instead of numbers
- Shows MCYWUH → HUWYCM instead of [1,2,3] → [3,2,1]
- More intuitive and fun for students
- Removed quickdemo, generation, and dialogue scripts (too slow, output was gibberish)
- Enhanced CIFAR-10 CNN with BatchNorm2d for stable training
- Added RandomHorizontalFlip and RandomCrop augmentation transforms (see the sketch after this list)
- Improved training accuracy from 65%+ to 70%+ with modern architecture
- Updated demo tapes with opening comments for clarity
- Regenerated welcome GIF, removed outdated demo GIFs
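A NumPy sketch of the two augmentations; the repo's transform classes wrap the same
idea, so parameter names and padding mode here are illustrative:

    import numpy as np

    def random_horizontal_flip(img, p=0.5):
        """img: HxWxC array; flip left-right with probability p."""
        return img[:, ::-1, :] if np.random.rand() < p else img

    def random_crop(img, size=32, padding=4):
        """Pad reflectively, then crop a random size x size window."""
        padded = np.pad(img, ((padding, padding), (padding, padding), (0, 0)),
                        mode="reflect")
        top = np.random.randint(0, 2 * padding + 1)
        left = np.random.randint(0, 2 * padding + 1)
        return padded[top:top + size, left:left + size, :]

    def augment(img):
        return random_crop(random_horizontal_flip(img))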
- The canonical attention test from 'Attention is All You Need' paper
- Proves attention mechanism works by reversing sequences (data-generation sketch after this list)
- Impossible without cross-position attention (no shortcuts!)
- Trains in 30 seconds with 95%+ accuracy target
- Includes full educational context and ASCII architecture diagram
- Student-friendly with rich console output and progress tracking
- Should be run BEFORE complex Q&A tasks to verify attention works
Why this matters:
- Provides instant proof that attention computes relationships
- Fast feedback loop (30s vs 5min for Q&A)
- Binary success metric (either works or doesn't)
- From the original transformer paper validation tasks
- Perfect for debugging attention implementation
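A minimal data-generation sketch for the reversal proof, using the A-Z vocabulary
described above; the tokenization details are assumptions:

    import random
    import string

    LETTERS = string.ascii_uppercase              # 'A'..'Z' vocabulary
    stoi = {ch: i for i, ch in enumerate(LETTERS)}

    def make_reversal_example(seq_len=6):
        """Return (input_ids, target_ids) where the target is the input reversed.

        e.g. 'MCYWUH' -> 'HUWYCM'. Solvable only if each output position can
        attend to a different input position, which is the point of the test.
        """
        src = random.choices(LETTERS, k=seq_len)
        tgt = list(reversed(src))
        return [stoi[c] for c in src], [stoi[c] for c in tgt]

    xs, ys = make_reversal_example()
    print(xs, "->", ys)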
New milestone 05 demo that shows students the model learning to "talk":
- Live dashboard with epoch-by-epoch response progression
- Systems stats panel (tokens/sec, batch time, memory)
- 3 test prompts with full history displayed
- Smaller model (110K params) for ~2 minute training time
Use rich.live.Live to show real-time progress indicator during epoch training.
This gives visual feedback that code is running during potentially slow operations.
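Minimal pattern (rich's Live context manager plus update()); the real demo renders
richer panels, this only shows the Live usage:

    import time
    from rich.live import Live
    from rich.text import Text

    num_epochs, steps_per_epoch = 3, 5
    with Live(Text("starting..."), refresh_per_second=4) as live:
        for epoch in range(1, num_epochs + 1):
            for step in range(1, steps_per_epoch + 1):
                time.sleep(0.2)  # stands in for a training step
                live.update(Text(f"epoch {epoch}/{num_epochs}  step {step}/{steps_per_epoch}"))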
- Remove auto-enable from autograd.py module load (let __init__.py handle it)
- Silence the already enabled warning (just return silently; see the sketch after this list)
- Remove explicit enable_autograd() calls from milestones that do not need them
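Sketch of the idempotent enable; the module-level flag and patch hook names are
assumptions, not the actual autograd.py internals:

    _AUTOGRAD_ENABLED = False

    def enable_autograd():
        global _AUTOGRAD_ENABLED
        if _AUTOGRAD_ENABLED:
            return                      # already on: return silently, no warning
        _AUTOGRAD_ENABLED = True
        _patch_tensor_ops()             # hypothetical hook that installs grad tracking

    def _patch_tensor_ops():
        pass                            # placeholder so the sketch runs standalone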
The itemize environment parameters [leftmargin=*, itemsep=1pt, parsep=0pt]
were appearing as visible text in the PDF because the enumitem package
wasn't loaded. This fix adds \usepackage{enumitem} to the preamble.
All itemized lists now format correctly with proper spacing and margins.
- Improve site navigation and content structure
- Update development testing documentation
- Enhance site styling and visual consistency
- Update release notes and milestone templates
- Improve site rebuild script functionality
Created standardized milestone documentation following the M06 pattern:
- M01 (1957 Perceptron): Forward pass vs trained model progression
- M02 (1969 XOR): Crisis demonstration and multi-layer solution
- M03 (1986 MLP): TinyDigits and MNIST hierarchical learning
- M04 (1998 CNN): Spatial operations on digits and CIFAR-10
- M05 (2017 Transformer): Q&A and dialogue generation with attention
Each README includes:
- Historical context and significance
- Required modules with clear dependencies
- Milestone structure explaining each script's purpose
- Expected results and performance metrics
- Key learning objectives and conceptual insights
- Running instructions with proper commands
- Further reading references
- Achievement unlocked summaries
This establishes single source of truth for milestone documentation
and provides students with comprehensive guides for each checkpoint.