237 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
43ea5f9a65 Fix MLPerf milestone metrics: FLOPs calculation, quantization compression ratio, pruning delta sign
- Fixed FLOPs calculation to handle models with .layers attribute (not just Sequential)
- Fixed quantization compression ratio to calculate theoretical INT8 size (1 byte per element)
- Fixed pruning accuracy delta sign to correctly show +/- direction
- Added missing export directives for Tensor and numpy imports in acceleration module

Results now correctly show:
- FLOPs: 4,736 (was incorrectly showing 64)
- Quantization: 4.0x compression (was incorrectly showing 1.0x)
- Pruning delta: correct +/- sign based on actual accuracy change
2025-12-03 09:36:10 -08:00
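For reference, a minimal sketch of the theoretical INT8 size arithmetic the commit above describes: FP32 stores 4 bytes per element and INT8 stores 1, so fully quantizing every parameter gives a 4.0x ratio. The helper below is illustrative, not the milestone's actual API.

```python
import numpy as np

def theoretical_int8_compression(params):
    """Compare FP32 parameter storage against a theoretical INT8 layout.

    FP32 uses 4 bytes per element and INT8 uses 1 byte per element,
    so quantizing every parameter yields a 4.0x compression ratio.
    """
    n_elements = sum(p.size for p in params)
    fp32_bytes = n_elements * 4   # 4 bytes per float32 value
    int8_bytes = n_elements * 1   # 1 byte per int8 value
    return fp32_bytes / int8_bytes

# Example: two small weight arrays
params = [np.zeros((64, 32), dtype=np.float32), np.zeros(32, dtype=np.float32)]
print(theoretical_int8_compression(params))  # -> 4.0
```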
Vijay Janapa Reddi
93e536e90d Add KV Cache and Acceleration to MLPerf milestone
- Add Module 17 (KVCache) demo with transformer
- Add Module 18 (vectorized_matmul) benchmark
- Fix missing imports in acceleration.py
- Update milestone to showcase ALL optimization modules (14-19)
- Show comprehensive optimization journey from profiling to deployment
2025-12-03 09:20:13 -08:00
Vijay Janapa Reddi
8334813e7f Enhance MLPerf milestone with comprehensive profiling and benchmarking
- Add FLOPs counting and throughput to baseline profile
- Use Benchmark class from Module 19 for standardized measurements
- Show detailed latency stats: mean, std, min/max, P95
- Fix missing statistics import in benchmark.py
- Use correct BenchmarkResult attribute names
- Showcase Modules 14, 15, 16, 19 working together
2025-12-03 09:16:07 -08:00
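For reference, a minimal sketch of the latency summary the commit above describes (mean, std, min/max, P95), using Python's statistics module; the function below is a stand-in, not the Module 19 Benchmark class.

```python
import time
import statistics

def latency_stats(fn, n_runs=100):
    """Time repeated calls to fn and summarize latency in milliseconds."""
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[int(0.95 * (len(samples) - 1))]  # simple nearest-rank P95
    return {
        "mean_ms": statistics.mean(samples),
        "std_ms": statistics.stdev(samples),
        "min_ms": samples[0],
        "max_ms": samples[-1],
        "p95_ms": p95,
    }

print(latency_stats(lambda: sum(range(10_000))))
```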
Vijay Janapa Reddi
ee49aeb3c6 Fix MLPerf milestones and improve accuracy display
- Fix import names: ProfilerComplete->Profiler, QuantizationComplete->Quantizer, CompressionComplete->Compressor
- Add missing Embedding import to transformer.py
- Update optimization olympics table to show baseline acc, new acc, and delta with +/- signs
- Milestones 01, 02, 05, 06 all working
2025-12-03 09:10:18 -08:00
Vijay Janapa Reddi
4aeb3c9c69 Merge main into dev, resolving conflicts in favor of dev's version 2025-12-03 07:26:43 -08:00
Vijay Janapa Reddi
42e07151d5 Add subscribe modal popup with MLSysBook integration
- Add subscribe-modal.js with elegant popup form
- Update top bar: fire-themed dark design (56px), orange accent
- Subscribe button triggers modal instead of navigating away
- Modal shows MLSysBook + TinyTorch branding connection
- Form submits to mlsysbook newsletter with tinytorch-website tag
- Orange Subscribe button matches TinyTorch fire theme
- Responsive design with dark mode support
2025-12-03 05:56:38 -08:00
Vijay Janapa Reddi
dde470a4e5 Fix all stale imports from models.transformer to core.transformer 2025-12-03 00:28:37 -08:00
Vijay Janapa Reddi
b457b449d7 Add create_causal_mask to transformer module and fix imports
- Added create_causal_mask() helper function to src/13_transformers
- Updated tinytorch/__init__.py to import from core.transformer
- Deleted stale tinytorch/models/transformer.py (now in core/)
- Updated TinyTalks to use the new import path

The create_causal_mask function is essential for autoregressive
generation - it ensures each position only attends to past tokens.
2025-12-03 00:27:07 -08:00
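For reference, a minimal sketch of what a causal mask helper can look like in plain NumPy; the actual create_causal_mask() in src/13_transformers may differ in shape conventions and mask values.

```python
import numpy as np

def create_causal_mask(seq_len):
    """Causal mask: position i may attend only to positions <= i.

    Returns a (seq_len, seq_len) matrix with 0.0 where attention is allowed
    and -inf where it is blocked, so it can be added to attention scores
    before the softmax.
    """
    above_diagonal = np.triu(np.ones((seq_len, seq_len)), k=1)  # 1s above the diagonal
    return np.where(above_diagonal == 1, -np.inf, 0.0)

print(create_causal_mask(4))
```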
Vijay Janapa Reddi
a44fff67db TinyTalks demo working with causal masking
Key fixes:
- Added causal mask so model can only attend to past tokens
- This aligns training (teacher forcing) with generation (autoregressive decoding)
- Used simpler words with distinct patterns for reliable completion

The .data access issue was a red herring - the real problem was
that without causal masking, the model sees future tokens during
training but not during generation. Causal mask fixes this.
2025-12-03 00:18:51 -08:00
Vijay Janapa Reddi
e97d74b0d6 WIP: TinyTalks with diagnostic tests
Identified critical issue: Tensor indexing/slicing breaks gradient graph.

Root cause:
- Tensor.__getitem__ creates new Tensor without backward connection
- Tensor(x.data...) pattern disconnects from graph
- This is why attention_proof works (it reshapes rather than slices)

Diagnostic tests reveal:
- Individual components (embedding, attention) pass gradient tests
- Full forward-backward fails when using .data access
- Loss doesn't decrease due to broken gradient chain

TODO: Fix in src/01_tensor:
- Make __getitem__ maintain computation graph
- Add warning when .data is used in grad-breaking context
- Consider adding .detach() method for explicit disconnection
2025-12-03 00:09:39 -08:00
Vijay Janapa Reddi
0c3e1ccfcb WIP: Add TinyTalks generation demo (needs debugging) 2025-12-03 00:04:24 -08:00
Vijay Janapa Reddi
456459ec7e Add KV caching demo and support multi-part milestones
MLPerf Milestone 06 now has two parts:
- 01_optimization_olympics.py: Profiling + Quantization + Pruning on MLP
- 02_generation_speedup.py: KV Caching for 10× faster Transformer

Milestone system changes:
- Support 'scripts' array for multi-part milestones
- Run all parts sequentially with progress tracking
- Show all parts in milestone info and banner
- Success message lists all completed parts

Removed placeholder scripts:
- 01_baseline_profile.py (redundant)
- 02_compression.py (merged into 01)
- 03_generation_opts.py (replaced by 02)
2025-12-03 00:00:40 -08:00
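For reference, a minimal sketch of the KV-caching idea behind 02_generation_speedup.py: during autoregressive decoding, keys and values for past tokens are stored and reused instead of being recomputed at every step. The class below is illustrative, not the Module 17 implementation.

```python
import numpy as np

class KVCache:
    """Minimal per-layer cache of past attention keys and values."""

    def __init__(self):
        self.keys = None     # shape grows to (tokens_so_far, d_head)
        self.values = None

    def append(self, k_new, v_new):
        """Append K/V for the newest token and return the full history."""
        if self.keys is None:
            self.keys, self.values = k_new, v_new
        else:
            self.keys = np.concatenate([self.keys, k_new], axis=0)
            self.values = np.concatenate([self.values, v_new], axis=0)
        return self.keys, self.values

cache = KVCache()
for step in range(3):                           # three generation steps
    k, v = np.random.rand(1, 8), np.random.rand(1, 8)
    keys, values = cache.append(k, v)
print(keys.shape)                               # (3, 8): one row added per step
```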
Vijay Janapa Reddi
80f402ea19 Move networks.py to 06_mlperf folder to avoid global duplication
- Networks library is specific to Milestone 06 (optimization focus)
- Milestones 01-05 keep their 'YOUR Module X' inline experience
- Updated header to clarify these are pre-built for optimization
2025-12-02 23:53:12 -08:00
Vijay Janapa Reddi
d02232c6cc Add shared milestone networks library
- Created milestones/networks.py with reusable network definitions
- Perceptron (Milestone 01), DigitMLP (03), SimpleCNN (04), MinimalTransformer (05)
- MLPerf milestone now imports networks from previous milestones
- All networks tested and verified working
- Enables optimization of the same networks students built earlier
2025-12-02 23:50:57 -08:00
Vijay Janapa Reddi
b5a9e5e974 Rewrite MLPerf milestone to use actual TinyTorch APIs
- Uses Profiler class from Module 14
- Uses QuantizationComplete from Module 15
- Uses CompressionComplete from Module 16
- Clearly shows 'YOUR implementation' for each step
- Builds on SimpleMLP from earlier milestones
- Shows how all modules work together
2025-12-02 23:48:17 -08:00
Vijay Janapa Reddi
9eabcbab89 Improve MLPerf milestone and add centralized progress sync
MLPerf changes:
- Show quantization and pruning individually (not combined)
- Added 'Challenge: Combine Both' as a future competition
- Clearer output showing each technique's impact

Progress sync:
- Added _offer_progress_sync() to milestone completion
- Uses centralized SubmissionHandler (same as module completion)
- Prompts user to sync achievement after milestone success
- Single endpoint for all progress updates
2025-12-02 23:40:57 -08:00
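For reference, a minimal sketch of the magnitude-pruning technique measured individually here, on a plain NumPy weight matrix; the real Compressor from Module 16 may select and store weights differently.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights.

    sparsity=0.5 removes the half of the entries closest to zero; the
    surviving weights are unchanged, so the accuracy delta depends on how
    much signal lived in the pruned entries.
    """
    threshold = np.quantile(np.abs(weights), sparsity)  # magnitude cutoff
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.random.randn(4, 4)
pruned, mask = magnitude_prune(w, sparsity=0.5)
print(mask.mean())  # fraction of weights kept, roughly 0.5
```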
Vijay Janapa Reddi
7f6dd19c10 Improve milestone 05 (Transformer) with letters for better visualization
- Enhanced attention proof to use A-Z letters instead of numbers
- Shows MCYWUH → HUWYCM instead of [1,2,3] → [3,2,1]
- More intuitive and fun for students
- Removed quickdemo, generation, and dialogue scripts (too slow; output was gibberish)
2025-12-02 23:33:58 -08:00
Vijay Janapa Reddi
c4d0bdb901 Add ASCII box alignment tool and fix 46 simple boxes
- Add tools/dev/fix_ascii_boxes.py for aligning ASCII art boxes
- Fix alignment of right-side vertical bars in simple boxes
- Tool handles simple boxes (2 vertical bars per line)
- Reports complex nested boxes for manual review (118 found)
- Fixed boxes in: src/, milestones/
2025-11-30 08:57:51 -05:00
Vijay Janapa Reddi
5cf0150805 Add BatchNorm and data augmentation to CIFAR-10 milestone
- Enhanced CIFAR-10 CNN with BatchNorm2d for stable training
- Added RandomHorizontalFlip and RandomCrop augmentation transforms
- Improved training accuracy from 65%+ to 70%+ with modern architecture
- Updated demo tapes with opening comments for clarity
- Regenerated welcome GIF, removed outdated demo GIFs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:27:15 -05:00
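For reference, a minimal sketch of the two augmentations named above, applied to a single (H, W, C) NumPy image; the milestone's actual transform classes may use different padding modes or channel ordering.

```python
import numpy as np

def random_horizontal_flip(img, p=0.5):
    """Flip an (H, W, C) image left-right with probability p."""
    return img[:, ::-1, :] if np.random.rand() < p else img

def random_crop(img, pad=4):
    """Pad the borders by `pad` pixels, then crop back to the original size
    at a random offset (the common CIFAR-10 augmentation recipe)."""
    h, w, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = np.random.randint(0, 2 * pad + 1)
    left = np.random.randint(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

img = np.random.rand(32, 32, 3)   # one CIFAR-10-sized image
print(random_crop(random_horizontal_flip(img)).shape)  # (32, 32, 3)
```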
Vijay Janapa Reddi
c61f7ec7a6 Clean up milestone directories
- Removed 30 debugging and development artifact files
- Kept core system, documentation, and demo files
- tests/milestones: 9 clean files (system + docs)
- milestones/05_2017_transformer: 5 clean files (demos)
- Clear, focused directory structure
- Ready for students and developers
2025-11-22 20:30:58 -05:00
Vijay Janapa Reddi
0d6807cefb Clean up milestone directories
- Removed 30 debugging and development artifact files
- Kept core system, documentation, and demo files
- tests/milestones: 9 clean files (system + docs)
- milestones/05_2017_transformer: 5 clean files (demos)
- Clear, focused directory structure
- Ready for students and developers
2025-11-22 20:30:58 -05:00
Vijay Janapa Reddi
0e135f1aea Implement Tensor slicing with progressive disclosure and fix embedding gradient flow
WHAT: Added Tensor.__getitem__ (slicing) following progressive disclosure principles

MODULE 01 (Tensor):
- Added __getitem__ method for basic slicing operations
- Clean implementation with NO gradient mentions (progressive disclosure)
- Supports all NumPy-style indexing: x[0], x[:3], x[1:4], x[:, 1]
- Ensures scalar results are wrapped in arrays

MODULE 05 (Autograd):
- Added SliceBackward function for gradient computation
- Implements proper gradient scatter: zeros everywhere except sliced positions
- Added monkey-patching in enable_autograd() for __getitem__
- Follows same pattern as existing operations (add, mul, matmul)

MODULE 11 (Embeddings):
- Updated PositionalEncoding to use Tensor slicing instead of .data
- Fixed multiple .data accesses that broke computation graphs
- Removed Tensor() wrapping that created gradient-disconnected leaf tensors
- Uses proper Tensor operations to preserve gradient flow

TESTING:
- All 6 component tests PASS (Embedding, Attention, FFN, Residual, Forward, Training)
- 19/19 parameters get gradients (was 18/19 before)
- Loss drops further: 1.54→1.08 (vs 1.62→1.24 before)
- Model still not learning (0% accuracy) - needs a fresh session to test monkey-patching

WHY THIS MATTERS:
- Tensor slicing is FUNDAMENTAL - needed by transformers for position embeddings
- Progressive disclosure maintains educational integrity
- Follows existing TinyTorch architecture patterns
- Enables position embeddings to potentially learn (pending verification)

DOCUMENTS CREATED:
- milestones/05_2017_transformer/TENSOR_SLICING_IMPLEMENTATION.md
- milestones/05_2017_transformer/STATUS.md
- milestones/05_2017_transformer/FIXES_SUMMARY.md
- milestones/05_2017_transformer/DEBUG_REVERSAL.md
- tests/milestones/test_reversal_debug.py (component tests)

ARCHITECTURAL PRINCIPLE:
Progressive disclosure is not just a nice-to-have; it's CRITICAL for educational systems.
Don't expose Module 05 concepts (gradients) in Module 01 (basic operations).
Monkey-patch when features are needed, not before.
2025-11-22 18:26:12 -05:00
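For reference, a minimal sketch of the gradient-scatter idea behind SliceBackward, in plain NumPy: the upstream gradient of y = x[index] flows back into a zero tensor of x's shape at the sliced positions. The real autograd wiring (Function classes, monkey-patching in enable_autograd()) is more involved.

```python
import numpy as np

def slice_backward(grad_output, input_shape, index):
    """Scatter the gradient of y = x[index] back to x's shape.

    Positions not selected by the slice receive zero gradient; the selected
    positions receive grad_output unchanged.
    """
    grad_input = np.zeros(input_shape)
    grad_input[index] = grad_output
    return grad_input

x_shape = (5, 3)
index = np.s_[1:4]                 # y = x[1:4] has shape (3, 3)
grad_y = np.ones((3, 3))
print(slice_backward(grad_y, x_shape, index))
```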
Vijay Janapa Reddi
763cdd2bf2 Implement Tensor slicing with progressive disclosure and fix embedding gradient flow
WHAT: Added Tensor.__getitem__ (slicing) following progressive disclosure principles

MODULE 01 (Tensor):
- Added __getitem__ method for basic slicing operations
- Clean implementation with NO gradient mentions (progressive disclosure)
- Supports all NumPy-style indexing: x[0], x[:3], x[1:4], x[:, 1]
- Ensures scalar results are wrapped in arrays

MODULE 05 (Autograd):
- Added SliceBackward function for gradient computation
- Implements proper gradient scatter: zeros everywhere except sliced positions
- Added monkey-patching in enable_autograd() for __getitem__
- Follows same pattern as existing operations (add, mul, matmul)

MODULE 11 (Embeddings):
- Updated PositionalEncoding to use Tensor slicing instead of .data
- Fixed multiple .data accesses that broke computation graphs
- Removed Tensor() wrapping that created gradient-disconnected leaf tensors
- Uses proper Tensor operations to preserve gradient flow

TESTING:
- All 6 component tests PASS (Embedding, Attention, FFN, Residual, Forward, Training)
- 19/19 parameters get gradients (was 18/19 before)
- Loss drops further: 1.54→1.08 (vs 1.62→1.24 before)
- Model still not learning (0% accuracy) - needs a fresh session to test monkey-patching

WHY THIS MATTERS:
- Tensor slicing is FUNDAMENTAL - needed by transformers for position embeddings
- Progressive disclosure maintains educational integrity
- Follows existing TinyTorch architecture patterns
- Enables position embeddings to potentially learn (pending verification)

DOCUMENTS CREATED:
- milestones/05_2017_transformer/TENSOR_SLICING_IMPLEMENTATION.md
- milestones/05_2017_transformer/STATUS.md
- milestones/05_2017_transformer/FIXES_SUMMARY.md
- milestones/05_2017_transformer/DEBUG_REVERSAL.md
- tests/milestones/test_reversal_debug.py (component tests)

ARCHITECTURAL PRINCIPLE:
Progressive disclosure is not just a nice-to-have; it's CRITICAL for educational systems.
Don't expose Module 05 concepts (gradients) in Module 01 (basic operations).
Monkey-patch when features are needed, not before.
2025-11-22 18:26:12 -05:00
Vijay Janapa Reddi
34c9b7aec3 Add sequence reversal as first Transformer milestone (00_vaswani_attention_proof.py)
- The canonical attention test from 'Attention is All You Need' paper
- Proves attention mechanism works by reversing sequences
- Impossible without cross-position attention (no shortcuts!)
- Trains in 30 seconds with 95%+ accuracy target
- Includes full educational context and ASCII architecture diagram
- Student-friendly with rich console output and progress tracking
- Should be run BEFORE complex Q&A tasks to verify attention works

Why this matters:
- Provides instant proof that attention computes relationships
- Fast feedback loop (30s vs 5min for Q&A)
- Binary success metric (either works or doesn't)
- From the original transformer paper validation tasks
- Perfect for debugging attention implementation
2025-11-22 18:05:08 -05:00
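For reference, a minimal sketch of the reversal task the attention proof trains on, using letter tokens as in the later milestone update; the actual script adds the model, training loop, and rich console output around data like this.

```python
import random
import string

def make_reversal_pair(seq_len=6):
    """Sample a random uppercase sequence and its reversal.

    Output position i depends on input position seq_len-1-i, so the model
    must move information across positions - exactly what attention provides
    and what a purely position-local model cannot do.
    """
    src = random.choices(string.ascii_uppercase, k=seq_len)
    return "".join(src), "".join(reversed(src))

random.seed(0)
print(make_reversal_pair())  # a (sequence, reversed) pair in the MCYWUH -> HUWYCM style
```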
Vijay Janapa Reddi
ff9f7d682a Add sequence reversal as first Transformer milestone (00_vaswani_attention_proof.py)
- The canonical attention test from 'Attention is All You Need' paper
- Proves attention mechanism works by reversing sequences
- Impossible without cross-position attention (no shortcuts!)
- Trains in 30 seconds with 95%+ accuracy target
- Includes full educational context and ASCII architecture diagram
- Student-friendly with rich console output and progress tracking
- Should be run BEFORE complex Q&A tasks to verify attention works

Why this matters:
- Provides instant proof that attention computes relationships
- Fast feedback loop (30s vs 5min for Q&A)
- Binary success metric (either works or doesn't)
- From the original transformer paper validation tasks
- Perfect for debugging attention implementation
2025-11-22 18:05:08 -05:00
Vijay Janapa Reddi
59ebf0d385 Add transformer quickdemo with live learning progression dashboard
New milestone 05 demo that shows students the model learning to "talk":
- Live dashboard with epoch-by-epoch response progression
- Systems stats panel (tokens/sec, batch time, memory)
- 3 test prompts with full history displayed
- Smaller model (110K params) for ~2 minute training time

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-11-22 15:55:12 -05:00
Vijay Janapa Reddi
308d6f2049 Add transformer quickdemo with live learning progression dashboard
New milestone 05 demo that shows students the model learning to "talk":
- Live dashboard with epoch-by-epoch response progression
- Systems stats panel (tokens/sec, batch time, memory)
- 3 test prompts with full history displayed
- Smaller model (110K params) for ~2 minute training time

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-11-22 15:55:12 -05:00
Vijay Janapa Reddi
5c3695a797 Add live spinner to milestone training loops
Use rich.live.Live to show real-time progress indicator during epoch training.
This gives visual feedback that code is running during potentially slow operations.
2025-11-22 15:31:48 -05:00
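For reference, a minimal sketch of the rich.live.Live pattern the commit above describes, assuming rich's Live and Spinner APIs; the real milestone loops wrap actual epoch training rather than a sleep.

```python
import time
from rich.live import Live
from rich.spinner import Spinner

# Show a live spinner so there is visible progress during a slow loop.
with Live(Spinner("dots", text="Training epoch 1/3..."), refresh_per_second=10) as live:
    for epoch in range(3):
        live.update(Spinner("dots", text=f"Training epoch {epoch + 1}/3..."))
        time.sleep(1.0)   # stand-in for one epoch of training
```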
Vijay Janapa Reddi
5e1dde6f70 Add live spinner to milestone training loops
Use rich.live.Live to show real-time progress indicator during epoch training.
This gives visual feedback that code is running during potentially slow operations.
2025-11-22 15:31:48 -05:00
Vijay Janapa Reddi
c77b05797f Fix duplicate autograd enabled messages
- Remove auto-enable from autograd.py module load (let __init__.py handle it)
- Silence the 'already enabled' warning (just return silently)
- Remove explicit enable_autograd() calls from milestones that do not need them
2025-11-22 15:31:39 -05:00
Vijay Janapa Reddi
d2486c5565 Fix duplicate autograd enabled messages
- Remove auto-enable from autograd.py module load (let __init__.py handle it)
- Silence the 'already enabled' warning (just return silently)
- Remove explicit enable_autograd() calls from milestones that do not need them
2025-11-22 15:31:39 -05:00
Vijay Janapa Reddi
7bc4ee9244 Add enumitem package to fix itemize formatting
The itemize environment parameters [leftmargin=*, itemsep=1pt, parsep=0pt]
were appearing as visible text in the PDF because the enumitem package
wasn't loaded. This fix adds \usepackage{enumitem} to the preamble.

All itemized lists now format correctly with proper spacing and margins.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 08:43:41 -05:00
Vijay Janapa Reddi
f31865560e Add enumitem package to fix itemize formatting
The itemize environment parameters [leftmargin=*, itemsep=1pt, parsep=0pt]
were appearing as visible text in the PDF because the enumitem package
wasn't loaded. This fix adds \usepackage{enumitem} to the preamble.

All itemized lists now format correctly with proper spacing and margins.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 08:43:41 -05:00
Vijay Janapa Reddi
d243c1b47f Apply all remaining critical fixes: tinygrad citation, NBGrader format, hedging, consistency 2025-11-18 09:39:19 -05:00
Vijay Janapa Reddi
d4dcf4f046 Apply all remaining critical fixes: tinygrad citation, NBGrader format, hedging, consistency 2025-11-18 09:39:19 -05:00
Vijay Janapa Reddi
56d8feb2ba Improve SIGCSE paper with reviewer feedback and clean up repository
Paper improvements:
- Add differentiated time estimates (60-80h experienced, 100-120h typical, 140-180h struggling)
- Moderate cognitive load claims with hedging language and empirical validation notes
- Add ML Systems Research subsection with citations (Baydin AD survey, Chen gradient checkpointing, TVM, FlashAttention)
- Add comprehensive Threats to Validity section (selection bias, single institution, demand characteristics, no control group, maturation, assessment validity)
- Define jargon (monkey-patching) at first use with clear explanation

Documentation updates:
- Restructure TITO CLI docs into a dedicated section (overview, modules, milestones, data, troubleshooting)
- Update student workflow guide and quickstart guide
- Remove deprecated files (testing-framework.md, tito-essentials.md)
- Update module template and testing architecture docs

Repository cleanup:
- Remove temporary review files (ADDITIONAL_REVIEWS.md, EDTECH_OPENSOURCE_REVIEWS.md, TA_STRUGGLING_STUDENT_REVIEWS.md, etc.)
- Remove temporary development planning docs
- Update demo GIFs and configurations
2025-11-16 23:46:38 -05:00
Vijay Janapa Reddi
a13b4f7244 Improve SIGCSE paper with reviewer feedback and clean up repository
Paper improvements:
- Add differentiated time estimates (60-80h experienced, 100-120h typical, 140-180h struggling)
- Moderate cognitive load claims with hedging language and empirical validation notes
- Add ML Systems Research subsection with citations (Baydin AD survey, Chen gradient checkpointing, TVM, FlashAttention)
- Add comprehensive Threats to Validity section (selection bias, single institution, demand characteristics, no control group, maturation, assessment validity)
- Define jargon (monkey-patching) at first use with clear explanation

Documentation updates:
- Restructure TITO CLI docs into a dedicated section (overview, modules, milestones, data, troubleshooting)
- Update student workflow guide and quickstart guide
- Remove deprecated files (testing-framework.md, tito-essentials.md)
- Update module template and testing architecture docs

Repository cleanup:
- Remove temporary review files (ADDITIONAL_REVIEWS.md, EDTECH_OPENSOURCE_REVIEWS.md, TA_STRUGGLING_STUDENT_REVIEWS.md, etc.)
- Remove temporary development planning docs
- Update demo GIFs and configurations
2025-11-16 23:46:38 -05:00
Vijay Janapa Reddi
1c5a269f80 Update development documentation and workflow files
- Update GitHub workflow for publishing
- Update December 2024 release notes
- Update module about template and testing documentation
- Update milestone template
2025-11-14 08:28:24 -05:00
Vijay Janapa Reddi
b56af30ba7 Update development documentation and workflow files
- Update GitHub workflow for publishing
- Update December 2024 release notes
- Update module about template and testing documentation
- Update milestone template
2025-11-14 08:28:24 -05:00
Vijay Janapa Reddi
0d2560c490 Update site documentation and development guides
- Improve site navigation and content structure
- Update development testing documentation
- Enhance site styling and visual consistency
- Update release notes and milestone templates
- Improve site rebuild script functionality
2025-11-13 10:42:51 -05:00
Vijay Janapa Reddi
b9f142b2d8 Update site documentation and development guides
- Improve site navigation and content structure
- Update development testing documentation
- Enhance site styling and visual consistency
- Update release notes and milestone templates
- Improve site rebuild script functionality
2025-11-13 10:42:51 -05:00
Vijay Janapa Reddi
4c0a046953 Remove emoji prefixes from markdown headers in milestones and site chapters 2025-11-11 21:17:22 -05:00
Vijay Janapa Reddi
2a496b28fe Remove emoji prefixes from markdown headers in milestones and site chapters 2025-11-11 21:17:22 -05:00
Vijay Janapa Reddi
11f1771f17 Fix remaining critical issues in milestone READMEs
Addressed 3 critical issues identified by education reviewer:

1. Standardized Module 07 terminology:
   - M03: Changed 'training loop' to 'end-to-end training loop'
   - Now consistent across all milestones (M01/M02/M03/M04)

2. Added quantitative loss criteria to M03:
   - TinyDigits: Loss < 0.5 (gives students measurable target)
   - MNIST: Loss < 0.2 (realistic threshold for convergence)
   - Fixed parameter count: ~2K → ~2.4K (accurate calculation)

3. Clarified M06 foundational dependencies:
   - Added note explaining Modules 01-13 are prerequisites
   - Makes clear the table shows ADDITIONAL optimization modules
   - Prevents confusion about complete dependency chain

These fixes bring milestone READMEs to production-ready quality.
Education reviewer grade: A- → A (after these fixes).
2025-11-11 13:12:23 -05:00
Vijay Janapa Reddi
775a40b08c Fix remaining critical issues in milestone READMEs
Addressed 3 critical issues identified by education reviewer:

1. Standardized Module 07 terminology:
   - M03: Changed 'training loop' to 'end-to-end training loop'
   - Now consistent across all milestones (M01/M02/M03/M04)

2. Added quantitative loss criteria to M03:
   - TinyDigits: Loss < 0.5 (gives students measurable target)
   - MNIST: Loss < 0.2 (realistic threshold for convergence)
   - Fixed parameter count: ~2K → ~2.4K (accurate calculation)

3. Clarified M06 foundational dependencies:
   - Added note explaining Modules 01-13 are prerequisites
   - Makes clear the table shows ADDITIONAL optimization modules
   - Prevents confusion about complete dependency chain

These fixes bring milestone READMEs to production-ready quality.
Education reviewer grade: A- → A (after these fixes).
2025-11-11 13:12:23 -05:00
Vijay Janapa Reddi
4653b5f808 Improve milestone READMEs based on education review feedback
Applied Priority 1 critical fixes from education reviewer:

1. Fixed historical accuracy:
   - M01: Clarified Perceptron demonstrated 1957, published 1958

2. Improved module dependency clarity:
   - M01: Split requirements into Part 1 (Module 04) vs Part 2 (Module 07)
   - M02/M04: Added 'end-to-end' clarification for Module 07 (Training)
   - M04: Added missing Module 07 to dependency table

3. Added quantitative success metrics:
- M02: Added loss values (stuck at ~0.69 vs converging toward 0.0)
   - M04: Added training time estimates (5-7 min, 30-60 min)
   - M05: Replaced subjective 'coherent' with 'Loss < 1.5, sensible word choices'

These changes address education reviewer's critical feedback about
technical accuracy and measurable learning outcomes. Students now have
clearer prerequisites and quantitative success criteria.
2025-11-11 12:56:39 -05:00
Vijay Janapa Reddi
5d2f6a5221 Improve milestone READMEs based on education review feedback
Applied Priority 1 critical fixes from education reviewer:

1. Fixed historical accuracy:
   - M01: Clarified Perceptron demonstrated 1957, published 1958

2. Improved module dependency clarity:
   - M01: Split requirements into Part 1 (Module 04) vs Part 2 (Module 07)
   - M02/M04: Added 'end-to-end' clarification for Module 07 (Training)
   - M04: Added missing Module 07 to dependency table

3. Added quantitative success metrics:
   - M02: Added loss values (~0.69 stuck vs → 0.0)
   - M04: Added training time estimates (5-7 min, 30-60 min)
   - M05: Replaced subjective 'coherent' with 'Loss < 1.5, sensible word choices'

These changes address education reviewer's critical feedback about
technical accuracy and measurable learning outcomes. Students now have
clearer prerequisites and quantitative success criteria.
2025-11-11 12:56:39 -05:00
Vijay Janapa Reddi
70f03f97ff Add comprehensive README files for milestones 01-05
Created standardized milestone documentation following the M06 pattern:

- M01 (1957 Perceptron): Forward pass vs trained model progression
- M02 (1969 XOR): Crisis demonstration and multi-layer solution
- M03 (1986 MLP): TinyDigits and MNIST hierarchical learning
- M04 (1998 CNN): Spatial operations on digits and CIFAR-10
- M05 (2017 Transformer): Q&A and dialogue generation with attention

Each README includes:
- Historical context and significance
- Required modules with clear dependencies
- Milestone structure explaining each script's purpose
- Expected results and performance metrics
- Key learning objectives and conceptual insights
- Running instructions with proper commands
- Further reading references
- Achievement unlocked summaries

This establishes single source of truth for milestone documentation
and provides students with comprehensive guides for each checkpoint.
2025-11-11 12:49:57 -05:00
Vijay Janapa Reddi
8191b8ebfc Add comprehensive README files for milestones 01-05
Created standardized milestone documentation following the M06 pattern:

- M01 (1957 Perceptron): Forward pass vs trained model progression
- M02 (1969 XOR): Crisis demonstration and multi-layer solution
- M03 (1986 MLP): TinyDigits and MNIST hierarchical learning
- M04 (1998 CNN): Spatial operations on digits and CIFAR-10
- M05 (2017 Transformer): Q&A and dialogue generation with attention

Each README includes:
- Historical context and significance
- Required modules with clear dependencies
- Milestone structure explaining each script's purpose
- Expected results and performance metrics
- Key learning objectives and conceptual insights
- Running instructions with proper commands
- Further reading references
- Achievement unlocked summaries

This establishes single source of truth for milestone documentation
and provides students with comprehensive guides for each checkpoint.
2025-11-11 12:49:57 -05:00
Vijay Janapa Reddi
c80b064a52 Create Milestone 06: MLPerf Optimization Era (2018)
Reorganized optimization content into dedicated M06 milestone:

Structure:
- 01_baseline_profile.py: Profile transformer & establish metrics
  (moved from M05/03_vaswani_profile.py)
- 02_compression.py: Quantization + pruning pipeline (placeholder)
- 03_generation_opts.py: KV-cache + batching opts (placeholder)
- README.md: Complete milestone documentation

Historical Context:
MLPerf (2018) represents the shift from "can we build it?" to
"can we deploy it efficiently?" - systematic optimization as a
discipline rather than ad-hoc performance hacks.

Educational Flow:
- M05 now focuses on building transformers (2 scripts)
- M06 teaches production optimization (3 scripts)
- Clear separation: model creation vs. model optimization

Pedagogical Benefits:
1. Iterative optimization workflow (measure → optimize → validate)
2. Realistic production constraints (size, speed, accuracy)
3. Composition of techniques (quantization + pruning + caching)

Placeholders await implementation of modules 15-18.

Updated:
- README.md: M05 reduced to 2 scripts, M06 described
- M05 now ends after generation/dialogue
- M06 begins systematic optimization journey
2025-11-11 12:32:27 -05:00