- Add FLOPs counting and throughput to baseline profile
- Use Benchmark class from Module 19 for standardized measurements
- Show detailed latency stats: mean, std, min/max, P95 (see the sketch after this list)
- Fix missing statistics import in benchmark.py
- Use correct BenchmarkResult attribute names
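A minimal sketch of the latency statistics the benchmark reports, assuming a simple
perf_counter timing loop (the actual Benchmark class in Module 19 may measure differently):

    import statistics
    import time

    def latency_stats(fn, warmup=3, iters=50):
        """Run fn repeatedly; return mean/std/min/max/P95 latency in ms."""
        for _ in range(warmup):
            fn()                                   # discard warm-up runs
        samples = []
        for _ in range(iters):
            start = time.perf_counter()
            fn()
            samples.append((time.perf_counter() - start) * 1000.0)
        samples.sort()
        p95 = samples[min(int(0.95 * len(samples)), len(samples) - 1)]
        return {
            "mean_ms": statistics.mean(samples),
            "std_ms": statistics.stdev(samples),   # needs the statistics import
            "min_ms": samples[0],
            "max_ms": samples[-1],
            "p95_ms": p95,
        }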
- Showcase Modules 14, 15, 16, 19 working together
- Fix import names: ProfilerComplete->Profiler, QuantizationComplete->Quantizer, CompressionComplete->Compressor
- Add missing Embedding import to transformer.py
- Update optimization olympics table to show baseline acc, new acc, and delta with +/- signs
- Milestones 01, 02, 05, 06 all working
- Add subscribe-modal.js with elegant popup form
- Update top bar: fire-themed dark design (56px), orange accent
- Subscribe button triggers modal instead of navigating away
- Modal shows MLSysBook + TinyTorch branding connection
- Form submits to mlsysbook newsletter with tinytorch-website tag
- Orange Subscribe button matches TinyTorch fire theme
- Responsive design with dark mode support
- Added create_causal_mask() helper function to src/13_transformers
- Updated tinytorch/__init__.py to import from core.transformer
- Deleted stale tinytorch/models/transformer.py (now in core/)
- Updated TinyTalks to use the new import path
The create_causal_mask function is essential for autoregressive
generation - it ensures each position only attends to past tokens.
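A minimal sketch of what the helper can look like, using NumPy; the actual signature
and mask convention in src/13_transformers may differ:

    import numpy as np

    def create_causal_mask(seq_len):
        """Lower-triangular 0/1 mask: position i may attend to positions <= i."""
        return np.tril(np.ones((seq_len, seq_len), dtype=np.float32))

    # Illustrative use inside attention:
    #   scores = scores + (1.0 - mask) * -1e9   # block attention to future tokens
    #   weights = softmax(scores, axis=-1)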
Key fixes:
- Added causal mask so model can only attend to past tokens
- This matches training (teacher forcing) with generation (autoregressive)
- Used simpler words with distinct patterns for reliable completion
The .data access issue was a red herring: the real problem was that without causal
masking the model sees future tokens during training but not during generation.
The causal mask fixes this.
Identified critical issue: Tensor indexing/slicing breaks gradient graph.
Root cause:
- Tensor.__getitem__ creates a new Tensor with no backward connection
- The Tensor(x.data...) pattern produces a fresh Tensor disconnected from the graph
- This is why attention_proof works (it reshapes but never slices)
Diagnostic tests reveal:
- Individual components (embedding, attention) pass gradient tests
- Full forward-backward fails when using .data access
- Loss doesn't decrease due to broken gradient chain
TODO: Fix in src/01_tensor:
- Make __getitem__ maintain the computation graph (see the sketch below)
- Add warning when .data is used in grad-breaking context
- Consider adding .detach() method for explicit disconnection
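A self-contained sketch of a graph-aware slice, assuming a micrograd-style Tensor with
_prev/_backward fields; the real Module 01 Tensor may organize its graph differently:

    import numpy as np

    class Tensor:
        """Tiny stand-in for the Module 01 Tensor; only what the slicing example needs."""

        def __init__(self, data, requires_grad=False):
            self.data = np.asarray(data, dtype=np.float32)
            self.grad = None
            self.requires_grad = requires_grad
            self._prev = set()
            self._backward = lambda: None

        def __getitem__(self, index):
            # Key idea: the sliced Tensor stays connected to its parent so
            # gradients can flow back into the selected positions.
            out = Tensor(self.data[index], requires_grad=self.requires_grad)
            out._prev = {self}

            def _backward():
                if self.grad is None:
                    self.grad = np.zeros_like(self.data)
                np.add.at(self.grad, index, out.grad)  # scatter-add into the slice

            out._backward = _backward
            return out

    # Illustration: gradients reach the parent through the slice
    x = Tensor(np.arange(6.0).reshape(2, 3), requires_grad=True)
    y = x[0]                       # graph-aware slice
    y.grad = np.ones_like(y.data)  # pretend upstream gradient
    y._backward()
    print(x.grad)                  # [[1. 1. 1.], [0. 0. 0.]]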
MLPerf Milestone 06 now has two parts:
- 01_optimization_olympics.py: Profiling + Quantization + Pruning on MLP
- 02_generation_speedup.py: KV Caching for 10× faster Transformer
Milestone system changes:
- Support 'scripts' array for multi-part milestones (sketch after this list)
- Run all parts sequentially with progress tracking
- Show all parts in milestone info and banner
- Success message lists all completed parts
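An illustrative milestone entry and runner loop; the key names and helper are
assumptions, not the actual milestone system code:

    # Hypothetical milestone definition using the new 'scripts' array
    MILESTONE_06 = {
        "id": "06_mlperf",
        "title": "MLPerf: Optimization Olympics",
        "scripts": [
            "01_optimization_olympics.py",   # profiling + quantization + pruning
            "02_generation_speedup.py",      # KV caching for faster generation
        ],
    }

    def run_milestone(milestone, run_script):
        """Run every part in order with simple progress tracking."""
        completed = []
        total = len(milestone["scripts"])
        for i, script in enumerate(milestone["scripts"], start=1):
            print(f"[{i}/{total}] Running {script} ...")
            run_script(script)               # caller supplies how a script is executed
            completed.append(script)
        print("Completed parts: " + ", ".join(completed))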
Removed placeholder scripts:
- 01_baseline_profile.py (redundant)
- 02_compression.py (merged into 01)
- 03_generation_opts.py (replaced by 02)
- Networks library is specific to Milestone 06 (optimization focus)
- Milestones 01-05 keep their 'YOUR Module X' inline experience
- Updated header to clarify these are pre-built for optimization
- Created milestones/networks.py with reusable network definitions
- Perceptron (Milestone 01), DigitMLP (03), SimpleCNN (04), MinimalTransformer (05)
- MLPerf milestone now imports networks from previous milestones (see the sketch after this list)
- All networks tested and verified working
- Enables optimization of the same networks students built earlier
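Illustrative reuse from the MLPerf milestone; constructor arguments are omitted here
and may be required by the real definitions in milestones/networks.py:

    from milestones.networks import DigitMLP, SimpleCNN, MinimalTransformer

    candidates = {
        "DigitMLP (Milestone 03)": DigitMLP(),
        "SimpleCNN (Milestone 04)": SimpleCNN(),
        "MinimalTransformer (Milestone 05)": MinimalTransformer(),
    }
    for name, model in candidates.items():
        print(f"Optimizing {name}: the same network students built earlier")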
- Uses Profiler class from Module 14
- Uses QuantizationComplete from Module 15
- Uses CompressionComplete from Module 16
- Clearly shows 'YOUR implementation' for each step
- Builds on SimpleMLP from earlier milestones
- Shows how all modules work together (see the sketch below)
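A sketch of the three modules working together, written with the post-rename class
names (Profiler, Quantizer, Compressor); the method names are assumptions about the
Module 14/15/16 APIs, not the actual milestone script:

    from tinytorch import Profiler, Quantizer, Compressor

    def optimization_olympics(model, sample_batch):
        profiler = Profiler()                                    # YOUR Module 14
        print("Baseline:", profiler.profile(model, sample_batch))

        quantized = Quantizer().quantize(model)                  # YOUR Module 15
        print("Quantized:", profiler.profile(quantized, sample_batch))

        pruned = Compressor().prune(model, sparsity=0.5)         # YOUR Module 16
        print("Pruned:", profiler.profile(pruned, sample_batch))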
MLPerf changes:
- Show quantization and pruning individually (not combined)
- Added 'Challenge: Combine Both' as future competition
- Clearer output showing each technique's impact
Progress sync:
- Added _offer_progress_sync() to milestone completion
- Uses centralized SubmissionHandler (same as module completion; sketch after this list)
- Prompts user to sync achievement after milestone success
- Single endpoint for all progress updates
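A sketch of the milestone-side hook; the SubmissionHandler import path and submit()
signature are assumptions, the point is reusing the single shared endpoint:

    from tinytorch.submission import SubmissionHandler  # hypothetical import path

    def _offer_progress_sync(milestone_id: str) -> None:
        answer = input(f"Milestone {milestone_id} complete! Sync progress? [y/N] ")
        if answer.strip().lower() != "y":
            return
        SubmissionHandler().submit(kind="milestone", item_id=milestone_id)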
- Enhanced attention proof to use A-Z letters instead of numbers
- Shows MCYWUH → HUWYCM instead of [1,2,3] → [3,2,1]
- More intuitive and fun for students
- Removed quickdemo, generation, and dialogue scripts (too slow, output was gibberish)
- Enhanced CIFAR-10 CNN with BatchNorm2d for stable training
- Added RandomHorizontalFlip and RandomCrop augmentation transforms (see the sketch after this list)
- Improved training accuracy from 65%+ to 70%+ with modern architecture
- Updated demo tapes with opening comments for clarity
- Regenerated welcome GIF, removed outdated demo GIFs
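A NumPy sketch of the two augmentations; the repo's transform classes wrap the same
idea, so parameter names and padding mode here are illustrative:

    import numpy as np

    def random_horizontal_flip(img, p=0.5):
        """img: HxWxC array; flip left-right with probability p."""
        return img[:, ::-1, :] if np.random.rand() < p else img

    def random_crop(img, size=32, padding=4):
        """Pad reflectively, then crop a random size x size window."""
        padded = np.pad(img, ((padding, padding), (padding, padding), (0, 0)),
                        mode="reflect")
        top = np.random.randint(0, 2 * padding + 1)
        left = np.random.randint(0, 2 * padding + 1)
        return padded[top:top + size, left:left + size, :]

    def augment(img):
        return random_crop(random_horizontal_flip(img))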
- The canonical attention test from 'Attention is All You Need' paper
- Proves attention mechanism works by reversing sequences (data-generation sketch after this list)
- Impossible without cross-position attention (no shortcuts!)
- Trains in 30 seconds with 95%+ accuracy target
- Includes full educational context and ASCII architecture diagram
- Student-friendly with rich console output and progress tracking
- Should be run BEFORE complex Q&A tasks to verify attention works
Why this matters:
- Provides instant proof that attention computes relationships
- Fast feedback loop (30s vs 5min for Q&A)
- Binary success metric (either works or doesn't)
- From the original transformer paper validation tasks
- Perfect for debugging attention implementation
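A minimal data-generation sketch for the reversal proof, using the A-Z vocabulary
described above; the tokenization details are assumptions:

    import random
    import string

    LETTERS = string.ascii_uppercase              # 'A'..'Z' vocabulary
    stoi = {ch: i for i, ch in enumerate(LETTERS)}

    def make_reversal_example(seq_len=6):
        """Return (input_ids, target_ids) where the target is the input reversed.

        e.g. 'MCYWUH' -> 'HUWYCM'. Solvable only if each output position can
        attend to a different input position, which is the point of the test.
        """
        src = random.choices(LETTERS, k=seq_len)
        tgt = list(reversed(src))
        return [stoi[c] for c in src], [stoi[c] for c in tgt]

    xs, ys = make_reversal_example()
    print(xs, "->", ys)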
New milestone 05 demo that shows students the model learning to "talk":
- Live dashboard with epoch-by-epoch response progression
- Systems stats panel (tokens/sec, batch time, memory)
- 3 test prompts with full history displayed
- Smaller model (110K params) for ~2 minute training time
Use rich.live.Live to show real-time progress indicator during epoch training.
This gives visual feedback that code is running during potentially slow operations.
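Minimal pattern (rich's Live context manager plus update()); the real demo renders
richer panels, this only shows the Live usage:

    import time
    from rich.live import Live
    from rich.text import Text

    num_epochs, steps_per_epoch = 3, 5
    with Live(Text("starting..."), refresh_per_second=4) as live:
        for epoch in range(1, num_epochs + 1):
            for step in range(1, steps_per_epoch + 1):
                time.sleep(0.2)  # stands in for a training step
                live.update(Text(f"epoch {epoch}/{num_epochs}  step {step}/{steps_per_epoch}"))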
- Remove auto-enable from autograd.py module load (let __init__.py handle it)
- Silence the already enabled warning (just return silently; see the sketch after this list)
- Remove explicit enable_autograd() calls from milestones that do not need them
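Sketch of the idempotent enable; the module-level flag and patch hook names are
assumptions, not the actual autograd.py internals:

    _AUTOGRAD_ENABLED = False

    def enable_autograd():
        global _AUTOGRAD_ENABLED
        if _AUTOGRAD_ENABLED:
            return                      # already on: return silently, no warning
        _AUTOGRAD_ENABLED = True
        _patch_tensor_ops()             # hypothetical hook that installs grad tracking

    def _patch_tensor_ops():
        pass                            # placeholder so the sketch runs standalone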
The itemize environment parameters [leftmargin=*, itemsep=1pt, parsep=0pt]
were appearing as visible text in the PDF because the enumitem package
wasn't loaded. This fix adds \usepackage{enumitem} to the preamble.
All itemized lists now format correctly with proper spacing and margins.
- Improve site navigation and content structure
- Update development testing documentation
- Enhance site styling and visual consistency
- Update release notes and milestone templates
- Improve site rebuild script functionality
Created standardized milestone documentation following the M06 pattern:
- M01 (1957 Perceptron): Forward pass vs trained model progression
- M02 (1969 XOR): Crisis demonstration and multi-layer solution
- M03 (1986 MLP): TinyDigits and MNIST hierarchical learning
- M04 (1998 CNN): Spatial operations on digits and CIFAR-10
- M05 (2017 Transformer): Q&A and dialogue generation with attention
Each README includes:
- Historical context and significance
- Required modules with clear dependencies
- Milestone structure explaining each script's purpose
- Expected results and performance metrics
- Key learning objectives and conceptual insights
- Running instructions with proper commands
- Further reading references
- Achievement unlocked summaries
This establishes single source of truth for milestone documentation
and provides students with comprehensive guides for each checkpoint.