Cleaned up temporary AI-generated analysis files:
- modules/15_quantization/FIXES_APPLIED.md
- modules/15_quantization/FIXES_TO_APPLY.md
- modules/16_compression/FIXES_REQUIRED.md
- modules/17_memoization/FIXES_APPLIED.md
- Plus other untracked analysis files
These were temporary debugging/review artifacts; matching .gitignore
patterns now prevent them from accumulating again.
Added module.yaml for Module 20 (Competition & Validation):
- Module configuration and learning objectives
- Prerequisites and skill development tracking
- Test coverage and connection documentation
This module brings together all optimization techniques learned
in modules 14-18 for competition preparation.
Added all module development files to modules/XX_name/ directories:
Module notebooks and scripts:
- 18 modules with .ipynb and .py files (numbered 01-20, with some gaps)
- Moved from modules/source/ to direct module directories
- Includes tensor, autograd, layers, transformers, optimization modules
Module README files:
- Added README.md files for modules that need additional documentation
- These complement the ABOUT.md files added earlier
This completes the module restructuring:
- Before: modules/source/XX_name/*_dev.{py,ipynb}
- After: modules/XX_name/*_dev.{py,ipynb}
All development happens directly in numbered module directories now.
Documentation updates across the codebase:
Root documentation:
- README.md: Updated references from book/ to site/
- CONTRIBUTING.md: Updated build and workflow instructions
- .shared-ai-rules.md: Updated AI assistant rules for new structure
GitHub configuration:
- Issue templates updated for new module locations
- Workflow references updated from book/ to site/
docs/ updates:
- STUDENT_QUICKSTART.md: New paths and structure
- module-rules.md: Updated module development guidelines
- NBGrader documentation: Updated for module restructuring
- Archive documentation: Updated references
Module documentation:
- modules/17_memoization/README.md: Updated after reordering
All documentation now correctly references:
- site/ instead of book/
- modules/XX_name/ instead of modules/source/
Completed restructuring: modules/source/XX_name/ → modules/XX_name/
All module development files moved to their numbered directories:
- modules/01_tensor/tensor_dev.{py,ipynb}
- modules/02_activations/activations_dev.{py,ipynb}
- ... (modules 03-20)
Removed obsolete source structure:
- modules/source/01_tensor/ through modules/source/20_capstone/
- modules/source/20_competition/ (legacy competition module)
- 43 files total (21 modules × 2 files each + 1 module.yaml)
This simplifies the module structure and makes development files
easier to find alongside their ABOUT.md and README.md files.
- Delete kvcaching_dev.py (superseded by memoization_dev.py)
- Delete kvcaching_dev.ipynb (superseded by memoization_dev.ipynb)
- memoization_dev files are the current versions with complete content
Cleanup of renamed files:
- Deleted old module source files (14_kvcaching, 15_profiling, 16_acceleration, etc.)
- Deleted old chapter markdown files
- These have been replaced by reorganized versions in previous commits
- Shows O(n²) latency growth in transformer generation
- Demonstrates problem before teaching solution
- Prepares module for reorganization to Module 15
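The O(n²) growth described above can be sketched with a toy work counter (hypothetical; not the module's actual demo code). Without a KV cache, generation step t re-attends over all t previous tokens, so total work across n steps is 1 + 2 + … + n = O(n²):

```python
# Hypothetical sketch: count attention "ops" for naive (cache-free) generation.
def generation_work(n_tokens: int) -> int:
    """Step t attends over t tokens, so total work is sum(1..n) = O(n^2)."""
    return sum(t for t in range(1, n_tokens + 1))

for n in (64, 128, 256):
    print(n, generation_work(n))
# Doubling n roughly quadruples the work: 2080, 8256, 32896
```

A KV cache makes each step O(1) in recomputed attention state, flattening this curve to O(n) total, which is the solution the module teaches next.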
- Add quick_profile() for simplified profiling interface
- Add analyze_weight_distribution() for compression module
- Both functions will be used by modules 15-18
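A minimal sketch of what these two helpers might look like. The signatures and return keys here are assumptions for illustration, not the module's actual API:

```python
import time
import numpy as np

def quick_profile(fn, *args, repeats: int = 10) -> float:
    """Hypothetical sketch: median wall-clock latency of fn in milliseconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append((time.perf_counter() - start) * 1000.0)
    return float(np.median(times))

def analyze_weight_distribution(weights: np.ndarray,
                                near_zero: float = 1e-3) -> dict:
    """Hypothetical sketch: stats a compression module might inspect,
    e.g. how many weights are near zero and therefore prunable."""
    flat = np.abs(weights.ravel())
    return {
        "mean_abs": float(flat.mean()),
        "max_abs": float(flat.max()),
        "near_zero_fraction": float((flat < near_zero).mean()),
    }
```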
Complete capstone competition implementation:
- Two division tracks: Closed (optimize) and Open (innovate)
- Baseline CNN model for CIFAR-10
- Validation and submission generation system
- Integration with Module 19 normalized scoring
- Honor code and GitHub repo submission workflow
- Worked examples and student templates
Module 20 is now a pedagogically sound capstone that applies
all Optimization Tier techniques in a fair competition format.
Enhancements to benchmarking module:
- Added calculate_normalized_scores() for fair hardware comparison
- Implemented speedup, compression ratio, accuracy delta metrics
- Added MLPerf principles section to educational content
- Updated module to support competition fairness
These changes enable Module 20 competition to work across different hardware.
- Import calculate_normalized_scores from Module 19 for fair comparison
- Implement validate_submission() with sanity checks for submissions
- Check for reasonable speedup (<50x), compression (<32x), accuracy preservation
- Verify GitHub repo and required fields are present
- Update generate_submission() to use normalized MLPerf-style scoring
- Add division parameter for Closed/Open Division tracking
- Include github_repo and honor_code fields in submission
- Display normalized scores: speedup, compression ratio, accuracy delta
- Guide students to use 'tito submit' for final submission workflow
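The sanity checks above can be sketched as follows. The <50x speedup and <32x compression thresholds come from this commit; the field names and the accuracy-drop cutoff are assumptions for illustration:

```python
def validate_submission(sub: dict) -> list:
    """Hypothetical sketch of submission sanity checks.
    Returns a list of error strings; empty means the submission passes."""
    errors = []
    # Required fields (names assumed for illustration).
    for field in ("github_repo", "honor_code", "speedup",
                  "compression_ratio", "accuracy_delta"):
        if field not in sub:
            errors.append(f"missing field: {field}")
    # Thresholds from the commit: implausible claims get flagged.
    if sub.get("speedup", 0.0) >= 50.0:
        errors.append("speedup >= 50x looks implausible")
    if sub.get("compression_ratio", 0.0) >= 32.0:
        errors.append("compression >= 32x looks implausible")
    # Accuracy-preservation cutoff (assumed value).
    if sub.get("accuracy_delta", 0.0) < -0.05:
        errors.append("accuracy dropped by more than 5 points")
    return errors
```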
- Add Section 4.5: Normalized Metrics - Fair Comparison Across Different Hardware
- Implement calculate_normalized_scores() function for MLPerf-style relative metrics
- Calculate speedup, compression ratio, accuracy delta, and efficiency score
- Add comprehensive unit tests for normalized scoring
- Ensures fairness across different hardware by measuring relative improvements
- Prepares students for Module 20 TinyMLPerf competition submissions
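The idea behind normalized scoring can be sketched like this: every metric is a ratio against the student's own baseline, so absolute hardware speed cancels out. The input keys and the efficiency-score combination (speedup × compression) are assumptions for illustration, not the module's actual formula:

```python
def calculate_normalized_scores(baseline: dict, optimized: dict) -> dict:
    """Hypothetical sketch of MLPerf-style relative metrics.
    Ratios against the same machine's baseline make scores
    comparable across different hardware."""
    speedup = baseline["latency_ms"] / optimized["latency_ms"]
    compression = baseline["model_size_mb"] / optimized["model_size_mb"]
    accuracy_delta = optimized["accuracy"] - baseline["accuracy"]
    return {
        "speedup": speedup,
        "compression_ratio": compression,
        "accuracy_delta": accuracy_delta,
        "efficiency_score": speedup * compression,  # assumed combination
    }
```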
- Updated module title to TorchPerf Olympics Preparation
- Added OlympicEvent enum with 5 competition categories
- Removed meta-analysis sections (532 lines)
- Added section 4.5 on combination strategies and ablation studies
- Updated documentation to explain Olympic events and optimization order
- Module teaches benchmarking principles while preparing students for capstone
- Updated all imports: ProfilerComplete → Profiler
- Updated Module 16: Uses Profiler for acceleration demos
- Updated Module 19: Uses Profiler in Benchmark class
- Updated all comments and docstrings
- Simpler, more professional naming (drops the awkward "Complete" suffix)
- Added import: from tinytorch.profiling.profiler import ProfilerComplete
- Benchmark class now initializes self.profiler = ProfilerComplete()
- run_latency_benchmark() uses profiler.measure_latency()
- run_memory_benchmark() uses profiler.measure_memory() and profiler.count_parameters()
- Updated architecture diagram to show ProfilerComplete as foundation
- Added pedagogical note explaining build-once-reuse-everywhere principle
Benefits:
- Eliminates code duplication between M15 and M19
- Shows proper systems architecture (composition/reuse)
- Students see ProfilerComplete tool evolving and being reused
- Clear separation: Profiler=measure, Benchmark=compare
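The build-once-reuse-everywhere composition can be sketched as below. Both classes are minimal stand-ins with assumed method bodies, showing only the structural relationship (Benchmark holds a ProfilerComplete rather than re-implementing measurement):

```python
import time

class ProfilerComplete:
    """Minimal stand-in for the Module 15 profiler (hypothetical body)."""
    def measure_latency(self, fn, *args) -> float:
        start = time.perf_counter()
        fn(*args)
        return (time.perf_counter() - start) * 1000.0  # milliseconds

class Benchmark:
    """Module 19 composes the profiler instead of duplicating its code."""
    def __init__(self):
        self.profiler = ProfilerComplete()  # build once, reuse everywhere
    def run_latency_benchmark(self, fn, *args) -> float:
        # Measurement is delegated; Benchmark only compares results.
        return self.profiler.measure_latency(fn, *args)
```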
- Removed SimpleOptimizer class (unused after mixed precision removal)
- Replaced trainer.train_step() test with simple forward pass test
- Test now validates accelerated operations without mixed precision
- Checks numerical correctness and reasonable output values
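A forward-pass test of this shape might look like the following sketch. The layer is a stand-in (plain matmul + ReLU in NumPy), and the value-range threshold is an assumption, but it illustrates the two checks the commit names: numerical correctness and reasonable output values:

```python
import numpy as np

def test_accelerated_forward_pass():
    """Hypothetical sketch: a plain forward pass (no trainer, no mixed
    precision) checked for numerical sanity."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8)).astype(np.float32)
    w = rng.standard_normal((8, 2)).astype(np.float32)
    out = np.maximum(x @ w, 0.0)  # linear layer + ReLU stand-in
    assert out.shape == (4, 2)
    assert np.isfinite(out).all()    # no NaNs or Infs
    assert float(out.max()) < 100.0  # values in a reasonable range (assumed bound)

test_accelerated_forward_pass()
```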