# Milestone 06: MLPerf - The Optimization Era (2018)

## Historical Context

As ML models grew larger and deployment became critical, the community needed systematic optimization methodologies. The MLPerf benchmark suite (launched in 2018, now run by MLCommons) established standardized benchmarking and optimization workflows, shifting the focus from "can we build it?" to "can we deploy it efficiently?"

This milestone teaches production optimization: the systematic process of profiling, compressing, and accelerating models for real-world deployment.
## What You're Building

A complete MLPerf-style optimization pipeline that takes YOUR networks from previous milestones and makes them production-ready!
## Required Modules
| Module | Component | What It Provides |
|---|---|---|
| Module 01-03 | Tensor, Linear, ReLU | YOUR base components |
| Module 11 | Embeddings | YOUR token embeddings |
| Module 12 | Attention | YOUR multi-head attention |
| Module 14 | Profiling | YOUR profiler for measurement |
| Module 15 | Quantization | YOUR INT8/FP16 implementations |
| Module 16 | Compression | YOUR pruning techniques |
| Module 17 | Acceleration | YOUR vectorized operations |
## Milestone Structure

This milestone has two scripts, each covering different optimization techniques:
### 01_optimization_olympics.py
Purpose: Optimize static models (MLP, CNN)
Uses YOUR implementations:
- Module 14 (Profiling): Measure parameters, latency, size
- Module 15 (Quantization): FP32 → INT8 (4× compression)
- Module 16 (Compression): Pruning (remove weights)
Networks from:
- DigitMLP (Milestone 03)
- SimpleCNN (Milestone 04)
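As a rough illustration of how FP32 → INT8 gives the 4× compression, here is a minimal symmetric per-tensor quantizer in NumPy. This is a sketch only; the function names are illustrative and YOUR Module 15 implementation may structure this differently.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: FP32 weights -> INT8 plus one scale."""
    scale = float(np.abs(w).max()) / 127.0  # largest magnitude maps to +/-127
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights for inference."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_int8(w)
# 1 byte per weight instead of 4 -> the 4x size reduction
assert q.nbytes * 4 == w.nbytes
# rounding error is bounded by half a quantization step
assert np.abs(w - dequantize_int8(q, scale)).max() <= scale / 2 + 1e-7
```

The single shared scale is what makes this "per-tensor"; per-channel scales trade a few extra bytes for lower error.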
### 02_generation_speedup.py
Purpose: Speed up Transformer generation
Uses YOUR implementations:
- Module 11 (Embeddings): Token embeddings
- Module 12 (Attention): Multi-head attention
- Module 14 (Profiling): Measure speedup
- Module 18 (KV Cache): Cache K,V for 6-10× speedup
Networks from:
- MinimalTransformer (Milestone 05)
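The speedup comes from avoiding redundant work: without a cache, every generation step re-runs attention over the full prefix from scratch; with a cache, each step projects K and V only for the newest token and appends them. A toy single-head sketch of the caching pattern follows (Module 18's `enable_kv_cache` wraps YOUR attention differently; the identity "projections" here are purely illustrative):

```python
import numpy as np

def attend_one(q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Attention for a single query vector over cached keys/values."""
    scores = K @ q / np.sqrt(len(q))   # (t,)
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w /= w.sum()
    return w @ V                       # (d,)

d, rng = 8, np.random.default_rng(0)
K_cache, V_cache, outputs = [], [], []
for step in range(5):
    x = rng.standard_normal(d)  # hidden state of the newest token
    # with a KV cache we only project the NEW token and append,
    # instead of recomputing k, v for the whole prefix each step
    K_cache.append(x)           # toy identity projection for k
    V_cache.append(x)           # toy identity projection for v
    outputs.append(attend_one(x, np.stack(K_cache), np.stack(V_cache)))
```

Each step's attention cost grows only with the prefix length, while the expensive per-token projections stay constant.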
## Expected Results

### Static Model Optimization (01)
| Optimization | Size | Accuracy | Notes |
|---|---|---|---|
| Baseline | 100% | 85-90% | Full precision |
| + Quantization | 25% | 84-89% | INT8 weights |
| + Pruning | 12.5% | 82-87% | 50% weights removed |
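The pruning row removes the 50% of weights with the smallest magnitudes. A sketch of global magnitude pruning, assuming NumPy weights (Module 16's actual API may differ; the function name is illustrative):

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # the k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) > threshold, w, 0.0)

w = np.random.randn(100, 100)
pruned = magnitude_prune(w, 0.5)  # roughly half the weights are now exactly zero
```

Note the size win in the table assumes the zeros are not stored densely, e.g. via a sparse format.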
### Generation Speedup (02)
| Mode | Time/Token | Speedup |
|---|---|---|
| Without Cache | ~10ms | 1× |
| With KV Cache | ~1ms | 6-10× |
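Per-token timings like these are best measured with a warmup pass and a median over many steps, since the median resists timer jitter. A sketch of such a harness (Module 14's profiler likely handles this for you; `generate_step` is a hypothetical stand-in for one decoding step):

```python
import time
import statistics

def time_per_token(generate_step, n_tokens: int = 50, warmup: int = 5) -> float:
    """Median seconds per call to `generate_step`, measured after warmup."""
    for _ in range(warmup):
        generate_step()  # warm caches before measuring
    samples = []
    for _ in range(n_tokens):
        t0 = time.perf_counter()
        generate_step()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

# speedup = time_per_token(step_without_cache) / time_per_token(step_with_cache)
```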
## Running the Milestone

```bash
# Optimize MLP/CNN (profiling + quantization + pruning)
python milestones/06_2018_mlperf/01_optimization_olympics.py

# Speed up Transformer generation (KV caching)
python milestones/06_2018_mlperf/02_generation_speedup.py
```

Or via `tito`:

```bash
tito milestone run 06
```
## Key Learning

Unlike earlier milestones where you "build and run," optimization is an iterative loop:

1. Measure - profile to find bottlenecks
2. Optimize - apply targeted techniques
3. Validate - check that accuracy didn't degrade
4. Repeat - iterate until deployment targets are met

This is ML systems engineering: the skill that ships products!
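The measure-optimize-validate loop can be sketched as a small driver. Everything here is illustrative: the `profile`/`evaluate` callables and the dict-based toy "model" are hypothetical stand-ins, not this repo's API.

```python
def optimize_until_target(model, steps, profile, evaluate,
                          target_size, min_accuracy):
    """Measure -> optimize -> validate -> repeat until targets are met."""
    for step in steps:
        if profile(model) <= target_size:       # measure: already at target?
            break
        candidate = step(model)                 # optimize: apply one technique
        if evaluate(candidate) < min_accuracy:  # validate: accuracy gate
            break                               # reject the step, keep the old model
        model = candidate
    return model

# toy example: the "model" is just a dict of stats, each step shrinks it
quantize = lambda m: {"size": m["size"] * 0.25, "acc": m["acc"] - 0.01}
prune    = lambda m: {"size": m["size"] * 0.5,  "acc": m["acc"] - 0.02}
final = optimize_until_target(
    {"size": 100.0, "acc": 0.90}, [quantize, prune],
    profile=lambda m: m["size"], evaluate=lambda m: m["acc"],
    target_size=12.5, min_accuracy=0.80,
)
# final: 12.5% of the original size, with accuracy still above the gate
```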
## Further Reading
- MLPerf: https://mlcommons.org/en/inference-edge-11/
- Deep Compression (Han et al., 2015): https://arxiv.org/abs/1510.00149
- Efficient Transformers Survey: https://arxiv.org/abs/2009.06732