- Remove retired _archive/ and scripts/archive/ trees (site, book filters, games, vault); vault CHANGELOG points to git history for old scripts.
- CONTRIBUTING: site project row, site/ in area map, root vs TinyTorch pre-commit, vault schema drift wording.
- Newsletter CLI: path-agnostic news alias; tinytorch pre-commit comments; add tools/ and staffml-vault-types READMEs for maintainers.
User: 'whatever we did for the LLM and how you arrived at it, that's exactly
the same record that we should apply to these other ones.' — applying the
same press-ENTER-to-start pattern from Lander to every other game so the
READY-screen UX is consistent across the catalog.
Each game now:
- imports mountReadyOverlay from ./runtime.mjs
- declares a 'started' flag (false at init)
- mounts a per-game READY overlay with title, goal, controls, ENTER prompt
- gates ticker / onTick on started
- gates input handlers (keydown / pointerdown / cell handlers) on started
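The per-game gating above reduces to a small state machine. The sketch below is hypothetical: `createLaunchGate`, `tick`, and `onKey` are not names from runtime.mjs. It accepts the Enter, Space, and ↑ launch keys that the Lander commit lists, matched by their `KeyboardEvent.key` values.

```javascript
// Hypothetical sketch of the "started" gate each game adds; not runtime.mjs itself.
const LAUNCH_KEYS = new Set(["Enter", " ", "ArrowUp"]); // KeyboardEvent.key values

function createLaunchGate(onLaunch) {
  const state = { started: false, frames: 0 };

  function launch() {
    if (state.started) return; // launching twice is a no-op
    state.started = true;
    onLaunch(); // e.g. hide the READY overlay
  }

  return {
    state,
    launch, // a tap handler can call this directly
    // Ticker callback: physics and spawning only advance after launch.
    tick() {
      if (!state.started) return;
      state.frames += 1;
    },
    // Keydown handler: before launch, only the launch keys do anything.
    onKey(key) {
      if (!state.started) {
        if (LAUNCH_KEYS.has(key)) launch();
        return;
      }
      // ...game-specific controls would run here once started...
    },
  };
}
```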
Per-game launch copy:
- batch: 'Push batch size up for throughput — but don't OOM.'
- allreduce: 'Tap each GPU on the beat. Keep gradients flowing.'
- moe: 'Route each colored token to the matching expert.'
- kvcache: 'Pack incoming requests. Beat fragmentation.'
- topology: 'Wire 8 GPUs. Maximize bandwidth, avoid bottlenecks.'
- checkpoint: 'Train fast. Checkpoint before a node failure strikes.'
- loader: 'Type each letter as it enters the zone — feed the GPU.'
- roofline: 'Stay under the cyan ceiling — that's your hardware roof.'
- prune: 'Cut faint weights to 60% sparsity. Keep accuracy above 50%.'
(also overrides prune's existing first-click-to-start in favor of READY)
- oom: 'Pack tensors into HBM. Activations dominate — that's why you'll OOM.'
- quantization: 'Drop weights below the bit budget. Hit the red center 7+ of 10.'
Verified: 14/14 games pass Playwright sweep — canvas mounted, runtime ready,
no JS errors, no 4xx network requests. Each game shows the READY overlay on
load and dismisses on Enter / Space / Up / tap.
User reported: 'cluster commander just doesn't do anything and the game is
broken, so it doesn't work. I click around, but nothing happens.'
Two layered bugs found via Playwright instrumentation:
1. Pixi v8 per-Graphics hit testing failed silently for the grid cells.
Hover (pointerover) worked but pointerdown never fired despite
eventMode='static' + explicit hitArea. handleGridClick was never reached.
Fix: bypass Pixi events for cells. Route pointermove/pointerdown through
canvas.addEventListener and translate (clientX, clientY) → grid (r, c).
Robust against any Pixi event-system quirks.
2. The new shared mountReadyOverlay was swallowing every click after dismissal.
In Pixi v8, visible=false does NOT stop event capture. The full-canvas
overlay's eventMode='static' kept catching pointerdowns even when invisible.
Fix in runtime.mjs: launch() now sets root.eventMode='none' AND removes
root from its parent, ensuring no further event capture.
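Fix 1's canvas-level routing can be sketched as a pure function. This is a minimal sketch, not the cluster.mjs code: the grid geometry fields (`originX`, `originY`, `cellSize`, `rows`, `cols`) and the name `clientToCell` are hypothetical, and it assumes the canvas may be CSS-scaled, so client pixels are converted to canvas pixels first. The `handleGridClick(r, c)` call shape in the wiring comment is also assumed.

```javascript
// Hypothetical sketch: map a pointer event's (clientX, clientY) to a grid cell.
// `rect` is canvas.getBoundingClientRect(); canvasW/canvasH are the canvas's
// internal resolution, which may differ from its CSS size.
function clientToCell(clientX, clientY, rect, canvasW, canvasH, grid) {
  // Undo CSS scaling so we work in canvas pixels.
  const x = (clientX - rect.left) * (canvasW / rect.width);
  const y = (clientY - rect.top) * (canvasH / rect.height);

  const c = Math.floor((x - grid.originX) / grid.cellSize);
  const r = Math.floor((y - grid.originY) / grid.cellSize);

  // Clicks outside the grid return null instead of an out-of-range cell.
  if (r < 0 || r >= grid.rows || c < 0 || c >= grid.cols) return null;
  return { r, c };
}

// Wiring (browser only), bypassing Pixi's per-Graphics hit testing:
// canvas.addEventListener("pointerdown", (e) => {
//   const cell = clientToCell(e.clientX, e.clientY, canvas.getBoundingClientRect(),
//                             canvas.width, canvas.height, grid);
//   if (cell) handleGridClick(cell.r, cell.c);
// });
```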
Cluster also gets the standard READY overlay (press ENTER to launch) so the
player can read the goal before the timer starts.
Verified locally: Playwright headless, 3 mid-canvas clicks → Scheduled: 8,
multiple Fine-tune 2x2 blocks placed on grid, 'scheduled Fine-tune' float
text visible. Pre-fix: Scheduled stayed at 0 forever.
Three changes addressing user feedback on Pipeline Pacer:
1. Wider GPU stages (82px → 116px). The old stages left ~175px of dead space
on the right of the canvas; the row now spans 464px with ~50px margins on
both sides, filling the canvas evenly.
2. Smooth U-turn at GPU 3 instead of instant blue→brown color swap. New
600ms 'turning' state on each block: outCubic-eased RGB lerp from
compute-blue to routing-orange, plus a vertical dip (sin curve) that
visually traces the forward→backward U-turn. Stroke color also lerps
from 0x2f6e9d to 0x9a6620.
3. Pre-game READY overlay using the new shared mountReadyOverlay helper.
Game starts paused; player presses ENTER (or Space, or ↑, or taps) to
launch. Spawn function and ticker now gate on `started` so nothing fires
before the player commits.
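The 'turning' state in item 2 amounts to an eased color interpolation plus a dip. A minimal sketch follows; `turningStyle`, `lerpColor`, and the 12px dip depth are hypothetical, and it assumes colors are stored as 0xRRGGBB integers and that the dip follows linear time while the color follows the outCubic-eased time.

```javascript
// Hypothetical sketch of the 600 ms "turning" animation; names are illustrative.
const TURN_MS = 600;

function easeOutCubic(t) {
  return 1 - Math.pow(1 - t, 3);
}

// Interpolate each 8-bit channel of two 0xRRGGBB colors.
function lerpColor(from, to, t) {
  const ch = (c, shift) => (c >> shift) & 0xff;
  const mix = (shift) =>
    Math.round(ch(from, shift) + (ch(to, shift) - ch(from, shift)) * t);
  return (mix(16) << 16) | (mix(8) << 8) | mix(0);
}

// Style of a block `elapsedMs` into its turn: eased stroke color plus a
// half-sine vertical dip that is 0 at both ends and deepest mid-turn.
function turningStyle(elapsedMs, dipPx = 12) {
  const t = Math.min(1, Math.max(0, elapsedMs / TURN_MS));
  const e = easeOutCubic(t);
  return {
    stroke: lerpColor(0x2f6e9d, 0x9a6620, e),
    dy: Math.sin(Math.PI * t) * dipPx,
  };
}
```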
Also: new mountReadyOverlay() helper added to runtime.mjs. Drop-in for any
game that wants the READY-press-ENTER pattern. Same dim overlay + title +
goal + controls + pulsing CTA + 'take your time' hint as Lander, with all
fields optional. Exported on window.MLSP.runtime.mountReadyOverlay.
User: 'the surface can be a bit more curved or more challenging - maybe levels?
and make the global minimum not always appear right in the middle'
Three changes to createTerrain + lossY, plus a HUD tweak:
1. Global pad spawns anywhere across the playable width (was: rand(0.42*W,
0.58*W) — always near center). Now: rand(0.18*W, 0.82*W). Forces actual
steering instead of drop-and-tap.
2. Local pads always flank global with a 110px center-to-center gap. If only
one half of the canvas has room (global near edge), both locals go there
separated within that half. Wells never bleed together.
3. New third sine harmonic in lossY (period ~Pi*7.3) adds finer surface
ripples that weren't in the original. Plus all three harmonic amplitudes
scale with day-based level (Day 1 = level 0 mild, Day 11+ = level 8 max).
4. dayChip now shows 'LVL N' so the player can read why today's terrain
feels harder than yesterday's.
Implicit progression: each day's puzzle is harder than the previous, capping
at level 8 so it never becomes impossible. The daily seed still produces
identical terrain for everyone playing on the same day.
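The flanking rule in item 2 can be sketched as below. Only the 110px gap comes from the commit; the `placeLocalPads` name, the 40px edge margin, and the choice to stack both locals 110px apart on the roomier side are assumptions, and the sketch presumes a canvas wide enough (a few hundred pixels) that the fallback side always fits.

```javascript
// Hypothetical sketch of local-pad placement around the global pad.
// GAP comes from the commit (110 px center-to-center); `margin` is assumed.
const GAP = 110;

function placeLocalPads(globalX, W, margin = 40) {
  const fits = (x) => x >= margin && x <= W - margin;
  const left = globalX - GAP;
  const right = globalX + GAP;

  if (fits(left) && fits(right)) return [left, right]; // normal case: flank both sides
  if (!fits(left)) return [right, right + GAP]; // global near the left edge
  return [left - GAP, left]; // global near the right edge
}
```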
The legacy .js game files (oom, prune, quantization, _archive/roofline)
displayed "alltime best" in their HUD text. Codespell flags this as a
misspelling of "all-time". The newer .mjs rewrites already use
"all-time best"; this aligns the older files with that convention.
Variable names (alltimeBest) and onGameOver/onScoreChange payload keys
({ alltimeBest: ... }) are unchanged — they're camelCase and already
ignored by the regex in pyproject.toml. Only display string literals
were touched, so no consumer breaks.
Unblocks codespell CI on dev, which has been red since the games-polish
loop landed.
The pixi-filters.min.mjs fix from 51ee2baf9 didn't propagate to the dev
preview deploy (deployed file still has the absolute path). This appends
a one-line comment to force git to track a content change so the next
deploy commits the file.
If this still doesn't propagate, the issue is in Quarto's resource-copy
step, not in the deploy script.
Post-merge Playwright sweep against the live dev preview revealed 4 games
still 404'ing on /assets/games/vendor/pixi.min.mjs (no path prefix). Trace:
the vendored pixi-filters.min.mjs ships with the absolute path
import {...} from "/assets/games/vendor/pixi.min.mjs"
baked into its bundle as the peer-dependency reference. This was hidden
behind the lazy import (only oom/prune/quantization/roofline call into
filters), which is why Lander and 9 other games passed.
Fix: sed-replace the absolute string with a sibling-relative path
'./pixi.min.mjs', which resolves correctly regardless of deploy base
since both vendor files live in the same directory.
Verified with full Playwright sweep against locally-rendered _build:
14/14 games now pass — canvas mounted, runtime ready, no errors, no
404s, including the four previously-broken filter-using games.
The same path-prefix bug that broke Lander on dev preview affected the other
13 games too. Fixing all of them in one batch so the entire catalog works
on /cs249r_book_dev/, mlsysbook.ai/, and localhost equally.
Pattern applied:
.qmd include-in-header script:
import "/assets/games/X.mjs" → import "../assets/games/X.mjs"
.mjs ES imports:
from "/assets/games/runtime.mjs" → from "./runtime.mjs"
from "/assets/games/vendor/pixi.min.mjs" → from "./vendor/pixi.min.mjs"
Files touched (10 .mjs + 13 .qmd):
.mjs: allreduce, batch, cluster, kvcache, moe, oom, pipeline, prune,
quantization, topology
.qmd: allreduce, batch, checkpoint, cluster, kvcache, loader, moe, oom,
pipeline, prune, quantization, roofline, topology
(checkpoint, loader, roofline .mjs already used 'import * as runtime from
./runtime.mjs' — only their qmd files needed updating)
Verification: all 14 games rendered locally (quarto render games/), served
via python3 -m http.server, swept with Playwright headless Chromium.
Result: 14/14 pass — canvas mounted, MLSP runtime ready, game registered,
no JS errors, no 4xx network requests. Visual screenshots confirm each
game's HUD/title/content paints correctly.
User reported the live dev preview was broken (blank canvas, 'doesn't do anything').
Playwright probe confirmed all .mjs imports 404'd:
[http 404] https://harvard-edge.github.io/assets/games/runtime.mjs
[http 404] https://harvard-edge.github.io/assets/games/lander.mjs
Root cause: dev preview lives at /cs249r_book_dev/ but every game imported
its modules via root-absolute paths (/assets/games/...). The dev-URL rewrite
script only handles https://mlsysbook.ai/... — not root-relative paths.
All 14 games have this bug; Lander is fixed here.
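The root cause is ordinary URL resolution, which the WHATWG `URL` constructor reproduces outside the browser: import specifiers beginning with `/`, `./`, or `../` resolve against the importing document's or module's URL. The page path `games/lander.html` is an assumption for illustration.

```javascript
// Illustrative URLs: the dev preview is served under /cs249r_book_dev/; the
// exact page path games/lander.html is an assumption.
const pageUrl = "https://harvard-edge.github.io/cs249r_book_dev/games/lander.html";
const moduleUrl = "https://harvard-edge.github.io/cs249r_book_dev/assets/games/lander.mjs";

// new URL(specifier, base) applies the same relative-reference rules the
// browser uses for "/", "./" and "../" import specifiers.
const resolve = (specifier, base) => new URL(specifier, base).href;

const broken = resolve("/assets/games/runtime.mjs", pageUrl); // drops /cs249r_book_dev/
const pageFix = resolve("../assets/games/lander.mjs", pageUrl); // stays under the prefix
const siblingFix = resolve("./runtime.mjs", moduleUrl);
const vendorFix = resolve("./vendor/pixi.min.mjs", moduleUrl);

console.log({ broken, pageFix, siblingFix, vendorFix });
```

The `broken` value is exactly the runtime.mjs URL that 404'd in the probe, while every relative form stays under `/cs249r_book_dev/`, and would equally stay under `/` on mlsysbook.ai or localhost.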
Path-prefix fix:
- lander.qmd: /assets/games/X.mjs → ../assets/games/X.mjs
- lander.mjs: /assets/games/runtime.mjs → ./runtime.mjs (sibling)
- lander.mjs: /assets/games/vendor/pixi.min.mjs → ./vendor/pixi.min.mjs
- runtime.mjs: /assets/games/vendor/pixi.min.mjs → ./vendor/pixi.min.mjs
- runtime.mjs: pixi-filters dynamic import → ./vendor/pixi-filters.min.mjs
UX feedback (bundled): user asked 'say hit enter to start so people don't
feel rushed and then they can read what's expected':
- READY CTA 'press UP to launch' → 'press ENTER to launch'
- Added italic 'Take your time — read the controls.' hint above the CTA
- Keydown accepts Enter, Space, OR ↑ as launch — any of the three works
- Center touch zone calls new shared launch() helper
- 'How to play' instructions updated to match
Verification: rendered locally (quarto render games/lander.qmd), served via
python3 -m http.server, probed with Playwright (headless Chromium). Page
loads, READY shows new CTA, Enter dismisses overlay, ↑ thrusts, crash
triggers per-failure aha card with correct share text. Zero console errors.
Outstanding: other 13 games still have the same path-prefix bug. Either
apply the same per-file fix, or extend rewrite-dev-urls.sh to also rewrite
/assets/... paths.
Cold re-read surfaced three small but real issues:
- dayChip claimed 'softest landing today: X m/s' — but X is screen-velocity
in pixels-per-frame, not m/s. Misleading scientific units.
- Retry button stayed MIT-red regardless of outcome — wins missed a small
visual reward.
- A few stale 'fuel' comments left over from the Lunar Lander origin.
Changes:
- dayChip: 'softest landing today: v=1.42' (dimensionless, matches share format).
Empty-state: 'land softer than yesterday' (implies cross-day comparison).
- Retry pill recolors green ('↺ PLAY AGAIN') on win, stays MIT-red ('↺ TRY AGAIN')
on any loss
- Comment sweep: removed the stale 'fuel' comments
Loop complete. 10/10 iterations on feat/games-polish-loop. See
.claude/_reviews/games-polish-loop-lander.md for the full log + reusable
template for the other 13 games.
The game was unplayable on a phone (keyboard-only controls). Animations
ignored OS reduce-motion preference. No aria announcement for game-over.
- Three invisible Pixi touch zones: left ⅓ steer-left, right ⅓ steer-right,
center ⅓ thrust + tap-to-launch from READY screen
- Pointer fallbacks (pointerupoutside, pointercancel) prevent stuck-key state
- Z-order re-pinned after touch-zone creation so retry pill stays clickable
- reduceMotion flag from matchMedia('(prefers-reduced-motion: reduce)')
- safeShake + safeBurst wrappers gate all 5 shakes and 4 big crash bursts
- Goal-pad pulse and CTA pulse held static when reduce-motion is set
- New aria-live='assertive' span; onGameOver writes per-reason announcement
- 'How to play' gains a mobile/tablet bullet
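The touch zones and reduce-motion gating in the list above can be sketched as two small helpers. The names `zoneForX` and `gateMotion` are hypothetical; `shake` and `burst` stand for the runtime's juice helpers, whose signatures this sketch does not assume. Only `matchMedia('(prefers-reduced-motion: reduce)')` is a real browser API, and the sketch treats motion as allowed where `matchMedia` does not exist.

```javascript
// Hypothetical sketch of the thirds-based touch mapping and motion gating.
function zoneForX(x, width) {
  if (x < width / 3) return "steer-left";
  if (x >= (2 * width) / 3) return "steer-right";
  return "thrust"; // center third; also launches from the READY screen
}

// matchMedia only exists in browsers; elsewhere, assume motion is allowed.
const reduceMotion =
  typeof matchMedia === "function" &&
  matchMedia("(prefers-reduced-motion: reduce)").matches;

// Wrap an effect so it becomes a no-op when reduced motion is requested.
function gateMotion(effect, reduce = reduceMotion) {
  return (...args) => (reduce ? undefined : effect(...args));
}

// e.g. const safeShake = gateMotion(shake); const safeBurst = gateMotion(burst);
```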
Lens-bounded change. Iter 10 (final ship-readiness pass) is next.
The landing page lagged the in-canvas state machine. The DOM HUD duplicated the
VRAM and Speed values that are now drawn in-canvas. 'How to play' didn't mention
the READY screen, RETRY button, daily seed, trajectory marker, or altitude line.
'The Systems Concept' only framed the win path.
- DOM HUD trimmed to a key-cap controls row; VRAM/Speed values moved to
.mlsp-sr-only aria-live span (kept for AT, hidden visually)
- 'How to play' rewritten to match current state (READY-to-launch, all six
affordances called out)
- 'The Systems Concept' lists all five failure modes mapped to real training:
diverged, local-min, missed-basin, off-course, OOM
- Closing line invites readers to try the other 13 games
- New shared CSS: .mlsp-controls-line (key-cap row), .mlsp-sr-only (standard
visually-hidden pattern); reusable across the catalog
Lens-bounded change. Iter 9 (mobile/touch/a11y) is next.
Lander was missing every replay-loop affordance the other 13 games have:
no daily seed, no best-score persistence, no visible retry button, no share
artifact, dead-static game-over state.
- dailySeed('lander') → terrain RNG; same loss surface worldwide today
- bestScore integration: lowest impact speed stored per-day; top-center chip
shows 'Day #N · your softest landing today: X m/s'
- In-canvas RETRY pill (Pixi Container, eventMode=static), MIT-red background,
visible only after gameOverFired
- buildShareText(state) per outcome: emoji-grid lines for win + 5 failure modes,
with ⭐ new personal best marker
- attachShareRow in lander.qmd appends share text + 📋 copy button to aha card,
with success feedback (✅ copied → reverts)
- New shared CSS in common.css: .mlsp-share-row, .mlsp-share-text, .mlsp-share-btn
(reusable by every other game)
Lens-bounded change. Iter 8 (landing page) is next.
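The commit names `dailySeed('lander')` and `bestScore` but not their internals. A minimal sketch, assuming the seed is an FNV-1a hash of the game id plus the UTC date and that it feeds mulberry32 (the generator named in the daily-seed parity fix), might look like this. `dayKey`, `fnv1a`, `updateBest`, and the `Map` standing in for persistent storage are all hypothetical.

```javascript
// Hypothetical sketch; the real runtime's seeding and storage may differ.
const dayKey = (date) => date.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC

// 32-bit FNV-1a string hash (assumed; any stable string hash would do).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function dailySeed(gameId, date = new Date()) {
  return fnv1a(`${gameId}:${dayKey(date)}`);
}

// mulberry32: small 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Per-day best where lower impact speed is better; `store` stands in for localStorage.
function updateBest(store, gameId, date, impactSpeed) {
  const key = `${gameId}:${dayKey(date)}`;
  const prev = store.get(key);
  const isNewBest = prev === undefined || impactSpeed < prev;
  if (isNewBest) store.set(key, impactSpeed);
  return { best: isNewBest ? impactSpeed : prev, isNewBest };
}
```

Because the seed depends only on the game id and the UTC date, every player who loads Lander on the same day gets the same terrain.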
The game worked but didn't meet the presentation bar. The ship, the protagonist
of the screen, was a tiny unstroked triangle and visually forgettable. Only the
left local pad was labeled. There was no altitude cue. The 'stochastic gradient
noise' label floated orphaned in the top-left.
- Ship rebuilt as layered Graphics: halo + body + crisp stroke + interior highlight
- Flame layered into outer glow + brighter core (0xffd28a)
- Right local pad now labeled symmetrically with the left
- New altitude-reference dashed line from ship to the surface beneath
(only drawn when headroom > 18px so it doesn't crowd touchdown)
- 'stochastic gradient noise' → 'loss landscape', repositioned at the basin
wash so it reads as chart annotation, not free-floating decoration
Lens-bounded change. Iter 7 (replay loop & shareability) is next.
Six possible outcomes, but all produced the identical aha card. The OOM event
didn't even end the game — flashed text once and let play continue. And
'GRADIENT EXPLOSION' misuses ML terminology (exploding gradients are runaway
gradient magnitudes that often end in Inf/NaN, not landing in the wrong place).
- state.reason tracks which failure occurred ('win'|'diverged'|'local-min'|
'off-course'|'missed-basin'|'oom')
- 'GRADIENT EXPLOSION' → 'MISSED THE BASIN' (accurate to loss-surface metaphor)
- OOM now properly ends the game with full juice (matches real training)
- New AHA[reason] map: six distinct messages, each mapping the failure to its
real ML systems counterpart
- api.aha(reason) returns the right card; lander.qmd attachAha consumes it
- Bug fix: state.gameOverFired guard so onGameOver fires once, not every frame
Lens-bounded change. Iter 6 (visual polish) is next.
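The `state.reason` / `gameOverFired` pattern can be sketched independently of Pixi. The names are hypothetical and the card strings are placeholders, not the game's AHA text; the point is that `end()` may be called from the ticker every frame while `onGameOver` still fires once.

```javascript
// Hypothetical sketch: record the failure reason, fire onGameOver exactly once,
// and look up a reason-specific aha card. Card text here is placeholder only.
const REASONS = ["win", "diverged", "local-min", "off-course", "missed-basin", "oom"];
const AHA = Object.fromEntries(REASONS.map((r) => [r, `aha card for ${r}`]));

function createGameOver(onGameOver) {
  const state = { reason: null, gameOverFired: false };
  return {
    state,
    // Safe to call from the ticker every frame; only the first call counts.
    end(reason) {
      if (state.gameOverFired) return;
      if (!REASONS.includes(reason)) throw new Error(`unknown reason: ${reason}`);
      state.reason = reason;
      state.gameOverFired = true;
      onGameOver({ reason });
    },
    aha(reason = state.reason) {
      return reason ? AHA[reason] : null;
    },
  };
}
```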
The HUD lived in a DOM strip below the canvas. Player couldn't read VRAM
and watch the ship simultaneously without an eye-flick worth ~200ms — long
enough to crash. Both VRAM and speed were text-only.
- VRAM vertical bar (top-right), color shifts blue → orange → red as memory depletes
- Descent-speed horizontal bar (bottom-left) with explicit green safe-zone,
red danger-zone, and 'soft-landing limit ↑' threshold marker
- Both bars drawn every frame at top of ticker (never stale during pre-game / post-crash)
- DOM HUD retained for a11y / screen readers; iter 8 will trim its copy
Lens-bounded change. Iter 5 (failure-state pedagogy) is next.
Brutal first experience. Rotation was 0.08 rad/frame (~4.8 rad/s at 60 fps, so a
0.3s tap rotates ~80°). Terrain slope randomized over [-8, +8] produced run-to-run
variance unrelated to player skill. There was no predictive feedback at all.
- rotSpeed 0.08 → 0.055 (precision without sluggishness)
- terrain.slope range [-8, +8] → [-4, +4] (less luck, same lesson)
- Soft-landing thresholds: speed 2.0 → 2.4, angle 0.5 → 0.6 rad
- New translucent trajectory marker (30 frames coast-ahead) + faint trail line
- Promoted maxSafeSpeed and maxSafeAngle to named constants for future tuning
Lens-bounded change. Iter 4 (HUD legibility) is next.
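The 30-frame coast-ahead marker amounts to integrating the current velocity forward with gravity only. This sketch assumes screen coordinates (+y down), the 0.05 px/frame² gravity cited in iteration 1, and a semi-implicit Euler step (velocity updated before position); lander.mjs may integrate differently, and `predictCoast` is a hypothetical name.

```javascript
// Hypothetical sketch of the coast-ahead marker: step the current velocity
// forward with gravity only (no thrust input), in screen coordinates (+y down).
const GRAVITY = 0.05; // px/frame^2, the value cited in iteration 1
const LOOKAHEAD = 30; // frames

function predictCoast(x, y, vx, vy, frames = LOOKAHEAD, g = GRAVITY) {
  const points = [];
  for (let i = 0; i < frames; i++) {
    vy += g; // semi-implicit Euler: update velocity, then position
    x += vx;
    y += vy;
    points.push({ x, y });
  }
  return points; // last point = marker position; the rest can draw the faint trail
}
```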
Inputs and outcomes weren't punching. Thrust looked the same regardless of
context, every crash was identical, wins were anemic, and off-screen exits
ended the game silently.
- Import flash + shake from runtime
- Thrust emits back-puff particles opposite ship rotation (~every 2 frames)
- Camera shake on crash scales with impact speed (min(18, speed * 3))
- Win → green burst + green flash + small confirming shake
- Off-screen exit is now an explicit OFF COURSE failure with red flash
Lens-bounded change. Iter 3 (difficulty curve) is next.
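The two scaling rules can be sketched as follows. The shake cap `min(18, speed * 3)` comes from the commit; the angle convention (0 = nose up, measured clockwise on a +y-down screen), the puff speed, and the helper names are assumptions.

```javascript
// Hypothetical sketch; the angle convention is assumed, not taken from lander.mjs.
function crashShake(impactSpeed) {
  return Math.min(18, impactSpeed * 3); // cap so huge impacts don't blow out the screen
}

// Thrust pushes along the nose; exhaust puffs travel the opposite way.
function puffVelocity(angle, speed = 1.5) {
  const noseX = Math.sin(angle);
  const noseY = -Math.cos(angle); // nose-up points toward -y on screen
  return { vx: -noseX * speed, vy: -noseY * speed };
}

// Emit roughly every other frame.
const shouldEmitPuff = (frame) => frame % 2 === 0;
```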
A first-time visitor used to crash before they could read the controls. The
ship started falling the instant the page loaded (~2 s to ground at gravity
0.05). The 'global minimum' label was a small text element visually equivalent
to the local-minima labels.
- Add state.started flag; physics paused until first ↑ press
- Full-canvas READY overlay: title + 1-line goal + controls + pulsing CTA
- Persistent breathing pulse on the green pad so the eye finds it first
Lens-bounded change. Iter 2 (game feel) is next.
Improve the playground shell, game explanations, and several mini-game interactions so the games read more clearly as teaching artifacts and work better in fullscreen.
- Added 'KV Cache Packer' to teach PagedAttention and KV fragmentation
- Added 'Cluster Commander' to teach Slurm scheduling and fleet fragmentation
- Registered all 14 games in the runtime registry
- Fixed WebGL rendering loops to avoid performance overhead and crashes
- Updated 404 pages across all workspaces to route to the new games Playground
- Overrode default Quarto 'S' search shortcut to Shift+? to free up typing controls
Two threads landing together:
PixiJS migration (Pulse Prune + OOM):
- Vendored PixiJS v8 + pixi-filters v6 to site/assets/games/vendor/
(pixi-filters bundle patched to resolve "pixi.js" import locally)
- Added shared Pixi runtime (runtime.mjs): mountPixiOnCanvas, pop/flash/burst/
floatText/shake juice helpers, tween + lazy getFilters, daily seed + best
score; legacy window.MLSP bridge preserved for canvas-2D games
- Pulse Prune rewritten on Pixi (prune.mjs v8): GlowFilter on inference pulse
+ 18-position trail, dense bursts and bloom rings on critical cuts/win
- OOM rewritten on Pixi (oom.mjs v6) with pedagogy fixes from audit:
KV cache replaced with parameters block (training game, not inference);
block ratios rescaled to real HBM proportions (act=6, opt=6, params=3, grad=1);
activation freeing flipped FIFO -> LIFO (autograd traversal order);
aha card cites Korthikanti et al. 2022; visual lift via tween-in placement,
ambient particles at >=40% fill, GlowFilter on params + win state
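The FIFO → LIFO change follows from how backprop consumes saved tensors: the last layer's activation is needed first. A minimal sketch with a hypothetical `ActivationStack` class (the game's actual data structure isn't described):

```javascript
// Hypothetical sketch: activations are saved in forward order and released in
// reverse (LIFO), matching the order autograd walks the graph in backward.
class ActivationStack {
  constructor() {
    this.saved = [];
  }
  // Forward pass: each layer saves its activation.
  save(layerName, bytes) {
    this.saved.push({ layerName, bytes });
  }
  // Backward pass: the most recent activation is consumed (and freed) first.
  freeNext() {
    return this.saved.pop() ?? null;
  }
  get liveBytes() {
    return this.saved.reduce((sum, a) => sum + a.bytes, 0);
  }
}
```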
Straggler (new game #4):
- New game on Pixi teaching distributed-training tail latency and ring all-reduce
- 8 GPU-creature sprite cast (idle/waiting/sighing/ash) generated as iconic
vector mascots, transparency-cleaned and sized for runtime in sprites/
- Synchronous-ring rhythm mechanic: tap each round to advance the ring; miss
the rhythm and the cluster stalls -> non-player GPUs accumulate idle time;
4.5s stall throttles the player -> game over
- 4-second grace period auto-passes (tutorial), round duration ramps from
2.4s -> 0.7s over 60s, phase-2 gather pass spawns at t=30s
- HUD reports steps + cluster idle %; aha card cites Sergeev & Del Balso 2018
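The rhythm mechanic abstracts ring all-reduce, in which every GPU must finish each step before the ring advances. For reference, the textbook algorithm (a reduce-scatter phase, then an all-gather phase, 2(N-1) steps in total) can be simulated in a few lines. This is the generic algorithm, not code from the game, and it simplifies each chunk to a single number.

```javascript
// Textbook ring all-reduce over N simulated workers. Each worker's buffer has
// N chunks (one number per chunk here); worker i only ever sends to i + 1.
function ringAllReduce(buffers) {
  const n = buffers.length;
  if (buffers.some((b) => b.length !== n)) throw new Error("need N chunks per worker");
  const data = buffers.map((b) => b.slice()); // don't mutate the caller's arrays
  const mod = (k) => ((k % n) + n) % n;

  // Phase 1, reduce-scatter: after N-1 steps worker i owns the full sum of chunk i+1.
  for (let s = 0; s < n - 1; s++) {
    // Snapshot all sends first: every worker transmits simultaneously.
    const msgs = data.map((buf, i) => ({ to: mod(i + 1), chunk: mod(i - s), value: buf[mod(i - s)] }));
    for (const m of msgs) data[m.to][m.chunk] += m.value;
  }

  // Phase 2, all-gather: circulate each finished chunk around the ring.
  for (let s = 0; s < n - 1; s++) {
    const msgs = data.map((buf, i) => ({ to: mod(i + 1), chunk: mod(i + 1 - s), value: buf[mod(i + 1 - s)] }));
    for (const m of msgs) data[m.to][m.chunk] = m.value;
  }
  return data;
}
```

Every one of the 2(N-1) steps needs all N workers to send, so a single slow GPU delays each step for the whole ring; that is the tail-latency effect Straggler dramatizes.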
Catalogue cleanup:
- Roofline Runner archived to site/games/_archive/ and site/assets/games/_archive/
(kept in registry.js with available:false rather than deleted)
- Gallery (index.qmd) updated: 4 cards = Pulse Prune, Straggler, OOM, Sharp Shot
- Brand sweep "MLSys Playground" -> "MLSysBook Playground" across qmd footers,
registry header, common.css comment, 404 aria-label, and all .js share-text
Tooling:
- pre-commit codespell: skip site/assets/games/vendor/ (third-party minified
bundles); fix "alltime" -> "all-time" in user-visible strings; rename
fromI/toI -> fromIdx/toIdx in straggler ring math
Tested: all 4 games load with zero JS errors in headless Chromium.
Brings the TinyTorch lab guide's Quarto project in line with
book/quarto/, the only other in-tree Quarto publication that builds
both web and PDF outputs from a single source. The previous name had
three problems:
- already under tinytorch/, so "site-" prefix wasn't disambiguating
- also produces the PDF lab guide, so "site-" was misleading
- the top-level site/ dir made "site-quarto" read as "the site's
quarto config" rather than "the tinytorch site, in quarto"
After this rename the convention is straightforward:
book/quarto/ -> the textbook (web + PDF)
tinytorch/quarto/ -> the TinyTorch lab guide (web + PDF)
mlsysim/docs/ -> mlsysim API reference (kept as docs/, since it
really is API reference, not a publication)
Touches 7 GitHub workflows, both .gitignore files, the rename target's
own self-references (Makefile, _quarto.yml configs, STYLE.md,
measure-pdf-images.py), and 6 copies of subscribe-modal.js plus a few
shared scripts/configs whose comments documented the old path.
Verified: rebuilt pdf/TinyTorch-Guide.pdf (2.1M) cleanly from the new
location with 'make pdf' from tinytorch/quarto/.
Three redesigns (plus a rename) shipped together after a round-6 consultation (indie
designer + Song Han + beginner-player). Each addresses a specific
complaint from playtests:
1. OOM step-bar driven by PACKING, not a wall clock
Old: stepCountdown ticked down 7s, independent of what the player
did. User feedback: 'I don't know what the step is doing. Is it
a timer?' — all three reviewers flagged the missing causal chain.
New: state.stepProgress increments on every block placed. Every
6 blocks fires step() (clears gradients). Every 3 blocks fires
backward (consumes oldest activations). The right-side bar now
*fills* as you pack, not counts down. Causation is visible: your
packing IS the training loop.
2. Pulse Prune simplification + visible sparsity goal bar
Old aha text overclaimed (name-dropped lottery-ticket, 2:4 sparsity
which the game doesn't model). Emma-the-beginner also flagged the
win condition wasn't legible — 'snip things and hope?'.
New aha text is one honest sentence: 'Magnitude is a usable proxy
for importance (Han et al. 2015). Real pruning adds a fine-tuning
step to recover accuracy — you just did the cut.' Song Han's
exact recommendation.
HUD now has a visible sparsity progress bar (green, fills toward
the 60% goal with a target tick at the end) above the accuracy
bar. The goal is a bar you fill, not a number you infer.
3. Sharp Shot replaces Quantization Cliff
Old: static precision-dial form. Reviewers called it a spreadsheet.
Beginner said she'd bounce off it. Game designer: 'barely a game
— three clicks and a deploy button.'
New: target-shooting game. Per-layer precision dials on the left
panel; target downrange. Song Han's per-layer visual mapping:
- edge layers (embedding, output) at int4 → POSITION DRIFT
(target actually moves away from where you aim, teaching the
LLM.int8 edge-layer cliff; Dettmers 2022)
- attention layers at low precision → JITTER (softmax noise)
- FFN layers at low precision → BLUR (contrast loss, tolerant)
10 shots per round, hit 7 to ship. When you miss, the game briefly
reveals the true target position in dashed red so you can see how
far your sight was misaligned — pedagogy through failure.
Controls: mouse or arrow keys to aim, space / click to fire, 1-6
keys cycle layer precision. Immediate visual feedback (no deploy
phase; the picture updates as you dial).
4. Naming
Quantization Cliff → Sharp Shot on the gallery card, the
registry, the page title, and inside the game. 'Quantization'
stays in the aha card and page prose (that's where it belongs),
but beginners who bounce on jargon won't bounce on 'Sharp Shot.'
Emma's recommendation.
Files: 3 game .js files rewritten, registry.js updated for rename,
index.qmd gallery card updated, quantization.qmd page prose rewritten
for the new mechanic. Previous Roofline Climber redesign deferred
(indie designer flagged it as three genres stacked; current Roofline
works and rewriting it risked losing more than gaining).
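Item 1's causal chain can be sketched as a counter driven by placements. The 3-block and 6-block cadences come from the commit; `createPackingLoop`, the callback names, and firing backward before step when both land on the same block are assumptions.

```javascript
// Hypothetical sketch of "packing IS the training loop": placing blocks, not a
// clock, drives backward() and step(). Cadences follow the commit (3 and 6).
const BLOCKS_PER_BACKWARD = 3;
const BLOCKS_PER_STEP = 6;

function createPackingLoop({ onBackward, onStep }) {
  const state = { stepProgress: 0 };
  return {
    state,
    placeBlock() {
      state.stepProgress += 1;
      if (state.stepProgress % BLOCKS_PER_BACKWARD === 0) onBackward(); // consume activations
      if (state.stepProgress % BLOCKS_PER_STEP === 0) onStep(); // clear gradients
    },
    // Fraction of the way to the next step(), for the right-side bar that fills up.
    barFill() {
      return (state.stepProgress % BLOCKS_PER_STEP) / BLOCKS_PER_STEP;
    },
  };
}
```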
Two Iteration 4 edits failed to apply in commit 14878c395 (string-match
regressions after earlier writes). Applying them now:
- Roofline: catch resolution now fires only on the frame a kernel
crosses its targetX — strict, single-decision. Replaces the old
24px check window where the catcher could drift into position
over multiple frames. Honors the ghost reticle's 'land here'
promise.
- Pulse Prune aha card: now explicitly names what the game skips
(fine-tuning recovery step + structured sparsity required for
real GPU acceleration). Prevents the 'small weights are free
wins' misconception that education reviewer flagged.
Iteration 4 is now complete. Catalogue ships as-is.
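The single-decision catch can be sketched by comparing the kernel's previous and current x against targetX. This assumes kernels travel leftward (the old window ran from targetX+16 down to targetX-8) and uses a hypothetical 14px vertical tolerance; because the test spans the whole frame's motion, a large dt jump still registers as a crossing.

```javascript
// Hypothetical sketch: kernels move leftward (x decreasing) toward targetX.
// The catch is decided once, on the frame where the kernel crosses targetX.
function crossedThisFrame(prevX, x, targetX) {
  return prevX > targetX && x <= targetX;
}

function resolveCatch(kernel, catcherY, tolerancePx = 14) {
  if (kernel.resolved || !crossedThisFrame(kernel.prevX, kernel.x, kernel.targetX)) {
    return null; // nothing to decide on this frame
  }
  kernel.resolved = true; // single decision: no second chances on later frames
  return Math.abs(catcherY - kernel.targetY) <= tolerancePx ? "caught" : "missed";
}
```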
Independent review + ECE-hardware lens + education reviewer all
converged on a small set of precision-level fixes. This commit ships
them and declares the iteration loop converged.
1. Roofline catch window fix
Independent reviewer spotted that the catch check ran every
frame while a kernel was inside a 24px window (targetX+16 to
targetX-8), meaning the catcher could drift into position
over multiple frames and still register a catch. The ghost
reticle promised 'land here'; the code didn't enforce it.
New logic resolves the catch atomically when the kernel
crosses its targetX (single frame, single decision). Either
you're at the reticle when the kernel arrives, or you miss.
Honors the visual promise.
2. Prune threshold alignment
Penalty fired at magnitude > 0.45 but visual tier rendered
'solid blue' only above 0.55. Players clicking a medium-
visible weight got punished without warning — 'that wasn't
fair' response in reviewer notes.
Changed isBright threshold to > 0.55 so the visual cue and
the penalty threshold match. What looks bright IS bright.
3. Prune aha card: explicit what-the-game-skips caveat
Education reviewer: 'the aha card should explicitly say real
pruning requires fine-tuning after — the game skips this, to
prevent the wrong mental model.' Added that plus a note
about 2:4 structured sparsity being required for actual GPU
acceleration (per Lisa's hardware review).
Now the card is honest about what the player learned vs what
they didn't.
4. Quantization Cliff: actually cliffs
Independent reviewer: the game is called 'Cliff' but drops
are purely linear — there's no moment where accuracy falls
off. Either rename or add a nonlinearity.
Added nonlinear penalty: int4 on an edge layer (first or
last) triggers a 1.8× drop multiplier. Now the game has an
actual cliff — and this matches real quantization behavior
better (embedding and output heads collapse hard at very low
precision, which is why LLM.int8 and AWQ keep them higher).
Convergence declaration:
Round 1 reviewers found 6 critical architectural issues.
Round 2 reviewers found 4 design-level issues.
Round 3 reviewers found 4 precision-level issues.
Each iteration caught smaller, more local issues than the last
— the classic pattern of diminishing returns. Stopping here is
the right call.
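Item 4's nonlinearity can be sketched as a deterministic sum of per-layer drops. The base drops per precision and the per-layer `sensitivity` array are invented placeholders; only the 1.8× multiplier for int4 on the first or last layer comes from the commit.

```javascript
// Hypothetical sketch of a deterministic accuracy drop with the edge-layer cliff.
// BASE_DROP values are placeholders, not the game's numbers.
const BASE_DROP = { fp32: 0, fp16: 0.2, int8: 1.0, int4: 4.0 }; // accuracy points
const EDGE_INT4_MULTIPLIER = 1.8; // from the commit

function accuracyDrop(precisions, sensitivity) {
  const last = precisions.length - 1;
  return precisions.reduce((total, p, i) => {
    // sensitivity[i] would be seeded once per day, then held fixed.
    let drop = BASE_DROP[p] * sensitivity[i];
    if (p === "int4" && (i === 0 || i === last)) drop *= EDGE_INT4_MULTIPLIER;
    return total + drop;
  }, 0);
}
```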
Iteration 2 surfaced four fresh critiques (viral, juice, accessibility,
academic). This commit ships the highest-leverage fixes from those four,
keeping mobile + accessibility for a future pass.
1. Shared juice module (common.js)
Adds MLSP.pop, MLSP.flash, MLSP.tickJuice, MLSP.drawJuice, plus
easeOutCubic and easeOutBack curves. Pop = expanding ring on
score events; flash = full-screen tinted wash on catastrophes.
Wired into all four games via 4 lines apiece.
Also adds MLSP.dayNumber() — days since 2026-04-22, used for the
'Day N' framing in share text (Wordle pattern).
2. Emoji-grid share artifacts in all four games
Critical viral fix from round 2 — text-only shares don't escape
ML-Twitter, but emoji grids spread.
- Pulse Prune: 4×6 grid of input → hidden weights, with pruned
(⬛) / kept-bright (🟦) / kept-dim (🟩) / critical-mistake (🟥).
- Roofline Runner: 10-cell histogram of last 10 catch outcomes
(🟦 caught-good, 🟥 above-ceiling, ⬛ missed).
- OOM: 8×4 sampled HBM map at game-over (🟦 act, 🟥 grad, 🟧 opt,
🟩 KV, ⬛ empty) — captures the visually distinctive game state.
- Quantization Cliff: 6-emoji precision ladder (🟦 fp32 / 🟩 fp16
/ 🟧 int8 / 🟥 int4) — the decision space made shareable.
3. Quantization Cliff: deploy stagger reveal
Old behavior: deploy() revealed all 6 layer accuracy drops
simultaneously. Round-2 juice review called this 'the flattest
game by an order of magnitude.'
New: reveal staggers per-layer at 150ms intervals, each with a
pop ring colored by the drop magnitude. Deploy button shows
'deploying…' and is disabled during the reveal sequence. Final
accuracy shown only after the last layer reveals, with a
full-screen flash (green if shipped, red if off-spec).
4. OOM: massively louder STEP() event
Round-2 juice review: 'fireStepEvent fires burst(... 3) per cell
— three particles per cell is criminally undersized for the
marquee event.'
New step event triggers a 360ms full-screen green flash, 8
particles per freed cell (was 3), AND a pop ring per freed cell.
Backward event gets its own blue flash + pops. Game-over
overflow gets red flash. The signature beat is now signature.
5. Aha card factual fixes (per academic review)
- Pulse Prune: Han et al. (2015) cited as primary for magnitude
pruning; LTH framed correctly as a separate further claim.
- Roofline Runner: Williams, Waterman, Patterson 2009 cited;
attention correctly described as spanning compute-/memory-bound
depending on sequence length and prefill-vs-decode phase.
- OOM: training vs inference regimes correctly distinguished —
optimizer state lives during training, KV cache during
inference; they don't typically coexist.
- Quantization Cliff: HAQ (Wang 2019) and HAWQ (Dong 2019) cited
as the actual per-layer bit-allocation methods. AWQ and GPTQ
reframed correctly as uniform-precision techniques (they
minimize accuracy cost AT a chosen precision, not allocate
bits across precisions).
6. Page prose updates
- games/quantization.qmd: prose updated to match new aha card,
distinguishing bit-allocation vs uniform-precision techniques.
- games/prune.qmd: Han et al. (2015) added as the primary
citation alongside the LTH reference.
Skipped this round (next iteration if needed):
- Mobile redesign (Pulse Prune touch radius, OOM cell size on
small screens) — accessibility review identified these but they
require layout work, not surgical edits.
- Reduced-motion media-query check inside canvas games.
- Today's-puzzle banner on the gallery page.
Files: 5 game .js files, 2 page .qmd files.
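The emoji-grid shares in item 2 boil down to downsampling a cell map and mapping each cell kind to a square emoji. A minimal sketch using the OOM palette listed there; `emojiGrid`, the cell-kind strings, and center-point sampling are assumptions, and an 8×4 map is read as 8 columns by 4 rows.

```javascript
// Hypothetical sketch of a Wordle-style share grid: downsample a larger cell
// map to rows x cols by center-point sampling and map each kind to an emoji.
const PALETTE = { act: "🟦", grad: "🟥", opt: "🟧", kv: "🟩", empty: "⬛" };

function emojiGrid(cells, rows, cols, palette = PALETTE) {
  const srcRows = cells.length;
  const srcCols = cells[0].length;
  const lines = [];
  for (let r = 0; r < rows; r++) {
    let line = "";
    for (let c = 0; c < cols; c++) {
      // Sample the source cell under the center of this output cell.
      const sr = Math.floor(((r + 0.5) * srcRows) / rows);
      const sc = Math.floor(((c + 0.5) * srcCols) / cols);
      line += palette[cells[sr][sc]] ?? palette.empty;
    }
    lines.push(line);
  }
  return lines.join("\n");
}

// e.g. emojiGrid(hbmCells, 4, 8) for an 8-wide, 4-tall share map
```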
Five-reviewer playtest converged on three fixes that everyone agreed
needed to land before this catalogue ships. This commit applies all
three plus the daily-seed parity fix the author flagged.
1. OOM mechanic rewrite: lifetime-driven freeing, not Tetris row-clear
Three reviewers (author, production-engineer, hardware) independently
said the row-clear mechanic teaches a falsehood — real GPU
allocators do not compact, they return blocks to a free-list.
Filling a contiguous span does not free memory.
New mechanic ties freeing to ML semantics:
- STEP! event fires every 7s: every red gradient block clears
simultaneously (mimics .step() releasing gradients), bonus
score per gradient freed.
- BACKWARD event fires every 4.5s: oldest 1-2 activation
blocks dissolve (mimics activations consumed during backward).
- Optimizer state and KV cache persist across the run.
Visual phase indicator at top of canvas shows current phase
(FORWARD / BACKWARD / STEP). Right-side countdown bar shows
time-to-next-step. The satisfying clearing moment is preserved
but now teaches the real lifetime-driven memory pattern.
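A minimal sketch of the lifetime-driven freeing described above. It assumes blocks are plain objects tagged with a `type` and an allocation time. The 7s and 4.5s intervals come from this commit. `createHbm`, `tickHbm`, the bonus value, and the 1-or-2 coin flip are hypothetical stand-ins, not the game's actual code.

```javascript
// Event intervals from the commit message; everything else is illustrative.
const STEP_INTERVAL_S = 7.0;     // STEP!: every gradient block clears at once
const BACKWARD_INTERVAL_S = 4.5; // BACKWARD: oldest 1-2 activations dissolve
const GRADIENT_BONUS = 10;       // hypothetical score per gradient freed

function createHbm() {
  return { blocks: [], stepIn: STEP_INTERVAL_S, backwardIn: BACKWARD_INTERVAL_S, score: 0 };
}

function allocate(hbm, type, size, bornAt) {
  // type: "activation" | "gradient" | "optimizer" | "kv"
  hbm.blocks.push({ type, size, bornAt });
}

function runStep(hbm) {
  // Mimics .step() releasing every gradient simultaneously.
  const freed = hbm.blocks.filter((b) => b.type === "gradient").length;
  hbm.blocks = hbm.blocks.filter((b) => b.type !== "gradient");
  hbm.score += freed * GRADIENT_BONUS;
  return freed;
}

function runBackward(hbm, count) {
  // Mimics backward consuming the oldest saved activations first.
  const oldest = hbm.blocks
    .filter((b) => b.type === "activation")
    .sort((a, b) => a.bornAt - b.bornAt)
    .slice(0, count);
  const doomed = new Set(oldest);
  hbm.blocks = hbm.blocks.filter((b) => !doomed.has(b));
  return oldest.length;
}

function tickHbm(hbm, dt, rng) {
  // Optimizer state and KV cache are never freed here: they persist all run.
  hbm.backwardIn -= dt;
  hbm.stepIn -= dt;
  if (hbm.backwardIn <= 0) {
    runBackward(hbm, rng() < 0.5 ? 1 : 2);
    hbm.backwardIn += BACKWARD_INTERVAL_S;
  }
  if (hbm.stepIn <= 0) {
    runStep(hbm);
    hbm.stepIn += STEP_INTERVAL_S;
  }
}
```

Nothing in this model compacts or row-clears. Memory comes back only when a block's lifetime ends, which is the point of the rewrite.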
2. Roofline op intensity bands
Hardware reviewer flagged: kernel y-positions were random and
independent of op type, so GEMM could spawn at low intensity.
This taught the wrong intuition (kernel position is arbitrary)
when it should teach the opposite (op type determines intensity).
Each op now has a fixed intensity band: GEMM 0.65-0.85
(compute-bound), conv 0.50-0.65, attn 0.35-0.50, gelu 0.18-0.28,
layernorm 0.12-0.20, softmax 0.10-0.18, elem 0.05-0.12. Y is
still randomised within above/below ceiling.
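A sketch of band-based spawning using the exact bands listed above. `spawnIntensity` is a hypothetical name. The bands are treated as unitless fractions, as given in the commit, and `rng` is any [0, 1) generator, such as the daily mulberry32 stream.

```javascript
// Intensity bands from the commit message (unitless, as given there).
const INTENSITY_BANDS = {
  gemm: [0.65, 0.85], // compute-bound
  conv: [0.5, 0.65],
  attn: [0.35, 0.5],
  gelu: [0.18, 0.28],
  layernorm: [0.12, 0.2],
  softmax: [0.1, 0.18],
  elem: [0.05, 0.12], // most memory-bound
};

function spawnIntensity(op, rng) {
  const band = INTENSITY_BANDS[op];
  if (!band) throw new Error(`unknown op: ${op}`);
  const [lo, hi] = band;
  // Uniform inside the op's band: op type sets the rough position,
  // chance only picks the exact spot within it.
  return lo + rng() * (hi - lo);
}
```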
Plus David's predictive-landing-reticle suggestion: a dotted
ghost circle at each kernel's landing point shows where to be
before the kernel arrives. Turns reaction into anticipation.
Plus a defensive fix: the hit-detection window now spans
+-16 to +-8 px around the target rather than 'crossed targetX
this frame', so dt spikes can't teleport kernels past their
hit window.
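The wider window exists so that a long frame cannot carry a kernel past its target. One way to make that airtight, sketched here as an assumption rather than the shipped check, is a swept test. Instead of sampling only the current x, it checks the whole distance the kernel travelled this frame against the window around `targetX`. `sweptHit` and `halfWidth` are hypothetical names.

```javascript
// True if the kernel's path this frame overlaps [targetX - halfWidth, targetX + halfWidth].
// Kernels fly right-to-left (prevX >= x), but the test is direction-agnostic.
function sweptHit(prevX, x, targetX, halfWidth) {
  const left = Math.min(prevX, x);
  const right = Math.max(prevX, x);
  return right >= targetX - halfWidth && left <= targetX + halfWidth;
}
```

With a 200 px dt spike (`prevX = 300`, `x = 100`), a point test at x = 100 misses a target at 200, but the swept test still registers the hit.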
3. Quantization Cliff: deterministic accuracy
Gamer reviewer flagged: '(3 + rand() * 2)' in the accuracy
calculation made the same configuration produce different
results across deploys. Players literally could not learn
the system. That is noise hiding the lesson.
Replaced the random multiplier with a constant. Sensitivity
is still hidden and seeded once per day, but for a given
configuration the result is deterministic. Game becomes a
real puzzle.
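A minimal sketch of a deterministic accuracy model. The per-precision loss table and the constant 4 (the midpoint of the old 3..5 range) are assumptions, since the real values are not in the log. The `sensitivity` array stands for the hidden per-layer values that would be drawn once per day from the seeded PRNG.

```javascript
// Hypothetical accuracy cost per precision; not the game's real table.
const PRECISION_LOSS = { 32: 0, 16: 0.2, 8: 1, 4: 4 };
// Fixed constant replacing the old (3 + rand() * 2) multiplier (assumed value).
const PENALTY_SCALE = 4;

// layerBits: e.g. [32, 8, 4, 4, 8, 32]; sensitivity: per-layer, seeded daily.
function deployAccuracy(layerBits, sensitivity, base = 100) {
  let acc = base;
  for (let i = 0; i < layerBits.length; i++) {
    acc -= PENALTY_SCALE * sensitivity[i] * PRECISION_LOSS[layerBits[i]];
  }
  return Math.max(0, acc);
}
```

Because no random term appears inside `deployAccuracy`, a player who redeploys the same configuration on the same day always sees the same number.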
4. Daily seed parity
Author flagged: only Pulse Prune was using a seeded PRNG.
'Daily seed' is a brand commitment; 3 of 4 games defaulting
to Math.random was inconsistent. All four games now seed a
mulberry32 PRNG from today's date.
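mulberry32 is a small, widely copied 32-bit PRNG. The generator below is its standard formulation. The commit does not specify how the date string becomes a seed. Hashing the UTC `YYYY-MM-DD` string with 32-bit FNV-1a is an assumption here, as are the helper names `hashString` and `dailyRng`. With `toISOString()` the puzzle rolls over at UTC midnight, not local midnight.

```javascript
// Standard mulberry32: returns a function yielding floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// 32-bit FNV-1a string hash (assumed date-to-seed step).
function hashString(s) {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Everyone who calls this on the same UTC day gets the same stream.
function dailyRng(date = new Date()) {
  const iso = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return mulberry32(hashString(iso));
}
```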
5. Aha card factual edits (per production-engineer review)
- Prune: '5-20x' replaced with 'substantially shrink parameter
counts (combined with quantization and distillation)' since
the larger multipliers conflate techniques.
- OOM: full rewrite to describe lifetime-driven memory, with
fragmentation correctly framed as a failure mode, not a
technique.
- Roofline: added that real engineering raises a kernel's
intensity (fusion, tiling) rather than catching what falls.
- Quantization: SmoothQuant replaced with AWQ/GPTQ — those
are the actual bit-allocation search techniques.
6. Gallery taglines: verb-first, beginner-friendly
Beginner reviewer: 'lead with the verb and the human stakes,
not the concept name.' Updated all four card taglines on
/games/ from concept-name-first to action-first prose.
Files: 4 game .js, 1 gallery .qmd. Static server in _build/ already
shows the new code; just refresh.
User direction reset: 'I don't think this game is accurately correct,
in fact. It's very confusing. I think we're making it way too
complicated. It should be something fun. Pruning is good; roof line
is good. Let's build up those other games as well.'
This commit pivots the entire playground from simulation-grade to
arcade-grade, and ships four games at once:
1. PULSE PRUNE — simplified. Removes all the fine-tune dynamics,
ceiling/staleness, patience, rewiring, daily-seed complexity.
Core loop is now a 45-second timer: click dim weights, keep
accuracy above 50%, hit 60% sparsity to win. Same ML-themed
visuals (network, inference pulse, hover tooltips) but no
simulation. 300 lines, was 650.
2. ROOFLINE RUNNER — new. Log-log chart drawn on canvas with the
classic bent roofline ceiling. Kernels (GEMM, attn, softmax,
etc.) fly in from the right at various heights. Player moves
a crosshair with mouse or arrow keys. Catch kernels BELOW the
ceiling for +1; kernels ABOVE the ceiling are -1 (unrealisable
throughput). Three lives. 30-second rounds.
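The bent ceiling here is the standard roofline model: attainable throughput is the lesser of peak compute and memory bandwidth times arithmetic intensity. The sketch below shows the scoring rule on top of that. The hardware numbers in the test and the function names are hypothetical. On the log-log canvas both axes would be plotted through `Math.log10`, which turns the two limits into straight lines that meet at the ridge point.

```javascript
// Attainable throughput = min(peak compute, bandwidth * intensity).
function rooflineCeiling(intensity, hw) {
  return Math.min(hw.peakFlops, hw.memBandwidth * intensity);
}

// The bend ("ridge point") is the intensity where the two limits meet.
function ridgePoint(hw) {
  return hw.peakFlops / hw.memBandwidth;
}

// Catching below the ceiling is realisable (+1); above it is impossible (-1).
function scoreCatch(kernel, hw) {
  return kernel.throughput <= rooflineCeiling(kernel.intensity, hw) ? 1 : -1;
}
```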
3. OOM — new. Tensor Tetris. Activations (blue, wide), gradients
(red, narrow), optimizer states (orange square), KV cache
(green, tall) fall into a bounded HBM region. Move with arrow
keys; space for hard drop. Complete rows free memory
(allocator reclaim metaphor). Score = blocks placed before
overflow.
4. QUANTIZATION CLIFF — new. Six layers stacked vertically, each
with a precision dial (fp32 / fp16 / int8 / int4). Bit budget
is 96 (half of 6 × 32). Click a layer to cycle its precision.
Press deploy to reveal accuracy — but you get only THREE
deploys per run. Sensitivity is hidden and uneven: first and
last layers (embeddings, output) hate low precision; middle
layers tolerate int4 fine. Hit 85% accuracy within budget to
ship.
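The budget arithmetic above, as a small sketch. The constants follow the commit (six layers, fp32/fp16/int8/int4, budget 96, three deploys). The function names and the run object are hypothetical.

```javascript
const PRECISIONS = [32, 16, 8, 4];        // fp32 -> fp16 -> int8 -> int4
const NUM_LAYERS = 6;
const BIT_BUDGET = (NUM_LAYERS * 32) / 2; // 96, half of 6 x 32
const DEPLOYS_PER_RUN = 3;

// Clicking a layer advances its precision, wrapping int4 back to fp32.
function cyclePrecision(bits) {
  const i = PRECISIONS.indexOf(bits);
  return PRECISIONS[(i + 1) % PRECISIONS.length];
}

function withinBudget(layers) {
  return layers.reduce((sum, bits) => sum + bits, 0) <= BIT_BUDGET;
}

function createRun() {
  return { deploysLeft: DEPLOYS_PER_RUN };
}

// Returns false once the three deploys are spent.
function tryDeploy(run) {
  if (run.deploysLeft === 0) return false;
  run.deploysLeft -= 1;
  return true;
}
```

A configuration such as `[32, 8, 8, 8, 8, 32]` fits the budget exactly (64 + 32 = 96) and keeps the sensitive first and last layers at full precision.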
Infrastructure:
- site/assets/games/{prune,roofline,oom,quantization}.js: four
self-contained game modules. Each ~200-300 lines.
- site/assets/games/registry.js: all four games marked available.
404 randomizer now rotates across all four.
- site/games/{prune,roofline,oom,quantization}.qmd: standalone
pages per game.
- site/games/index.qmd: gallery updated to show all four live.
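A sketch of what "marked available" plus the 404 randomizer could look like. The entry fields (`id`, `title`, `href`, `available`) and `pickRandomAvailable` are assumptions. The ids and titles come from this commit, and only `/games/prune/` is a confirmed URL. The other paths follow the same pattern by assumption.

```javascript
// Hypothetical registry shape; field names are assumptions.
const GAMES = [
  { id: "prune", title: "Pulse Prune", href: "/games/prune/", available: true },
  { id: "roofline", title: "Roofline Runner", href: "/games/roofline/", available: true },
  { id: "oom", title: "OOM", href: "/games/oom/", available: true },
  { id: "quantization", title: "Quantization Cliff", href: "/games/quantization/", available: true },
];

// The 404 page embeds one live game at random; unavailable entries are skipped.
function pickRandomAvailable(games, rng = Math.random) {
  const live = games.filter((g) => g.available);
  if (live.length === 0) return null;
  return live[Math.floor(rng() * live.length)];
}
```

Adding a game is then one new entry, and flipping `available` is enough to put it into the 404 rotation.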
Design principle swap:
- OLD: 'Feel the constraint' / 'subtraction under scarcity' —
pedagogical thesis demanding simulation-grade depth.
- NEW: fun first, ML aesthetic as flavor, teaching as a bonus in
the aha card. Each game is an arcade mechanic with an ML theme
painted on, not a simulator pretending to be a game.
User feedback: 'I don't understand what the point of the game is. Logic
seems to work, but it's not intuitive. R doesn't actually retry. The trick
is claimed in the aha card but never visualised.'
Four fixes shipped together:
1. VISIBLE TARGET (60% sparsity goal)
- Subtitle now reads 'goal: reach 60% sparsity without accuracy
collapsing' so the objective is legible from second one.
- HUD shows 'sparsity 42% / 60% goal' with a 🏆 once cleared.
- Crossing the threshold triggers a celebratory burst from the
output neurons + a green banner ('🏆 target reached — keep going
for bonus sparsity') that fades after ~3s.
- Game-over now distinguishes 'goal cleared' (green) vs 'accuracy
collapsed' (red) and explicitly says target reached/missed.
- Share text reflects whether the daily target was cleared.
2. REWIRING VISUALISATION
- On every prune, the sibling weights (those sharing an endpoint
with the cut weight) immediately brighten and have their drift
target raised. Over the next fine-tune tick they visibly thicken.
- The player now SEES the network redistribute capacity — the
'lottery-ticket / network rewires' claim is no longer just text
in the aha card, it's a live visual after every cut.
3. R RETRIES ANY TIME
- Previously: the keydown handler had a condition which
silently swallowed R presses during play.
- Now: R always triggers retry as long as the canvas is in the
viewport. Enter/Space still only retry on game-over (so they
don't hijack page scroll mid-game).
- HUD now also surfaces 'press R to retry' as a footer hint.
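The key rule in item 3 reduces to a small pure function. The name `keyAction` and its arguments are hypothetical. The browser wiring in the trailing comment is likewise a sketch (`canvasVisible`, `state`, and `restart` are placeholders).

```javascript
// Decide what a key press means for the game; null means "leave it to the page".
function keyAction(key, { inViewport, gameOver }) {
  if (!inViewport) return null;                   // never steal keys from the page
  if (key === "r" || key === "R") return "retry"; // R works mid-game too
  if (gameOver && (key === "Enter" || key === " ")) return "retry";
  return null;                                    // Enter/Space scroll as usual mid-game
}

// Wiring sketch (browser only; names are placeholders):
// window.addEventListener("keydown", (e) => {
//   if (keyAction(e.key, { inViewport: canvasVisible, gameOver: state.over }) === "retry") {
//     e.preventDefault();
//     restart();
//   }
// });
```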
4. 404 BANNER
- Prominent banner at top of /404.html: '404 — you arrived here
because the page is missing. The URL may have moved, or it never
existed. Enjoy a game while you figure out where you were going.'
- Inline recovery nav directly under the banner (home / Vol I /
Vol II / more games / report broken link) so confused readers
can recover in one click without scrolling past the game.
- Banner uses 404-specific styling (red border-left, white card on
neutral background) — does not appear on /games/prune/.
Files:
- site/404.qmd: banner + inline recovery nav
- site/assets/games/common.css: .mlsp-404-banner + .mlsp-404-nav styles
- site/assets/games/prune.js: target system, rewiring loop, R-any-time
Two threads landed in this commit: the bugs the user reported
(pulse-routing + pacing) and the design-level fixes from a
4-reviewer synthesis (deep-mechanic, meaningful-interaction,
viral browser-game, friction-as-feature lenses).
Bug fixes (the things you saw):
- Pulse now routes through the real graph. Previously the
inference pulse was drawn as a straight line from a random
input to a random output, ignoring topology — pruning had
real scoring effects but no visible inference consequence.
Now spawnPulse picks a concrete path via pickOutgoingUnpruned:
input -> unpruned weight -> hidden -> unpruned weight -> output.
If a leg has no unpruned outgoing weight, the pulse dies at
that node with a visible dropped marker and tiny screenshake.
Cut every outgoing weight from input-2 and you literally watch
inferences from input-2 die at the source.
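A sketch of graph-aware pulse routing. `pickOutgoingUnpruned` is named in this commit, but its signature here is a guess. `routePulse` is a hypothetical stand-in for the path-selection part of `spawnPulse`, and weights are assumed to be `{ from, to, pruned }` records.

```javascript
// Pick a random surviving outgoing weight from `node`, or null if none remain.
function pickOutgoingUnpruned(node, weights, rng) {
  const options = weights.filter((w) => w.from === node && !w.pruned);
  if (options.length === 0) return null;
  return options[Math.floor(rng() * options.length)];
}

// Walk input -> hidden -> output; stop where the pulse has nowhere to go.
function routePulse(inputNode, weights, rng, legs = 2) {
  const path = [inputNode];
  let node = inputNode;
  for (let leg = 0; leg < legs; leg++) {
    const w = pickOutgoingUnpruned(node, weights, rng);
    if (!w) return { path, droppedAt: node }; // draw the dropped marker here
    node = w.to;
    path.push(node);
  }
  return { path, droppedAt: null };
}
```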
- Slowed pacing. One pulse at a time, 850ms per leg, 500ms gap.
Reduced fine-tune jitter (0.015 -> 0.006) so weight thicknesses
stop flickering. Slowed activation glow (0.82 -> 0.92) and
neuron pulse decay (0.88 -> 0.94).
- Activations glow only on the chosen path edges, not all weights
in the current layer. Screen is dramatically less noisy.
Design-level fixes:
- HUD stripped. On-canvas HUD is now one accuracy bar (with
ceiling tick + red dashed floor), sparsity %, and the daily
seed line. Combo counter and classification tally are gone
from the canvas surface — they live in the score model and
appear only on game-over and in the share artifact. (Two
reviewers independently flagged 'too many numbers competing.')
- Combo -> Confidence redesign. Old combo rewarded fast clicking
on dim weights (tick-gaming). New patience system rewards how
long the weight stayed dim BEFORE you cut it: +5 patience for
weights small for 5+ ticks, +0 hasty for fresh dips,
-load-bearing flag with screenshake for high-importance cuts.
- 3-second ghost demo. Game opens by playing itself: small red
ghost cursor slides toward the smallest-magnitude weight,
the weight pulses, the cursor clicks it, then a 'your turn'
banner appears. First user pointerdown takes control.
- Failure-as-content. Game-over reveals the network skeleton
of your destruction: pruned weights drawn as red dashed lines
over the surviving graph, load-bearing mistakes drawn thicker
in MIT red. Translucent overlay lets the corpse ghost through.
- Emoji-grid share artifact (Wordle pattern). Share text now
embeds a 5x8 grid of the input-hidden weight matrix:
blue square = kept high-magnitude, green = kept low-magnitude,
black = pruned cleanly, red = pruned but load-bearing
(a mistake). Visually recognisable on social — the single
highest-leverage fix from the viral-browser-games reviewer.
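The emoji-grid share artifact maps each input-to-hidden weight to one of four squares. The sketch below assumes weights arrive as a 5x8 array of `{ magnitude, pruned, loadBearing }` objects and that "high magnitude" means at or above a chosen threshold. Both the data shape and `highThreshold` are assumptions. The colors follow the commit.

```javascript
const SQUARE = {
  keptHigh: "\u{1F7E6}",          // blue: kept, high magnitude
  keptLow: "\u{1F7E9}",           // green: kept, low magnitude
  prunedClean: "\u2B1B",          // black: pruned cleanly
  prunedLoadBearing: "\u{1F7E5}", // red: pruned but load-bearing (a mistake)
};

// weights: rows = inputs (5), columns = hidden units (8).
function shareGrid(weights, highThreshold) {
  return weights
    .map((row) =>
      row
        .map((w) => {
          if (w.pruned) return w.loadBearing ? SQUARE.prunedLoadBearing : SQUARE.prunedClean;
          return w.magnitude >= highThreshold ? SQUARE.keptHigh : SQUARE.keptLow;
        })
        .join("")
    )
    .join("\n");
}
```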
Plan document for the full design synthesis (the four reviewer
perspectives, the new 'subtraction under scarcity' thesis, the
revised game-4 lineup) lives at .claude/_reviews/mlsys-playground-plan-2026-04-22.md
locally. M2 (Roofline reimagined as architecture-not-reflex),
M3 (OOM with mercy moments), and M4 (Quantization Cliff,
commit-based) wait on user signoff.
v1 was a one-sided puzzle — the player cut weights, nothing pushed back.
Three parallel design reviews (lab-designer, Song Han as Prune's original
author, and a gamer gut-check) converged on the same fix: give the
network its own heartbeat so pruning becomes reactive instead of
contemplative.
v2 runs three simultaneous beats:
- Fine-tune tick (~2 Hz): weight magnitudes drift each tick toward
importance-weighted targets, surviving weights strengthen as
sparsity grows, the accuracy ceiling reflects current capacity with
a concave falloff, and staleness penalises sitting idle. This makes
the lottery-ticket intuition — the network rewires around your
cuts — physically felt rather than explained.
- Inference pulse (~1.1 Hz): a sample sprite flows left-to-right
through the network, activates edges along its path (which glow
green), pulses the source and target neurons, and ticks the
correct/attempted counter at the output. Misclassifications cause
a small accuracy nudge and screenshake.
- Player cuts: clicking a weight that has stayed small across
N consecutive ticks scores a combo multiplier; cutting a high-
importance weight triggers screenshake, a red particle burst, and
resets the combo.
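A hedged sketch of the fine-tune tick's drift. Each surviving weight eases toward an importance-weighted target that rises with sparsity, so survivors visibly strengthen as you cut. The easing `rate`, the `boost` factor, and treating jitter as uniform additive noise are all assumptions. The ceiling and staleness terms are left out.

```javascript
// Hypothetical fine-tune tick over { importance, magnitude, pruned } weights.
function fineTuneTick(weights, sparsity, rng, opts = {}) {
  const rate = opts.rate ?? 0.1;   // assumed easing rate per tick
  const boost = opts.boost ?? 0.5; // assumed strengthening per unit sparsity
  const jitter = opts.jitter ?? 0; // amplitude of random drift per tick
  for (const w of weights) {
    if (w.pruned) continue; // cut weights stay cut
    const target = w.importance * (1 + boost * sparsity);
    const noise = jitter * (2 * rng() - 1); // uniform in [-jitter, jitter)
    w.magnitude = Math.max(0, w.magnitude + (target - w.magnitude) * rate + noise);
  }
}
```

Lowering `jitter` is the knob that would stop thickness flicker without changing where weights settle.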
Added retention mechanics:
- Daily seed: everyone playing today gets the same network, generated
by mulberry32 seeded from the ISO date. Daily best persists until
tomorrow's puzzle replaces it.
- Share button on the aha card copies a score summary to clipboard.
- All-time best persists separately from daily best.
Added game feel:
- Screenshake on bad cuts and misclassifications.
- Particle bursts (green for clean cuts, red for critical cuts).
- Floating score texts rise from cut points.
- Edges glow when carrying an inference pulse.
- Neurons pulse with red rings when activated by a sample.
Pedagogy change: the aha card now teaches iterative pruning and gradual
magnitude pruning (Zhu & Gupta 2017) plus lottery-ticket intuition
(Frankle & Carbin 2018), because players now feel those phenomena
rather than the simpler 'most weights are small' observation of v1.
Files:
- site/assets/games/prune.js: full rewrite, 9.5K → 24K.
- site/games/prune.qmd: updated prose, added share button wiring,
canvas upsized to 680x460 to match the book's SVG canvas default.
- site/404.qmd: HUD now surfaces combo and classification tally;
share button wired identically to the standalone page; canvas
upsized to 680x460.
Introduces a new /games/ sub-section — MLSys Playground — designed around
the 'Feel the constraint' thesis: small browser games that turn real ML
systems concepts into 30-second playable loops.
This commit ships the foundation plus the first game, Prune:
- site/assets/games/common.{css,js}: shared palette and runtime used by
every game (best-score persistence via localStorage, canvas
coordinate helpers, aha-card renderer, Box-Muller gauss).
- site/assets/games/registry.js: single source of truth listing all
games. Adding game N+1 is one entry plus flipping an 'available' flag.
- site/assets/games/prune.js: the Prune game — a 5-8-3 neural network
on a canvas where you click low-magnitude weights to remove them
while keeping accuracy above 60 percent. Teaches magnitude-based
pruning and lottery-ticket intuition.
- site/games/prune.qmd: standalone page playable at /games/prune/.
- site/games/index.qmd: gallery landing with Prune live and placeholder
cards for Roofline Runner and OOM (games 2 and 3).
- site/404.qmd: rewritten to dynamically pick a random available game
from the registry and embed it. Today that picks Prune; when
Roofline and OOM land, the 404 rotates across all three.
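The `gauss` helper in common.js is described only as Box-Muller. A typical version is below. Taking an injectable `rng` (so daily-seeded games can share it) is an assumption about the signature.

```javascript
// Box-Muller transform: two uniforms in (0, 1) -> one normal sample.
function gauss(rng = Math.random, mean = 0, std = 1) {
  let u = 0;
  while (u === 0) u = rng(); // avoid log(0)
  const v = rng();
  const z = Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
  return mean + std * z;
}
```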
Design comes from parallel review by lab-designer, author's vision
(Vijay Reddi), Song Han's efficiency-game lens, and Soumith Chintala's
framework-engineer perspective — all four converged on Prune as a
strong first game for fast time-to-aha and genuine mechanical novelty
(grow-by-subtracting has no close analog in arcade gaming).
Quarto's resource-copy step preserves symlinks rather than dereferencing
them, which breaks both local builds (AlreadyExists on the second pass)
and gh-pages deploys (relative symlink targets fall outside _build/).
And Sass resolves @import relative to the importing file's physical
location, not the symlink target. So symlinks inside the resource path
are not a viable dedup mechanism.
Instead, keep real file copies in each consumer subsite and enforce
dedup at edit time with shared/scripts/sync-mirrors.sh:
- bash shared/scripts/sync-mirrors.sh # propagate canonicals
- bash shared/scripts/sync-mirrors.sh --check # CI: fail on drift
Mirror map (source | mirrors):
shared/scripts/subscribe-modal.js -> {site, book/quarto, labs, kits,
mlsysim/docs}/.../subscribe-modal.js
Intentional non-mirrors (left untouched, customized variants):
tinytorch/site-quarto/assets/scripts/subscribe-modal.js (TinyTorch-branded)
tinytorch/site/_static/subscribe-modal.js (legacy Sphinx)
Also dedupe the SocratiQ widget bundle via a symlink (safe here because
book/tools/ sits outside any Quarto project, so the resource walker
never touches it):
book/tools/scripts/socratiQ/bundle.js -> ../../../quarto/tools/scripts/socratiQ/bundle.js
The shared canonical (book/quarto/tools/scripts/socratiQ/bundle.js) is
the version actually referenced and served in production.
- Replace favicon with SEAS shield as navbar logo
- Change collapse-below from xl to lg so hamburger kicks in earlier
- Hide right-side nav text labels between lg-xl breakpoints (icon only)
- Constrain logo to 28px height
Architecture:
- Merge landing, about, community, newsletter into one site/ project
- Move navbar-common.yml to shared/config/ (used by 12 configs)
- Create shared/config/footer-site.yml for centralized footer
- Create shared/scripts/subscribe-modal.js as canonical copy
- Single _quarto.yml replaces 4 independent configs
- One site_libs/ copy replaces four
Features gained:
- Google Analytics on ALL hub pages (was only on book volumes)
- Subscribe modal on landing page (was missing)
- Centralized footer with consistent links
Workflows updated:
- site-preview-dev.yml: matrix strategy → single build job
- site-publish-live.yml: loop over subsites → single build + deploy
- sync-newsletter.yml: builds from unified site project
- publish-all-live.yml: removed stale subsite input
- rewrite-dev-urls.sh: added --shallow flag for unified builds
All 12 navbar-common.yml references updated:
book vol1/vol2, site (unified), slides, instructors, interviews,
kits, labs, mlsysim