The flag is the StaffML frontend's local-dev fallback (read corpus.json
from disk via NEXT_PUBLIC_VAULT_FALLBACK=static), not a deprecated path.
"Legacy" implied "soon to be removed"; "local-json" describes its actual
role and reads correctly in scripts and docs.
- vault-cli: rename CLI flag, parameter, result key, and help text.
- CI workflows + pre-commit config: invoke the new flag name.
- All scripts that print the command (suggest_exemplars,
  pre_commit_corpus_guard, promote_validated, rename_legacy_ids,
  export_to_staffml, and the paper's analyze_corpus/generate_* scripts) updated.
- Comments and docs (ARCHITECTURE, CHANGELOG, REVIEWS, TESTING,
MASSIVE_BUILD_RUNBOOK, DEPRECATED, AUTHORING, plus frontend
comments and .env.example / .gitignore) updated.
The "legacy_json" sentinel string in corpus_stats.json._meta.source
is intentionally NOT renamed — it is a stable artifact format read
by downstream paper-generation tooling.
Closes the cleanup arc (A.1–A.10 in RESUME_PLAN_RELEASE.md). Every
gate is now green: vault check --strict, vault lint, vault doctor,
vault codegen --check, staffml validate-vault, Playwright (9/9), tsc.
A.1 mobile-1962.svg: renamed `Edge` → `RegEdge` in the graphviz
source (`edge` is a reserved DOT keyword, matched case-insensitively);
the SVG now renders cleanly. Also fixed
tinyml-1570.py (missing `import numpy as np`) which the new failure
log surfaced.
A.2 render_visuals.py: structured per-ID failure log written to
`_validation_results/render_failures.json` on every run; non-zero
exit on any per-item crash; new `--fail-fast` and `--failure-log`
CLI options. Replaces the prior silent-failure mode.
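The failure-log contract can be sketched in a few lines (function and argument names here are illustrative, not render_visuals.py's real API):

```python
import json
from pathlib import Path

def render_all(items, render_one,
               log_path=Path("_validation_results/render_failures.json"),
               fail_fast=False):
    """Render each item; record per-ID failures instead of swallowing them."""
    failures = {}
    for item_id, spec in items.items():
        try:
            render_one(item_id, spec)
        except Exception as exc:  # every crash becomes a structured log entry
            failures[item_id] = {"error": type(exc).__name__, "detail": str(exc)}
            if fail_fast:
                break
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_path.write_text(json.dumps(failures, indent=2))
    return 1 if failures else 0  # non-zero exit when any item crashed
```

`sys.exit(render_all(...))` yields the non-zero exit; `--fail-fast` maps to `fail_fast=True`.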
A.3 LinkML visual schema: typed as a structured sub-schema. New
`VisualKind` enum (svg only — `mermaid` was reserved but never
shipped, dropped to keep the enum honest). Path regex tightened
to `^[a-z0-9-]+\.svg$`. Alt minimum length 10; caption now required,
with minimum length 5. TypeScript Visual interface + Question.visual
field added to staffml-vault-types/index.ts.
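The three constraints reduce to a regex and two length checks; a standalone sketch (the real enforcement lives in the LinkML schema and its generated validators):

```python
import re

SVG_PATH_RE = re.compile(r"^[a-z0-9-]+\.svg$")  # the tightened path regex

def check_visual_fields(path: str, alt: str, caption: str) -> list:
    """Return the list of constraint violations; empty list means valid."""
    errors = []
    if not SVG_PATH_RE.match(path):
        errors.append(f"path {path!r} must match ^[a-z0-9-]+\\.svg$")
    if len(alt) < 10:
        errors.append("alt must be at least 10 characters")
    if len(caption) < 5:
        errors.append("caption must be at least 5 characters")
    return errors
```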
A.4 Pydantic Visual + Question validators:
- Visual.kind hard-rejects anything but `svg`
- Visual.path enforces the new regex
- Visual.alt min 10 chars, caption required min 5 chars
- Question.model_validator: visual.path MUST resolve to a real
file under interviews/vault/visuals/<track>/. Skipped in
production deploys where the working tree is absent.
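A condensed sketch of the validator shape (Pydantic v2 style; the root path constant and any field beyond those named above are illustrative):

```python
import re
from pathlib import Path
from typing import Literal, Optional
from pydantic import BaseModel, field_validator, model_validator

VISUALS_ROOT = Path("interviews/vault/visuals")  # repo-relative; absent in prod deploys

class Visual(BaseModel):
    kind: Literal["svg"]  # anything but "svg" fails at parse time
    path: str
    alt: str
    caption: str

    @field_validator("path")
    @classmethod
    def _path_is_slug_svg(cls, v):
        if not re.fullmatch(r"[a-z0-9-]+\.svg", v):
            raise ValueError("visual.path must match ^[a-z0-9-]+\\.svg$")
        return v

    @field_validator("alt", "caption")
    @classmethod
    def _min_lengths(cls, v, info):
        minimum = 10 if info.field_name == "alt" else 5
        if len(v) < minimum:
            raise ValueError(f"{info.field_name} needs >= {minimum} chars")
        return v

class Question(BaseModel):
    id: str
    track: str
    visual: Optional[Visual] = None

    @model_validator(mode="after")
    def _visual_file_exists(self):
        # Skipped when the working tree is absent (production deploys).
        if self.visual and VISUALS_ROOT.exists():
            target = VISUALS_ROOT / self.track / self.visual.path
            if not target.is_file():
                raise ValueError(f"missing visual asset: {target}")
        return self
```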
A.5 Registry repair + doctor split:
- tools: repair_registry.py appended 5,269 missing IDs
(the rename refactor at 8a5c3ff3c left the append-only registry
unsynced; this brings disk-coverage to 100%). Header block in
id-registry.yaml documents the rebuild rationale.
- doctor.py: split symmetric `registry-integrity` check into
`disk-coverage` (HARD FAIL if any disk YAML id is unregistered)
and `registry-history` (INFO ONLY for retired ids — the registry
is by design an audit log, retired ids are normal). Pre-existing
`_check_schema_version` bug (`versions == {1}` vs string `"1.0"`)
fixed.
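The split reduces to two set differences over the same inputs; a sketch (names illustrative, not doctor.py's actual interface):

```python
def registry_checks(disk_ids: set, registered_ids: set) -> dict:
    """disk-coverage: every on-disk YAML id must be registered (hard fail).
    registry-history: retired ids are normal for an append-only audit log (info)."""
    unregistered = sorted(disk_ids - registered_ids)
    retired = sorted(registered_ids - disk_ids)
    return {
        "disk-coverage": ("FAIL" if unregistered else "PASS", unregistered),
        "registry-history": ("INFO", retired),
    }
```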
A.6 Lint calibration via 4-expert consensus + bloom-canonical
reclassification:
- Spawned 4 experts (Vijay Reddi, Chip Huyen, Jeff Dean,
education-reviewer) on 42 disputed (zone, level) pairs;
consensus-builder aggregated to 15 valid / 19 invalid / 8
borderline.
- User arbitrated 8 borderlines: 7 widen / 1 reclassify.
- Built ZONE_BLOOM_AFFINITY matrix (Education-Reviewer's idea):
every zone admits its dominant Bloom verb + adjacent verbs,
rejects clear hierarchy violations.
- reclassify_zone_bloom_mismatch.py applied 576 deterministic
zone fixes via BLOOM_CANONICAL_ZONE mapping (e.g. fluency+analyze
→ analyze, recall+analyze → analyze, evaluation+apply → implement).
- Question.model_validator(_zone_bloom_compatible): hard-rejects
future zone-bloom mismatches at write time. Generated drafts
can no longer ship a self-contradicting classification.
- ZONE_LEVEL_AFFINITY widened per consensus + arbitration +
post-reclassification adjustments. Lint warnings: 1,308 → 0.
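The affinity check plus canonical fallback can be sketched as follows (table contents here are illustrative excerpts chosen to reproduce the three examples above, not the shipped matrices):

```python
# Illustrative excerpts -- the real tables cover every zone and Bloom verb.
ZONE_BLOOM_AFFINITY = {
    "recall":     {"remember", "understand"},
    "fluency":    {"remember", "apply"},
    "evaluation": {"evaluate", "analyze"},
    "analyze":    {"analyze", "evaluate"},
    "implement":  {"apply", "create"},
}
BLOOM_CANONICAL_ZONE = {"analyze": "analyze", "apply": "implement"}

def fix_zone(zone: str, bloom: str) -> str:
    """Keep a compatible zone; otherwise move to the bloom verb's canonical zone."""
    if bloom in ZONE_BLOOM_AFFINITY.get(zone, set()):
        return zone
    return BLOOM_CANONICAL_ZONE.get(bloom, zone)
```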
A.7 Chain integrity:
- repair_chains.py: drops chain refs when a chain has <2 published
members (chain ceases to exist), renumbers all members of any
chain whose positions are non-sequential / duplicated /
non-monotonic-by-level. Sort key: level ascending, then old
position, then qid (deterministic).
- validate-vault.py: relaxed sequential check to unique-positions
check. Position gaps from mid-chain deletions are normal; what
matters is uniqueness + bloom-monotonicity (vault check --strict
enforces both from YAML source-of-truth).
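The repair reduces to a filter plus one deterministic sort; a sketch with illustrative member dicts:

```python
def repair_chains(chains: dict) -> dict:
    """Drop chains with <2 published members; renumber the rest deterministically."""
    repaired = {}
    for cid, members in chains.items():
        published = [m for m in members if m.get("status") == "published"]
        if len(published) < 2:
            continue  # chain ceases to exist; members lose their chain refs
        # Sort key: level ascending, then old position, then qid.
        ordered = sorted(published,
                         key=lambda m: (m["level"], m["position"], m["qid"]))
        repaired[cid] = [{**m, "position": i + 1}
                        for i, m in enumerate(ordered)]
    return repaired
```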
A.8 Practice page visual + zoom modal:
- QuestionVisual.tsx: wraps the `<img>` in `<Zoom>` from
react-medium-image-zoom (4 KB). Click image → fullscreen
`<dialog data-rmiz-modal>`; ESC closes. Added test-id
`question-visual-img` for stable selector.
- New Playwright test: 9th in the suite, deep-links cloud-4492,
asserts the dialog opens on click and closes on ESC.
- TypeScript: removed `mermaid` from local Visual types in
corpus.ts and corpus-vault.ts; tsc clean.
A.9 All gates green:
- vault check --strict: 0 errors / 0 invariant failures
- vault lint: 0 errors / 0 warnings (was 1,308 warnings)
- vault codegen --check: artifacts in sync (hash baseline updated)
- vault doctor: 0 fails (registry-history is info-only; git-state
warns about working-tree changes left uncommitted before this commit)
- staffml validate-vault: 0 errors / 0 warnings, deployment-ready
- Playwright: 9/9 pass (was 8; +zoom modal test)
- render_visuals: 0 errors (was 2 silent failures pre-A.2)
- tsc: clean
Distribution after reclassification: 9,544 published items unchanged;
576 items changed zone via the bloom-canonical mapping (full per-item
report at /tmp/reclassify_changes.csv). Chain count 879 → 850
after orphan-singleton drops. release_hash updated.
Carry-forward to next session (Phase B):
- Priority gap closure for parallelism cells + global L4-L6+
(the run that produced this corpus did not close the targeted
cells; B.3 needs specialized prompts per cell-class)
- 120 NEEDS_FIX items from coverage_loop/20260425_150712/ still
carry judge fix_suggestions; spawn fix-agent in Phase C
Two new pieces close the generation→validation→saturation feedback loop:
1. gemini_cli_llm_judge.py — multi-criteria validator. For each draft,
judges math correctness, cell-fit (does it actually target the
declared track/zone/level?), scenario realism, uniqueness vs canonical
questions, and visual-asset alignment. Returns PASS/NEEDS_FIX/DROP
per item. Batched (default 15 per call) for budget efficiency.
2. iterate_coverage_loop.py — drives the full loop:
analyze → plan → generate → render → judge → apply → re-analyze.
Self-paced: stops when (a) top priority gap drops below threshold,
(b) DROP rate exceeds the saturation/hallucination threshold,
(c) total API calls exceed budget, or (d) the same cell is top
priority for two iterations in a row (convergence). The user no
longer specifies "how many questions" — the loop generates until
the corpus reaches a measurable steady state.
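A minimal sketch of the two pieces working together, with the four stop conditions labeled (every name, threshold, and default here is illustrative except the batch size of 15, which matches the text):

```python
def judge_batches(drafts, per_call=15):
    """Chunk drafts into judge-call batches (default 15 per call)."""
    return [drafts[i:i + per_call] for i in range(0, len(drafts), per_call)]

def run_loop(analyze, plan, generate, judge, apply_fn,
             gap_threshold=0.05, drop_rate_threshold=0.5,
             call_budget=200, max_iters=20):
    """Self-paced coverage loop: stops on (a) gap closed, (b) saturation,
    (c) budget exhausted, (d) convergence."""
    calls, prev_top = 0, None
    for _ in range(max_iters):
        report = analyze()
        if report["top_gap"] < gap_threshold:
            return "gap-closed"                        # (a)
        if report["top_cell"] == prev_top:
            return "converged"                         # (d) same cell on top twice
        prev_top = report["top_cell"]
        drafts = generate(plan(report)); calls += 1
        verdicts = []
        for batch in judge_batches(drafts):
            verdicts += judge(batch); calls += 1
        drops = sum(v == "DROP" for v in verdicts)
        if verdicts and drops / len(verdicts) > drop_rate_threshold:
            return "saturated"                         # (b) DROP rate too high
        apply_fn([d for d, v in zip(drafts, verdicts) if v == "PASS"])
        if calls >= call_budget:
            return "budget"                            # (c)
    return "max-iters"
```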
Plus 25 round-1 visual questions generated by the new batched generator
(5 batched calls × 5 cells each, zero failures).
The loop is the answer to "we need balance, not just volume": every
iteration's plan derives from a fresh analysis of where coverage is
weakest, so generation can never over-fill an already-saturated cell.
Schema fix: visual.kind is always 'svg' (the format the website ships) and
visual.path points to that asset. The build-pipeline format is recorded as
optional metadata in visual.source_format ('dot' | 'matplotlib' | 'hand'),
which the website ignores. This separates "what users render" from "how
maintainers built it".
Source files live next to the SVG by naming convention; the renderer infers
the path from the YAML's source_format hint without a dedicated source field.
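The inference is a suffix swap on the SVG path; a sketch (the extension mapping is an assumption about the naming convention, not a documented contract):

```python
from pathlib import Path

# Assumed extension per source_format; hand-drawn SVGs have no build source.
SOURCE_EXT = {"dot": ".dot", "matplotlib": ".py", "hand": None}

def infer_source_path(svg_path: str, source_format: str):
    """Source file sits next to the SVG: same stem, format-specific extension."""
    ext = SOURCE_EXT.get(source_format)
    if ext is None:
        return None
    return Path(svg_path).with_suffix(ext)
```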
Five new visual exemplars generated by Gemini 3.1 Pro Preview, covering
diverse archetypes:
- cloud-2849 (DOT): incast-bottleneck topology
- cloud-2850 (DOT): leaf-spine fabric with 2:1 oversubscription
- cloud-2851 (matplotlib): bandwidth bar chart for data pipeline diagnosis
- cloud-2852 (matplotlib): checkpoint/recovery timeline with RPO/RTO
- edge-0972 (matplotlib): Poisson vs bursty queueing curves
Plus the four prior exemplars (cloud-2846, 2847, 2848, tinyml-0816)
re-emitted under the new schema. cloud-visual-001 unchanged — already had
the correct shape.
ARCHITECTURE.md rewritten to document the simpler three-layer separation
(website / build / authoring).
gemini_cli_generate_questions.py mirrors gemini_cli_math_review.py's design:
review-first, JSON-strict, model pinned to gemini-3.1-pro-preview with a hard
guard against override. Targets weak coverage cells from the portfolio
balance loop or explicit --target track:topic:zone:level cells.
For visual-eligible topics (the 10 archetypes in audit_visual_questions.py),
the generator also produces the diagram source artifact (DOT or matplotlib
script) which render_visuals.py converts to a ship-ready SVG. This closes
the generation→render→validate loop using two different model passes:
Gemini drafts; the math review verifies.
First generated example: tinyml-0816 (wake-word duty-cycle evaluation) with
a matplotlib power-timeline visual. Math review returned CORRECT on the
first call. Status remains draft pending broader cross-validation.
ARCHITECTURE.md establishes that visuals are a property of any question, not
a separate category. Three supported formats let the layout engine do the
work: DOT for graph topology, matplotlib for curves and Gantt charts, hand
SVG for custom layouts.
render_visuals.py is the single entry point that dispatches by visual.kind,
runs the appropriate tool, and normalizes the rendered SVG to the book's
font stack. It is idempotent and supports --dry-run.
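The dispatch-and-normalize shape, sketched (the renderer registry and the font rewrite are illustrative; the real script shells out to dot/matplotlib):

```python
import re

def normalize_fonts(svg: str, family: str = "Helvetica Neue") -> str:
    """Force the book's font stack onto every font-family attribute (idempotent)."""
    return re.sub(r'font-family="[^"]*"', f'font-family="{family}"', svg)

def render_visual(visual: dict, renderers: dict) -> str:
    """Dispatch by visual.kind, then normalize the rendered SVG."""
    kind = visual["kind"]
    if kind not in renderers:
        raise ValueError(f"no renderer for kind {kind!r}")
    return normalize_fonts(renderers[kind](visual))
```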
Three exemplars cover the three formats:
- cloud-2846 (DOT): Tree AllReduce on 8 ranks — auto-laid-out topology
- cloud-2847 (matplotlib): Queueing hockey-stick curve with SLO line
- cloud-2848 (matplotlib): Pipeline-bubble Gantt for GPipe schedule
All three are status:draft pending math review and promotion in a later
batch. Existing cloud-visual-001 remains unchanged as the canonical
hand-SVG exemplar.
Seeds the visuals/ directory with a reference pattern so future
authors have a concrete template to clone.
Exemplar: Ring AllReduce on 4 ranks (cloud track, L3, apply/analyze).
- SVG follows .claude/rules/svg-style.md: 680×460 viewBox, Helvetica
Neue, compute-blue ranks, orthogonal ring arrows, 10-px grid.
- YAML wires the visual block (kind=svg, path=cloud-visual-001.svg,
alt + caption) and pairs it with a matching question: 'Using the
diagram, calculate the total time to complete the full AllReduce.'
- The realistic_solution walks through 2(N−1)/N × data / bw and
explains the common failure mode (forgetting the all-gather phase).
Napkin math shows the step-time decomposition.
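The decomposition in the solution can be checked numerically (the example numbers below are illustrative, not the exemplar's):

```python
def ring_allreduce_time(n_ranks: int, data_gb: float, bw_gbps: float) -> float:
    """Total ring AllReduce time: reduce-scatter + all-gather phases,
    each moving (N-1)/N of the data at link bandwidth."""
    per_phase = (n_ranks - 1) / n_ranks * data_gb / bw_gbps
    return 2 * per_phase  # dropping the all-gather (the common mistake) halves this

# e.g. 4 ranks, 1 GB of gradients, 25 GB/s links: 2 * (3/4) * 1/25 = 0.06 s
```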
AUTHORING.md: the when/how/why guide for future visual questions.
- When a visual earns its place — three criteria (ask requires the
diagram, encodes info text cannot, static suffices).
- High-value candidate topics — ring/tree AllReduce, roofline, KV
cache, pipeline bubbles, memory hierarchy, MCU memory maps,
systolic arrays, attention, MoE.
- Step-by-step authoring workflow pointing at the book's SVG style
guide for the visual system — readers already know the visual
vocabulary from the book, so consistency transfers.
- Accessibility requirements (non-negotiable): alt is enforced by
the Pydantic schema, colour never the sole semantic channel, text
in <text> elements not paths, WCAG AA contrast.
- Explicit anti-patterns: no inline SVG in YAML, no mermaid for
non-graph content, no decorative effects, no label duplication of
scenario prose.