The flag is the StaffML frontend's local-dev fallback (read corpus.json
from disk via NEXT_PUBLIC_VAULT_FALLBACK=static), not a deprecated path.
"Legacy" implied "soon to be removed"; "local-json" describes its actual
role and reads correctly in scripts and docs.
- vault-cli: rename CLI flag, parameter, result key, and help text.
- CI workflows + pre-commit config: invoke the new flag name.
- All scripts that print the command (suggest_exemplars,
pre_commit_corpus_guard, promote_validated, rename_legacy_ids,
export_to_staffml, and the paper's analyze_corpus/generate_* scripts)
updated.
- Comments and docs (ARCHITECTURE, CHANGELOG, REVIEWS, TESTING,
MASSIVE_BUILD_RUNBOOK, DEPRECATED, AUTHORING, plus frontend
comments and .env.example / .gitignore) updated.
The "legacy_json" sentinel string in corpus_stats.json._meta.source
is intentionally NOT renamed — it is a stable artifact format read
by downstream paper-generation tooling.
Combined revision pass against (a) a 10-item correctness audit of paper.tex
versus the consolidated 0.1.0 release at 55fec89898 and (b) three fresh-reader
persona reads (ML systems engineer, NeurIPS D&B reviewer, working
practitioner). The two passes converged on the same five high-leverage
issues; this commit addresses all of them plus the audit's must-fix list.
Structural rewrites (§1–§3):
- Abstract leads with the artifact (~9,757 questions, real hardware) and
acknowledges the construct-validity gap on page 1.
- "Ikigai" demoted from competency-model brand to a one-line mention of where
the four-circle Venn visual is borrowed from; body uses "four-skill model"
and "cognitive zones" throughout.
- §1 lead restructured: the TinyML duty-cycling example now directly
follows the "constraints drive architecture" thesis, so the reader sees a
concrete question before the framework apparatus. Punchline corrected: at
a 2% duty cycle the active term dominates sleep current, so the binding
constraint is the duty cycle.
- §3.2 states the 11→6 zones caveat up front so the §13 admission doesn't
read as a surprise reveal.
- §3.3 reframes ZONE_LEVEL_AFFINITY as a soft authoring prior; flags
ZONE_BLOOM_AFFINITY as the hard validate-at-write rule.
- "Five Laws" softened from operative axis to organising pillars (paper
doesn't actually tag questions with laws).
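The corrected §1 punchline can be sanity-checked with a few lines of
arithmetic. This is a minimal sketch, not the paper's worked example: the
active and sleep currents below are assumed round numbers chosen only for
illustration, and only the 2% duty cycle comes from this commit.

```python
# Illustrative values only; these are assumptions, not figures from paper.tex.
ACTIVE_MA = 10.0   # current while awake, in mA (assumed)
SLEEP_MA = 0.005   # current while asleep, in mA, i.e. 5 uA (assumed)
DUTY = 0.02        # fraction of time awake (the 2% case from the commit)


def average_current_ma(duty, active_ma, sleep_ma):
    """Time-weighted average current of a duty-cycled device."""
    return duty * active_ma + (1.0 - duty) * sleep_ma


active_term = DUTY * ACTIVE_MA         # contribution of the awake phase
sleep_term = (1.0 - DUTY) * SLEEP_MA   # contribution of the sleep phase
avg = average_current_ma(DUTY, ACTIVE_MA, SLEEP_MA)

print(f"active={active_term:.4f} mA  sleep={sleep_term:.4f} mA  avg={avg:.4f} mA")
```

Under these assumed currents the active term is roughly forty times the
sleep term, so lowering the duty cycle moves the average far more than
shaving sleep current would.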
Audit correctness fixes:
- Quantify→implement naming: kept "quantify" in prose, added a clarifying
footnote about the schema identifier and macros mapping.
- L867 zone counts now use \zoneDiagnosisCount / \zoneFluencyCount /
\zoneEvaluationCount / \zonesBelowFloor macros (already auto-emitted by
the consolidated 0.1.0 release).
- L892 "27%" arithmetic + 79-vs-87 topic discrepancy: documented the
pre-v1 origin of the matrix in a footnote; reported counts characterise
the original 316-cell matrix; v1-topic per-track applicability now an
explicit limitation in §13.
- Table 3 (areas) regenerated for 87 topics across 13 areas.
- Table 4 (full taxonomy) extended with the 8 v1 topics in their
appropriate area columns.
- \numedges semantics clarified: 57 means prerequisite edges only;
raw counts for broader/narrower (14) and related (54) given inline.
- 31→32 root topics (matches corpus_stats.json.taxonomy_graph.root_topics).
- L815 math-verification: replaced bi-model story with what actually
shipped — three-stage Gemini-3.1-pro-preview pipeline (generator → judge
→ math reviewer) with cross-stage agreement; bi-model framed as future.
- Bloom→industry-ladder mapping (Table 5) softened to "illustrative,
not normative" with explicit deferral to ongoing psychometric calibration.
- "Physics-grounded" exclusion-matrix language softened to
"deployment-feasibility" where appropriate; "physics-grounded" preserved
for napkin math (where it is genuinely physics).
- §6→§7 LinkML claim made honest: schema is canonical, derived artefacts
kept in sync via tools/check_schema_sync.py drift check.
- §7 schema/infrastructure: documents the SQLite vault.db build and
Cloudflare D1 worker production path (corpus.json relegated to fallback).
Insertion paragraphs (audit Rewrites A/B/C/D):
- Rewrite A: §QA Schema Validation bullets now document the four v0.1.0
model-validators (Visual.kind enum, path regex, alt/caption length,
zone-bloom compatibility, visual-path-resolves) plus the 15 unit tests.
- Rewrite B: §LLM-Assisted Generation gains validate-at-write contract
paragraph + PARALLELISM_RULES variant + cumulative yield numbers
(462 area fixes, 576 zone-bloom fixes, 1308→0 lint warnings).
- Rewrite C: §QA Structural and Semantic Invariants gains repair-script
paragraph naming all five scripts and the bounded fix discipline.
- Rewrite D: §LLM Failure Modes replaced with empirical 5-mode taxonomy
observed during release-readiness audits.
Reader-flagged fixes:
- MFU=40% diagnosis question lowered to a more plausible 8% with batch-
size context; arithmetic-intensity ceiling math made explicit.
- MCQ format constraint context added to §7 opening so the L768
"MCQ must have 4 options" rule isn't a surprise.
- §13 Limitations expanded from six to seven; new entry covers the
v1-topic applicability-matrix lag.
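The arithmetic-intensity ceiling mentioned for the MFU question follows
the standard roofline bound: attainable throughput is the smaller of peak
compute and intensity times memory bandwidth. The sketch below uses
hypothetical hardware numbers picked so the ceiling lands on 8%; they are
not the values used in the paper's question.

```python
def mfu_ceiling(intensity_flop_per_byte, peak_flops, mem_bw_bytes_per_s):
    """Upper bound on MFU implied by the roofline model."""
    attainable = min(peak_flops, intensity_flop_per_byte * mem_bw_bytes_per_s)
    return attainable / peak_flops


# Hypothetical accelerator and workload (assumptions for illustration only).
PEAK_FLOPS = 1.0e15   # 1000 TFLOP/s peak compute
MEM_BW = 2.0e12       # 2 TB/s memory bandwidth
INTENSITY = 40.0      # FLOP per byte at a small batch size

ridge = PEAK_FLOPS / MEM_BW   # intensity needed to become compute-bound
ceiling = mfu_ceiling(INTENSITY, PEAK_FLOPS, MEM_BW)

print(f"ridge point = {ridge:.0f} FLOP/byte, MFU ceiling = {ceiling:.1%}")
```

With an intensity well below the ridge point the kernel is bandwidth-bound,
which is why a small-batch scenario makes a single-digit MFU plausible.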
New figure:
- figures/fig-practice-ui.svg: a single-page mockup of the practice
interface (filter sidebar, active question card, chain progression
rail). Inserted as Figure 11 in §11 Practical Applications. Addresses
the convergent reader complaint that the paper never showed what a
study session looks like.
Build verification:
- 35 pages, three pdflatex passes, zero overfull hboxes from new content.
- All numbers consistent with macros.tex emitted by vault export-paper
at the consolidated 0.1.0 release (793c06f414f2bf83).
Pre-existing undefined-citation warnings (williams2009roofline,
nvidia2022h100, etc.) are not from this revision; they were already
present in the bibliography.
Small bbl-validation helper for the interviews paper bibliography. Reads
paper.bbl, extracts each bibitem's rough title, queries CrossRef, and
prints [OK] / [WARN] / [ERR] per citation key. Useful as a spot-check
after large bibliography edits to catch typos, wrong years, or silently
renamed works.
Placed alongside the other paper-tooling (analyze_corpus.py,
generate_figures.py, generate_macros.py). Path resolution uses
Path(__file__).parent so it works from any CWD.
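The flow described above (parse paper.bbl, take a rough title, ask
CrossRef, print a status) can be sketched roughly as follows. This is not
the helper's actual code: the regex, the "title is the first \newblock"
heuristic, the 0.9 similarity threshold, and the paper/scripts/ layout are
all assumptions. The CrossRef call uses the public REST endpoint with the
query.bibliographic parameter and is not exercised by the offline demo.

```python
import json
import re
import urllib.parse
import urllib.request
from difflib import SequenceMatcher
from pathlib import Path

# Matches \bibitem[optional label]{key} and captures the entry body after it.
BIBITEM_RE = re.compile(
    r"\\bibitem(?:\[[^\]]*\])?\{([^}]+)\}(.*?)"
    r"(?=\\bibitem|\\end\{thebibliography\}|\Z)",
    re.S,
)


def default_bbl_path():
    # Assumed layout: script lives in paper/scripts/, paper.bbl one level up.
    return Path(__file__).parent.parent / "paper.bbl"


def rough_title(entry_body):
    """Heuristic: take the text of the first \\newblock as the title."""
    blocks = [b.strip() for b in entry_body.split(r"\newblock")]
    title = blocks[1] if len(blocks) > 1 else blocks[0]
    title = re.sub(r"\\[a-zA-Z]+|[{}~]", " ", title)  # drop commands, braces
    return " ".join(title.split()).rstrip(".")


def crossref_top_title(title, timeout=10):
    """Return the title of CrossRef's best match, or None on any failure."""
    query = urllib.parse.urlencode({"query.bibliographic": title, "rows": 1})
    try:
        with urllib.request.urlopen(
            "https://api.crossref.org/works?" + query, timeout=timeout
        ) as resp:
            items = json.load(resp)["message"]["items"]
        return items[0]["title"][0] if items and items[0].get("title") else None
    except (OSError, ValueError, KeyError):
        return None


def classify(local_title, remote_title, threshold=0.9):
    """Map a title comparison to the [OK] / [WARN] / [ERR] labels."""
    if remote_title is None:
        return "[ERR]"
    ratio = SequenceMatcher(None, local_title.lower(), remote_title.lower()).ratio()
    return "[OK]" if ratio >= threshold else "[WARN]"


# Offline demo on a made-up entry (not taken from the real bibliography).
SAMPLE_BBL = r"""\begin{thebibliography}{1}
\bibitem[Author(2020)]{example2020key}
A.~Author.
\newblock An example title.
\newblock \emph{Some Venue}, 2020.
\end{thebibliography}"""

entries = [(key, rough_title(body)) for key, body in BIBITEM_RE.findall(SAMPLE_BBL)]
for key, title in entries:
    print(classify(title, "An Example Title"), key, title)
```

In a real run the second argument to classify would come from
crossref_top_title(title), and the input would be read from
default_bbl_path() rather than the inline sample.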
ARCHITECTURE.md header bumped to v2.2. Full changelog block added
(v2.1 → v2.2) keyed to Round-3 finding IDs. §7.1 + §10.2 edited to
align X-Vault-Release soft-signal semantics with §6.1.1 (Soumith F-1).
REVIEWS.md §Round-3 added: per-reviewer verdicts (Chip YELLOW, Dean
YELLOW→GREEN, Soumith GREEN-conditional, David YELLOW→GREEN),
convergence map of 11 integrated items, explicitly-deferred list
(Cache API, breaker half-open, rate-limit KV, cross-lang hash path,
worker vitest, LSH dedup — all documented as Phase-3-entry gates).
CONTRIBUTING.md quickstart corrected (David R3-H5): step 3 dropped
the Phase-1+ 'doctor'/'stats' references; step 4 shows 'vault build'
before 'vault api' so the shim has something to serve.
paper/scripts/generate_macros.py rewritten as a thin wrapper over
'vault export-paper' (B.1 — closes §20.5 #2 + #7). Uses
sys.executable -m vault_cli.main so PATH isn't required.
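A wrapper of this shape might look like the sketch below. The module path
vault_cli.main and the export-paper subcommand come from this commit;
forwarding extra arguments through to the subcommand is an assumption, and
the function names are hypothetical.

```python
import subprocess
import sys


def build_command(extra_args=()):
    """Invoke the CLI through the current interpreter, so no PATH lookup."""
    return [sys.executable, "-m", "vault_cli.main", "export-paper", *extra_args]


def run_export(extra_args=()):
    """Run the export and return its exit code (argument forwarding assumed)."""
    return subprocess.run(build_command(extra_args), check=False).returncode


# Show the command that would be executed, without running it.
print(" ".join(build_command()))
```

Going through sys.executable keeps the wrapper tied to whichever
interpreter (and virtualenv) launched it, which is the point of not
relying on a vault entry on PATH.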
paper/macros.tex (regenerated): 66-line emission with both
\staffml* and legacy \num* namespaces. paper.tex needs no edits
during transition. Paper and site now agree by construction, which is
the structural fix for the H-21 (9,199 vs 8,053) bug class.
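Emitting each value under both namespaces can be pictured as below. This
is a sketch only: \numedges and its value 57 appear in the paper, while
the \staffmlEdges spelling and the emit_macros helper are hypothetical
and may not match what vault export-paper actually writes.

```python
def emit_macros(stats):
    """Emit every stat twice: a \\staffml* name and a legacy \\num* name."""
    lines = []
    for key, value in stats.items():
        suffix = key[0].upper() + key[1:]  # hypothetical naming rule
        lines.append(f"\\newcommand{{\\staffml{suffix}}}{{{value}}}")
        lines.append(f"\\newcommand{{\\num{key}}}{{{value}}}")
    return lines


macro_lines = emit_macros({"edges": 57})
print("\n".join(macro_lines))
```

Because both names are generated from the same value in one pass, the two
namespaces cannot drift apart while paper.tex is migrated.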
paper/corpus_stats.json (regenerated): full superset of the v1
analyze_corpus.py output, driven by SQL over vault.db with
'by_zone', 'by_level', 'by_track', chain 'by_length' distribution,
'bloom_distribution' (zone→bloom derived mapping), applicability.
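The SQL-driven breakdowns can be illustrated with a small in-memory
example. The questions table and its zone/level/track columns are an
assumed schema for illustration; the real vault.db layout may differ, and
the rows below are invented.

```python
import json
import sqlite3

# Column names are interpolated into SQL, so restrict them to an allowlist.
ALLOWED_COLUMNS = {"zone", "level", "track"}


def distribution(conn, column):
    """Count questions per distinct value of one column."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column}")
    rows = conn.execute(
        f"SELECT {column}, COUNT(*) FROM questions "
        f"GROUP BY {column} ORDER BY {column}"
    ).fetchall()
    return dict(rows)


# Hypothetical schema and rows, standing in for vault.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions (id TEXT PRIMARY KEY, zone TEXT, level TEXT, track TEXT)"
)
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?, ?)",
    [
        ("q1", "diagnosis", "L1", "tinyml"),
        ("q2", "diagnosis", "L2", "cloud"),
        ("q3", "fluency", "L1", "cloud"),
    ],
)

stats = {f"by_{col}": distribution(conn, col) for col in ("zone", "level", "track")}
print(json.dumps(stats, indent=2, sort_keys=True))
```

Deriving every breakdown from the same database with GROUP BY queries is
what lets the JSON stay a superset of the older script's output without
re-walking the corpus files.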
Consistent layout for StaffML, mlsysim, and TinyTorch papers:
- figures/ for all visual assets (SVGs, PDFs, PNGs)
- scripts/ for utility scripts (analysis, validation, benchmarks)
- tables/ for standalone table .tex files (StaffML only)
- Makefile at root for building (created one for mlsysim)
Removed redundant build scripts (compile_paper.sh, build.sh) in
favor of Makefiles. Deleted sort_app_matrix.py (no longer needed).
Merged mlsysim images/ into figures/. Updated all references in
paper.tex, Makefiles, and CI workflows.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>