D-cleanups folded into one commit:
- CHAIN_ROADMAP.md status header reflects current state (Phase 1+2
complete, Phase 3 pilot landed, Phase 4 mostly shipped).
- Phase 4.1 / 4.6 / 4.7 / 4.9 entries marked complete with commit
refs.
- ARCHITECTURE.md gains a §3.6.1 documenting the two YAML-body
conventions introduced when LLM-authored questions started
landing in Phase 3:
- _authoring private metadata block on drafts (stripped at
promotion)
- gap-bridge:<from>-<to> tag added at promotion for traceability
Neither is schema-enforced (Pydantic accepts extra); both are
stable across the pipeline.
No code changes.
Pull in the dev work that landed since yaml-audit was last synced:
- --legacy-json renamed to --local-json (2b381bb949) — script/doc
updates needed below in this branch
- CI workflow refactor (validate-dev / validate-vault now reusable)
- all-contributors automation, gitignore tightening, codespell list
- PR #1622 navbar URL rewrite for dev preview
- PR #1619 clone-size refactor, #1618 milestone3 xor fix, #1617
perceptron seed, #1616 tito status M3
- Chapter 9 PDF layout refinement
- assorted staffml/practice fixes (pickRandom deps, GitHub star gate)
This merges the canonical dev state into yaml-audit so subsequent
work continues on top of the freshest base. Conflicts in
practice/page.tsx + corpus.ts + ARCHITECTURE.md resolved to keep both
sides' additive changes (Phase 2 tier work + dev's later refactors).
Phase 4.8 of CHAIN_ROADMAP.md.
ARCHITECTURE.md gains a new §3.6 capturing the three deltas that landed
during the chain workstream — additive to v1, not replacements:
- hierarchical question layout (`<track>/<area>/<id>.yaml`)
- sidecar chain architecture (chains.json authoritative; YAML chains:
field retired)
- chain tier model (primary/secondary, default-primary on read)
README.md updates:
- status line: v1.1, points at CHAIN_ROADMAP.md and ARCHITECTURE.md §3.6
- new "Chain build pipeline" section with the diagnose / build /
apply / merge invocations
- layout listing reflects scripts/ and the actual src/ contents
(was stuck on Phase 0 scaffolding shape)
No code changes. The v1 release-pipeline invariants absorb the v1.1
deltas without modification (chains.json is a Merkle leaf; tier flows
into that leaf transparently).
The flag is the StaffML frontend's local-dev fallback (read corpus.json
from disk via NEXT_PUBLIC_VAULT_FALLBACK=static), not a deprecated path.
"Legacy" implied "soon to be removed"; "local-json" describes its actual
role and reads correctly in scripts and docs.
- vault-cli: rename CLI flag, parameter, result key, and help text.
- CI workflows + pre-commit config: invoke the new flag name.
- All scripts that print the command (suggest_exemplars,
pre_commit_corpus_guard, promote_validated, rename_legacy_ids,
export_to_staffml, the paper analyze_corpus/generate_*) updated.
- Comments and docs (ARCHITECTURE, CHANGELOG, REVIEWS, TESTING,
MASSIVE_BUILD_RUNBOOK, DEPRECATED, AUTHORING, plus frontend
comments and .env.example / .gitignore) updated.
The "legacy_json" sentinel string in corpus_stats.json._meta.source
is intentionally NOT renamed — it is a stable artifact format read
by downstream paper-generation tooling.
Update ARCHITECTURE.md to reflect 87 curated topics and 131 edges. Refactor exemplar_coverage_audit.py to use vault.db instead of retired corpus.json. Update exemplar-gaps.yaml inventory.
Brings the last outlier workflow file into the repo-wide
<cluster>-<verb>-<scope>.yml naming convention. Every other cluster
(book, tinytorch, kits, labs, instructors, mlsysim, slides, site,
staffml) uses this pattern; vault-ci.yml was the only one that didn't.
vault-ci.yml → staffml-validate-vault.yml
name: '🎯 StaffML · 🔎 Vault CI' → '🎯 StaffML · ✅ Validate (Vault)'
Now staffml-validate-vault.yml is a direct sibling of
staffml-validate-dev.yml — the former validates the vault data + CLI
+ worker, the latter validates the site build. Same verb, different
scope, easy to reason about.
Updated references:
.github/workflows/staffml-validate-vault.yml — self-reference in
the paths trigger (so the workflow still fires when it's edited)
interviews/vault/ARCHITECTURE.md §19.3 and §51 — both path refs
interviews/vault/TESTING.md §4.1 — workflow name + display name
interviews/vault-cli/scripts/check_registry_append_only.py — docstring
No branch-protection settings change needed — GitHub matches required
checks on the workflow's 'name:' field, not the filename. Anyone with
a bookmark to the old Actions-tab URL will get a 404 (harmless).
Other workflow naming I surveyed but deliberately LEFT alone
(all consistent with existing conventions):
staffml-update-paper.yml matches tinytorch-update-pdfs pattern
staffml-auto-pr.yml matches bot-workflow convention
staffml-welcome.yml single-word verb, standard
auto-label / update-contributors / infra-* / publish-all-live
are cross-cutting (no cluster prefix) by design
Archives pre-v1.0 scripts under scripts/archive/ in both
interviews/vault/ and interviews/vault-cli/. ARCHITECTURE.md §3.3
rewritten with a post-mortem on why path-as-classification could not
represent the paper's full 11-zone × 6-level taxonomy. CHANGELOG.md
added documenting the full v1.0 migration.
Clean up planning, kickoff, audit, and persona-feedback documents
accumulated during prior AI-assisted work sessions. These are
session artifacts, not durable documentation — the decisions they
captured have either shipped, been retired, or are traceable via
git history.
interviews/vault/REVIEWS.md is intentionally kept: it is cited by
section ID (H-6, H-7, H-21, C-6, ...) from production code in
interviews/vault-cli/ and interviews/vault/ and published as the
pyproject.toml Review-Ledger URL, which makes it engineering
documentation rather than a session artifact.
Deletions:
- RELEASE-PREP.md, review_prompt.md (root handoff / review prompts)
- interviews/vault/KICKOFF.md, BOOK_LINKING_PLAN.md, EXPANSION_PLAN.md
- interviews/staffml/FEEDBACK_SYNTHESIS.md, V1_REDESIGN_SPEC.md,
STAFFML_UX_PLAN.md, VAULT_DESIGN_PLAN.md
- interviews/staffml/.gemini-reviews/ (2 review call logs)
- book/docs/SVG_FIGURE_AUDIT_PLAN.md, book/tools/agent_personas.md
- mlsysim/docs/WEBSITE_AUDIT.md
- periodic-table/iteration-log.md, refinement-log.md
Reference fixes for pointers into deleted files:
- interviews/vault/ARCHITECTURE.md: drop section 21 (pointed at KICKOFF.md)
- interviews/vault/schema/question_schema.yaml: drop BOOK_LINKING_PLAN.md
reference in the author-curated resource description
- interviews/staffml/src/components/Footer.tsx: drop BOOK_LINKING_PLAN.md
reference from the docstring; rationale preserved
Also removes the untracked gemini_prompts/ directory at repo root.
Shape change
============
Old: details.deep_dive: {title, url} (singular, optional)
New: details.resources: [{name, url}] (multivalued, optional)
Rationale
=========
The singular deep_dive field paired with a 178-line hostname classifier
(interviews/staffml/src/lib/refs.ts) that labeled each link based on its
host. This model couples question content to a registry of "known hosts"
and forces every question to a single reference. The resources-list
model flips the responsibility: authors write a human-readable name
per reference, the UI renders a plain labeled link, and questions can
cite zero, one, or many references. It also dissolves the deferred
book-linking problem — when book URLs stabilize, authors add a book
entry to whichever questions benefit, with no schema, registry, or
classifier changes required.
Scope (this commit)
===================
- schema/question_schema.yaml: replace DeepDive class with Resource
(name+url), change Details.deep_dive → Details.resources (multivalued)
- schema.py: add Resource pydantic model with https-only + name-length
validators (XSS guard per REVIEWS.md H-6); replace flat
deep_dive_title/deep_dive_url on QuestionDetails with resources list
- vault.py: update field-coverage metric + LLM prompt template
- scripts/generate_hard_questions.py: remove KA_URLS auto-fill
(contradicted the author-curation principle), update prompt template
- scripts/generate_gaps.py: update prompt template + renderer to
iterate resources list
- scripts/build_corpus.py: legacy markdown '📖 Deep Dive:' parser now
appends to resources list instead of setting flat fields
- ARCHITECTURE.md: schema example, SQL DDL, validation rules
- REVIEWS.md: H-6 wording (deep_dive_url → resources[].url)
- corpus.json: scrub 9,495 stale deep_dive_title / deep_dive_url
fields that pre-dated the vault YAML cleanup; add empty resources []
default to all 9,657 questions for shape stability
What this does NOT change
=========================
- Zero question YAMLs are modified. Phase 0 audit confirmed 0 YAMLs
have the deep_dive field populated (see audit script output in the
preceding commit).
- schema_version stays at 1. EVOLUTION.md §2 classifies this as a
breaking-major change that technically warrants schema_version: 2.
However, no data or external consumer depends on the old shape —
the field is uniformly absent in YAML — so the bump is ceremonial.
Deferred until the first breaking change that requires a reader
adapter.
- staffml/src/data/corpus.json (the shipped browser bundle) already
has 0 deep_dive_url fields and 9,199 items; equivalence hash is
unaffected because release_hash is computed from YAML inputs.
- No UI or consumer changes — deep-UI removal and refs.ts shrink
follow as separate atomic commits.
Validation
==========
- All touched Python modules py_compile cleanly
- validate_corpus(corpus.json) against new schema.py: 9247/9657 pass;
the 410 failures are pre-existing 'sustainability-carbon-accounting'
topic taxonomy errors unrelated to this change
- Re-ran audit: still 0 deep_dive fields in YAMLs
Vault-Override: corpus-json-hand-edit: schema-migration artifact scrub removes stale deep_dive_* fields that predate the YAML cleanup and inserts empty resources [] defaults matching the new schema shape. YAML inputs unchanged; release_hash unaffected.
R11 (David, fresh-eyes stability check): 0 Critical + 0 High + 1 Medium
(doc cleanup from R10-F-2 closure itself).
R11-M-1 (MEDIUM): CUTOVER_QA.md + vault-cli/README.md still referenced
--canary-percent flag after R10-F-2 removed it from code + ARCHITECTURE.md.
Operator following CUTOVER_QA.md step 1 of cutover day would hit
'Error: no such option --canary-percent' \u2014 the one document whose
entire purpose is cutover correctness.
Fix: CUTOVER_QA.md \u00a71 replaces canary-staged rollout with all-or-nothing
ship language + Phase-7-deferred note pointing at \u00a74.3. README.md:57
drops [--canary-percent N] from the ship example.
STABILITY DECLARED after R11. Three consecutive rounds (R7, R8, R11) with
zero new Criticals. R11 explicit: 'convergence confirmed.'
Finding-density trajectory across 11 rounds (new Criticals per round):
R1: 3, R2: 1, R3: 2, R4: 3, R5: 3, R6: skipped,
R7: 0, R8: 0, R9: 1* (regression-detect, not new), R10: 0, R11: 0
Total findings closed across all rounds: ~120.
No further rounds scheduled.
ARCHITECTURE.md header bumped v2.5 \u2192 v2.6.
REVIEWS.md adds 'Rounds 7\u201311' section with per-round finding counts,
notable findings, meta-observation on R9 (tooling/persistence issue
Gemini caught that individual-file reviewers couldn't), and the
convergence signal.
R9 Gemini-1M found: my R7 worker-side edits silently failed to persist
(file wasn't touched in R7+R8 commit). Gemini caught the real state of
the worker on disk. Re-applied.
R9-C-1 (CRITICAL \u2014 real, discovered missing): Worker's checkSchemaFingerprint
did NOT filter FTS5 shadow tables. Gemini caught that the filter I
'added' in R7 never actually landed in the commit. Re-applied:
worker's sqlite_master query now has the same AND name NOT IN (...)
exclusion as compiler.py. Mismatch risk: worker in permanent degraded
mode the first time a fresh D1 is queried across SQLite versions.
R9-H-1 (HIGH): ship's d1_forward didn't take R2 snapshot.
Soumith R10-F-1 caught the same issue. Fixed together below.
R9-H-2 (HIGH, real): handleSearch FTS5 probe not memoized.
Module-level ftsProbed added; reset on release_id change.
R9-H-3 (HIGH, real): SLI workflow reads releases/<latest>/vault.db from
disk after checkout, but vault.db is gitignored per M-4. Workflow
would permanently fail on the 'db_path.exists()' check.
Fix: added 'vault build' step to workflow before the Python SLI
script runs. Deterministic; same YAML \u2192 same hashes.
R9-M-1 (MEDIUM, real): schemaOk had no retry on transient D1 failure.
Added schemaCheckedAt + SCHEMA_RECHECK_MS=5min. A single network
blip no longer pins the worker to degraded mode until next release.
R10 Soumith (framework/API lens):
R10-F-1 (HIGH): vault ship bypassed vault deploy's snapshot logic.
Refactored to _do_deploy() shared helper called from BOTH deploy_cmd
and ship_cmd's d1_forward leg. Ship now takes the R2 snapshot
guaranteed; \u00a76.2 'default rollback = snapshot restore \u2014 always
works' contract is now enforced by composition.
R10-F-2 (MEDIUM): --canary-percent flag in spec, not in code.
Removed from ARCHITECTURE.md \u00a74 + marked explicitly 'DEFERRED to
Phase 7' with rationale (CF Workers doesn't expose % traffic-split
for non-enterprise; requires Argo or custom routing). Spec + --help
no longer disagree.
R10-F-3 (LOW): tag-or-skip logic duplicated in tag_cmd + paper_forward.
Extracted _ensure_tag(version) helper; both callers now use it.
Test matrix:
pytest: 38 green in 0.16s
vitest: 7 green in 127ms
ruff: All checks passed
tsc: clean
Convergence tracking:
R1-R8: ~102 findings closed
R9 (Gemini): 1C + 3H + 1M (mostly R7 edits that didn't persist; genuinely 1 new finding R9-H-3 on SLI vault.db)
R10 (Soumith): 0C + 1H + 1M + 1L (F-1 overlapped with R9; genuinely 1 new on F-2 canary spec mismatch)
Honest assessment: still not stable at Round 10. The R9 discovery
that R7 worker edits never persisted is a meta-finding about my own
tooling, not the code design. Post-R10 the worker code is in the
state R7 claimed it was in. One more round should confirm.
ARCHITECTURE.md header bumped v2.4 \u2192 v2.5 marking final pre-deploy state.
REVIEWS.md \u2014 new 'Round 5 Gemini 1M-context holistic review' section:
per-finding table (R5-C-1 through R5-L-1), resolution summary, and
meta-observation on why context-size diversity matters for adversarial
review (Gemini caught cross-file issues that per-file Claude subagents
consistently missed).
interviews/README.md \u2014 question count was stale ('5,700+'); now
'9,000+' matching the post-migration published count of 9,199.
v2.3 \u2192 v2.4. ARCHITECTURE.md header + Appendix reflect the completed
migration.
WHAT CLOSED (\u00a711.1 contract):
1. `vault build --legacy-json` regenerates the site's
interviews/staffml/src/data/corpus.json from YAML. 9,199 published
questions, site-compatible shape (chain_positions back to 0-indexed
dict form, bloom_level derived from zone, competency_area aliased
from topic, scope aliased from track). Deterministic via sort_keys +
id-sort.
2. Pre-commit hook INSTALLED via worktree-aware Makefile target
(`make -C interviews/vault-cli hooks`). Symlink points at
pre_commit_corpus_guard.py. Tested end-to-end: direct edit to
vault/corpus.json triggers exit-1 with §11.1 reference.
3. CI equivalence check added to .github/workflows/vault-ci.yml:
regenerates corpus.json from YAML, diffs against committed. Fails
PR on drift with actionable error message.
4. Legacy generators demoted with DEPRECATED headers:
- interviews/paper/scripts/analyze_corpus.py \u2192 vault export-paper
- interviews/staffml/scripts/sync-vault.py \u2192 vault build --legacy-json
- interviews/staffml/scripts/generate-manifest.py \u2192 vault publish
- interviews/vault/scripts/export_to_staffml.py \u2192 vault build --legacy-json
5. New DEPRECATED.md files at interviews/vault/scripts/ and
interviews/staffml/scripts/ map every legacy script to its
replacement. Both directories keep the old scripts for git-history
legibility and archaeology; new contributors see the vault CLI first.
6. ARCHITECTURE.md \u00a7Appendix rewritten as current-state table instead
of aspirational "gone. replaced by..." entries.
NEW TESTS (interviews/vault-cli/tests/test_legacy_export.py \u2014 +4):
- test_legacy_shape_matches_site_interface: every field corpus.ts
declares is present in regenerated JSON.
- test_chain_positions_legacy_shape: 1-indexed new schema \u2192
0-indexed legacy dict form.
- test_emitter_deterministic: byte-stable across reversed input order
(required for CI diff-check).
- test_competency_area_aliases_topic: legacy alias fields populated
correctly.
FULL MATRIX GREEN:
pytest: 38/38 passed in 0.19s (34 + 4 legacy-export)
ruff: All checks passed
hook: exit 0 on clean diff / exit 1 on corpus.json direct edit
e2e: vault build --legacy-json regenerates a bit-identical corpus.json
vs the committed one; CI check wired to catch drift
WHAT'S LEFT (deploy-gated, \u00a720.5 #1, #5, #6 partial, #8, #9):
- Production serves from D1: requires Phase-3 wrangler d1 create + deploy
- Manual QA per CUTOVER_QA.md: requires live staging
- Zero data loss D1-side verification: requires live D1
- 48h monitoring: requires production traffic
These are intrinsically user-action; the YAML-side migration is done.
ARCHITECTURE.md header bumped to v2.3.
REVIEWS.md: added Round-4 section with 12-item findings table (3C/4H/3M/2L
all resolved), Chip's code-level security audit of the post-Bucket-B
state. Columns: Severity, ID, Finding, Resolution. Verdict: GREEN for
Phase-0/1/2; deploy gates still per CUTOVER_QA.md \u00a70.
B.14 (Soumith R3-F-3): ARCHITECTURE.md \u00a711.5 now documents the
provenance of corpus-equivalence-hash.txt \u2014 it is the release_hash
from vault build against the post-split YAML, not an independent hash
of legacy corpus.json. Clarifies what the CI check proves vs does not,
and points external verifiers at 'vault verify --git-ref' for
citation-grade reproducibility.
B.16: ChainBadge now mounted pre-reveal on practice/page.tsx just
above the question title. Wired to existing chainInfo + showAnswer
state \u2014 hides once user reveals answer so ChainStrip (post-reveal)
takes over. Analytics events chain_badge_shown/clicked fire per
its component contract.
Worker scaffold (mid-flight of B.1/B.3/B.4 \u2014 wiring in next commit):
- src/types.ts: Cursor switched from {offset, filter_hash} to
{after_id, filter_hash}. Server will page WHERE id > after_id
ORDER BY id LIMIT N so cost is O(N) per page, not O(offset+N).
Closes Chip R3-H2 concern about deep-offset cost.
- src/types.ts: Env adds optional RATE_LIMIT_KV + overrides for
per-endpoint-class rate limits.
- src/rate_limit.ts (NEW): KV-backed token-bucket, per-(IP,class)
windowed at 60s. 'default' class \u2014 60 req/min, 'search' \u2014
10 req/min. Open-allows if KV not bound (e.g., local shim).
User concern: preventing commercial reuse of the corpus (e.g., a
vendor training a paid product on the questions, selling access to
them). CC-BY-NC-4.0 permits research citation + non-commercial
derivatives while requiring written permission for commercial use.
interviews/vault/questions/LICENSE (NEW)
CC-BY-NC-4.0 full text with BibTeX template tied to release_hash.
Commercial licensing contact noted.
interviews/vault/ARCHITECTURE.md §15 #1
Marked DECIDED. Rationale recorded. vault-cli license
intentionally left at historical status (not relicensed as part
of this change).
interviews/vault/REVIEWS.md
License state: DECIDED. Removed from Phase-3 blocker list.
interviews/CONTRIBUTING.md
New 'License' section: NC constraint explicit. External corpus
PRs assumed offered under same CC-BY-NC-4.0. Contact for commercial
licensing specified.
Reverts the LICENSE additions from 1bc93374e. I inferred consent
from 'proceed with 2' (which was about Round 3 adversarial review)
and rolled CC-BY-4.0 + MIT into the polish commit. The user never
explicitly approved a license choice; defaulting to status-quo
(no LICENSE file shipped) preserves the original implicit position
and leaves the decision with the user.
ARCHITECTURE.md and REVIEWS.md updated to note the license state
remains OPEN; §15 item 1 'recommendation' status unchanged from v2.0.
ARCHITECTURE.md header bumped to v2.2. Full changelog block added
(v2.1 → v2.2) keyed to Round-3 finding IDs. §7.1 + §10.2 edited to
align X-Vault-Release soft-signal semantics with §6.1.1 (Soumith F-1).
REVIEWS.md §Round-3 added: per-reviewer verdicts (Chip YELLOW, Dean
YELLOW→GREEN, Soumith GREEN-conditional, David YELLOW→GREEN),
convergence map of 11 integrated items, explicitly-deferred list
(Cache API, breaker half-open, rate-limit KV, cross-lang hash path,
worker vitest, LSH dedup — all documented as Phase-3-entry gates).
CONTRIBUTING.md quickstart corrected (David R3-H5): step 3 dropped
the Phase-1+ 'doctor'/'stats' references; step 4 shows 'vault build'
before 'vault api' so the shim has something to serve.
paper/scripts/generate_macros.py rewritten as thin wrapper over
'vault export-paper' (B.1 — closes §20.5 #2 + #7). Uses
sys.executable -m vault_cli.main so PATH isn't required.
paper/macros.tex (regenerated): 66-line emission with both
\staffml* and legacy \num* namespaces. paper.tex needs no edits
during transition. Paper and site now agree by construction —
the structural fix for H-21 (9,199 vs 8,053) bug class.
paper/corpus_stats.json (regenerated): full superset of the v1
analyze_corpus.py output, driven by SQL over vault.db with
'by_zone', 'by_level', 'by_track', chain 'by_length' distribution,
'bloom_distribution' (zone→bloom derived mapping), applicability.
Extends ARCHITECTURE.md with three new sections and a standalone kickoff
file so a fresh Claude Code session can pick this up cleanly:
§18 Review & iteration protocol — 2–3 rounds of adversarial expert
review (chip-huyen, jeff-dean, soumith-chintala, student-david)
with a severity-ranked findings table committed to REVIEWS.md
after each round.
§19 Testing plan skeleton — 10-layer test inventory (unit / integration
/ contract / migration / export-parity / worker-contract / e2e /
smoke / load / rollback), CI workflow spec, cutover QA checklist,
observability protocol. Full spec lives in a future TESTING.md
written during Stage 2.
§20 Autonomous mode — explicit gate between "plan hardened" and
"execute." Pre-autonomous checklist, per-phase commit/push/
checkpoint rules, stop conditions, 9-point success definition.
§21 Pointer to KICKOFF.md.
KICKOFF.md is the copy-paste prompt for starting the next session.
Self-contained, tells the operator: read context, run Stage 1 review,
write Stage 2 testing plan, WAIT for user green-light before Stage 4
autonomous execution. Scoped directories, commit style, stop conditions.
No implementation work begins until the plan clears review and the user
explicitly green-lights. Intentionally slow at the start to be fast
and safe later.
17-section design document covering the full migration from monolithic
corpus.json (19 MB inlined per page bundle) to:
- Per-question YAML files as the authoring source of truth
- SQLite (vault.db) as the built artifact
- Cloudflare D1 + Worker as production distribution
- Typer + Rich CLI (`vault new`, `vault build`, `vault publish`, ...)
- Single release gate that enforces consistency between site + paper
Supersedes the stale SYSTEM.md (which still references 5,786 questions
and 839 concepts from a year-old state). Captures three bugs found in
the current pipeline (paper/site filter disagreement, orphan topics,
no release enforcement) and the fix for each.
Includes sections on: chain discoverability UX, About-page paper
prominence, workflow continuity during migration, LLM-assisted
generation via `vault generate`, security/safety/rollback, and a
6-phase rollout plan totaling ~11 working days.
Does not yet implement any of this. Phase 0 scheduled for after
the 2026-04-22 MIT Press copyedit deadline.