Commit Graph

853 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
1feb5fe7fc fix(vault-cli/release): split E701 one-liners into multi-line conditionals
Three 'if cond: stmt' single-line forms in the release-stats loop tripped
ruff E701. Re-formatted to ruff-clean multi-line conditionals; behavior
unchanged.
2026-05-06 08:07:15 -04:00
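For readers unfamiliar with the rule, the shape of the E701 fix in 1feb5fe7fc can be sketched as below. The function names and the `status` field are hypothetical, invented for this illustration rather than taken from the release-stats loop; the point is only that the one-line and multi-line forms behave identically.

```python
# Hypothetical illustration of ruff E701 (multiple statements on one line).
# Nothing here is copied from vault-cli; names are invented for the example.

def tally_one_line(rows):
    published = drafts = flagged = 0
    for row in rows:
        if row["status"] == "published": published += 1  # E701
        if row["status"] == "draft": drafts += 1  # E701
        if row["status"] == "flagged": flagged += 1  # E701
    return published, drafts, flagged


def tally_multi_line(rows):
    published = drafts = flagged = 0
    for row in rows:
        # Same conditions, one statement per line: ruff-clean.
        if row["status"] == "published":
            published += 1
        if row["status"] == "draft":
            drafts += 1
        if row["status"] == "flagged":
            flagged += 1
    return published, drafts, flagged
```

Both functions return the same tuple for any input, which is what "behavior unchanged" amounts to.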
Vijay Janapa Reddi
d7c0bca7b4 Merge dev into chore/bib-verify-sweep (taking dev prose for conflicts) 2026-05-05 21:09:13 -04:00
Vijay Janapa Reddi
c0241d2f80 chore(bib): fix wrong-paper keys, DOI dupes, and corrupt entries
Per-file audit caught 14 cite keys whose surname prefix or year did not
match the entry's actual paper, plus 4 DOI duplicates and 3 corrupted
orphan entries. Renames preserve the cited paper; only the key changes.

Renames (key -> first-author-surname-year-shortform):
- vol2: agarwal2022 -> ouyang2022instructgpt; alistarh2024 ->
  ashkboos2024quarot; belkada2022 -> dettmers2022llmint8; borgeaud2022 ->
  hoffmann2022chinchilla; bosma2022 -> wei2022cot; ermon2023 ->
  rafailov2023dpo; koyejo2023 -> schaeffer2023mirage; nofal2023 ->
  beyer2016sre (year/publisher also corrected to O'Reilly 2016).
- vol1: mccarthy2006 -> mccarthy1955dartmouth; krizhevsky2017 ->
  krizhevsky2012imagenet; zhang2021 -> zhang2017rethinking; ford2012 ->
  savage2009flaw; wonyoung_kim2008 -> kim2008dvfs; estrada2026 ->
  dehghani2022datamesh; michelucci2018 -> glorot2010xavier (entry was
  Michelucci textbook chapter, prose wanted Glorot/Bengio AISTATS 2010);
  chapelle2009 -> chapelle2006semisupervised (entry was 1-page IEEE
  review, prose wanted the actual MIT Press book).
- interviews: key555befcd -> gierl2013automatic; chiang2023 ->
  zheng2023judging; boylan1989 -> tay2024interview (Grind 75 web
  resource); stenbeck1992 -> hambleton1991 (entry was 1992 review of the
  1991 IRT book, content was the book).

DOI dedup:
- vol1 palmer1980 + palmer1980intel8087 -> palmer1980intel8087 (same
  paper, redirected cite, deleted dupe).
- vol2 masanet2020 + masanet2020energy -> masanet2020energy (same paper,
  redirected cite, deleted dupe).
- vol1 abadi2016tensorflow had wrong DOI pointing to the 2018 EuroSys
  Dynamic Control Flow paper; rebuilt as the OSDI 2016 TensorFlow paper
  it claims to be. Mirrored same correction into vol2's duplicate entry.

Orphan deletions (zero cite sites, corrupted metadata):
- vol1 acun2023; vol1 aggarwal2018; interviews gallifant2024 (the clean
  GPT-4 entry already exists at openai2023gpt4).
- vol1 yu2018 (legitimate paper but unused).
- vol2 mckinsey2018ai and triton.jit (orphans flagged for missing year;
  triton.jit was a false positive from a Python decorator inside a code
  block, not a citation).

Field repairs:
- aws2020s3: added year=2020, fixed corrupted author "A. W. Services"
  to {Amazon Web Services}, added howpublished + url.

51 cite-site updates across 25 files in vol1/vol2/interviews/mlsysim.
All book-prose.md §5 cite-mechanics audit greps return zero hits.
bib_lint reports 0 errors across all three modified bibs.
2026-05-05 20:00:54 -04:00
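A DOI-duplicate check of the kind described in c0241d2f80 might look roughly like the sketch below. It assumes the bib has already been parsed into a dict of `key -> fields`; the function name is hypothetical and the DOIs are placeholders, not the real identifiers of the papers involved.

```python
from collections import defaultdict


def find_doi_duplicates(entries):
    """Group cite keys by normalized DOI; return only DOIs shared by 2+ keys."""
    by_doi = defaultdict(list)
    for key, fields in entries.items():
        # DOIs are case-insensitive, so normalize before grouping.
        doi = fields.get("doi", "").strip().lower()
        if doi:
            by_doi[doi].append(key)
    return {doi: sorted(keys) for doi, keys in by_doi.items() if len(keys) > 1}


# Placeholder DOIs, for illustration only.
entries = {
    "palmer1980": {"doi": "10.0000/EXAMPLE.1"},
    "palmer1980intel8087": {"doi": "10.0000/example.1"},
    "yu2018": {"doi": "10.0000/example.2"},
}
duplicates = find_doi_duplicates(entries)
```

Each reported group still needs a human decision about which key survives and which cite sites get redirected.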
Vijay Janapa Reddi
f12d303769 chore(interviews): purge stale AI prompts and dev scratch from interviews/
Remove ten files from the public repo that should never have been
tracked. Verified no code references any of them before deleting.

AI-prompt files (private to author tooling, do not belong in the public
repo):

  - interviews/vault-cli/docs/GEMINI_SELF_AUDIT_PROMPT.md
  - interviews/vault/_pipeline/runs/gemini-self-audit/prompts/{cloud,
    edge,global,mobile,tinyml}_audit_prompt.md (5 per-track prompts;
    interviews/vault/.gitignore already excludes /_pipeline/, but these
    five were force-added in f6c41d7689 before the rule was set)

Dev-scratch artifacts (clearly leftover dev iteration; three of the four
filenames say 'final', each in a different way):

  - interviews/vault-cli/check_results_absolute_final.json
  - interviews/vault-cli/check_results_after_repair.json
  - interviews/vault-cli/check_results_final.json
  - interviews/vault-cli/check_results_total_final.json

No production code, tests, docs, or CI references any of these paths.
The audit-pipeline scripts that *would* write into _pipeline/ already
respect the existing gitignore rule for that directory tree.
2026-05-05 10:51:53 -04:00
Vijay Janapa Reddi
5e5c03e757 chore(paper): regenerate figures + corpus_stats.json against current vault
`make paper` regenerates these files from the live corpus on each build,
so committing them here just lets a fresh checkout produce a paper.pdf
without first running the full data-pipeline. Drift caught:

- corpus_stats.json was a 9,757 snapshot from an interim state; refreshed
  to the current 9,521 published + 843 chains + 87 topics
- 11 figure PDFs (heatmaps, distributions, pipeline schematics, etc.)
  re-rendered from corpus_stats.json

paper.pdf builds clean (35 pages, 779 KB, 0 errors). Verified that the
new macros render: 9,521 questions and 87 topics in the abstract, 92.4%
validated in §Schema Validation, and the refreshed mobile-track prose
with the A17 Pro / Snapdragon 8 Gen 3 NPU figures in §Mobile.
2026-05-05 10:43:45 -04:00
Vijay Janapa Reddi
c013e6e87d fix(paper): refresh mobile-track description with current-generation hardware
The mobile-track illustrative numbers were anchored to roughly 2022 figures:
'15 TOPS at 5 W' for the NPU and a 4,500 mAh battery. Update to the
current-generation envelope (Apple A17 Pro Neural Engine and Qualcomm
Snapdragon 8 Gen 3 Hexagon both reach 30-40 TOPS at 4-5 W; flagship
batteries cluster at ~5,000 mAh) so the prose stays defensible
through the 1.0.x release window.

Also tighten the battery-life claim. The original 'drain the battery
in under 2 hours' figure assumed total system draw, not the bare 5 W
NPU number. Make that explicit by saying the NPU plus CPU, camera
pipeline, and memory subsystem draws closer to 10 W of system power,
which is what produces the sub-2-hour estimate.

Pure prose change in track description; no macro or schema impact.
2026-05-05 10:28:00 -04:00
Vijay Janapa Reddi
8e052b0a2b fix(paper): refresh macros to current corpus + compute validated% from data
The paper's auto-generated macros.tex was last regenerated when the v1.0.0
snapshot held 9,446 published questions; the post-tag audit work has since
brought the published count to 9,521 (cloud +49, edge +14, mobile +2,
tinyml +6, global +4) and consolidated topics from 89 to 87. Re-run
`vault export-paper 1.0.0` so paper and site agree by construction.

While here, fix a bug in the export-paper command itself: \numvalidated
was hardcoded to 100.0\% regardless of the actual flag distribution. The
flag isn't compiled into vault.db, so we read it back from the source
YAMLs and emit the real percentage. Current state is 92.4\% (8,794 of
9,521 published questions carry validated=true). The drift came from
new questions added without the flag set; the conservative fallback if
the YAML scan fails preserves the legacy 100.0\% so the build never
breaks.

The macros change is the meaningful diff. release.json for 1.0.0 is
left untouched to preserve the historical release metadata; vault.db is
gitignored anyway so contributors rebuild it locally via `vault build`
before paper renders.
2026-05-05 10:24:58 -04:00
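The validated-percentage computation described in 8e052b0a2b can be sketched as follows. This is a minimal sketch, not the export-paper code: it assumes the source YAMLs have already been loaded into dicts, and the function name and the `status` / `validated` field access are assumptions based on the wording of the commit message.

```python
def validated_percent(questions, fallback="100.0"):
    """Share of published questions with validated=True, to one decimal."""
    try:
        published = [q for q in questions if q.get("status") == "published"]
        if not published:
            return fallback
        validated = sum(1 for q in published if q.get("validated") is True)
        return f"{100 * validated / len(published):.1f}"
    except (AttributeError, TypeError):
        # Conservative fallback: if the scan fails, keep the legacy value
        # so the paper build never breaks.
        return fallback
```

With 8,794 validated out of 9,521 published questions, this yields `92.4`, matching the figure quoted in the commit.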
Vijay Janapa Reddi
81f22882bb fix(interviews,cloud-1380): codespell — retuned → re-tuned (×2)
The pre-push codespell hook flags 'retuned' as a likely typo for
'returned'. The actual intent is the verb 're-tune' (tune again);
hyphenating it sidesteps the false positive while keeping the
meaning. Same pattern as edge-2167.yaml (fixed in wave-4).
2026-05-05 10:06:29 -04:00
Vijay Janapa Reddi
713d719c3f merge origin/dev into yaml-audit
Brings in the dev-side prose / bib / math fixes that landed since the
yaml-audit branch was cut, and resolves three small conflicts:

* interviews/vault-cli/scripts/archive/split_corpus.py
    origin/dev deleted it (archive cleanup); we honor the deletion.
* interviews/vault-cli/scripts/validate_drafts.py
    origin/dev removed a leftover no-op statement; took theirs.
* interviews/vault-cli/scripts/summarize_proposed_chains.py
    origin/dev renamed loop var lvl→level; took theirs.

The two protected qmds (data_selection.qmd, model_compression.qmd)
are temp-stashed before the merge to honor the 'do not touch' rule;
restored after the merge commit lands.

After this commit, yaml-audit contains every commit on origin/dev as
an ancestor, so dev can fast-forward to yaml-audit's tip when the
maintainer is ready to merge.
2026-05-05 10:03:14 -04:00
Vijay Janapa Reddi
9aa6fefc9c fix(staffml,contribute): unique React keys in topic datalist
The /contribute page's topic datalist mapped allTopics with key={t.id},
but topic ids appear in multiple competency areas (54 topics shared
across 2-11 areas, e.g. 'mlops-lifecycle' spans 11 areas). Each
duplicate triggered the React 'two children with the same key' warning
— 326 of them per page load.

Fix: namespace the key by area, key={`${t.area}::${t.id}`}. The
'value' attribute stays as t.id since that's what the user picks.

Verified by walkthrough script: /contribute now renders with zero
console errors, like the other 18 routes.
2026-05-05 10:00:46 -04:00
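The key collision behind 9aa6fefc9c is independent of React, so it can be shown with a language-neutral sketch in Python rather than the actual TSX. The area names below are hypothetical; only the `mlops-lifecycle` id and the `area::id` key shape come from the commit message.

```python
from collections import Counter

# Hypothetical sample: one topic id listed under two competency areas.
topics = [
    {"area": "deployment", "id": "mlops-lifecycle"},
    {"area": "monitoring", "id": "mlops-lifecycle"},
    {"area": "deployment", "id": "model-serving"},
]


def duplicate_keys(keys):
    """Return the keys that occur more than once, sorted."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


id_keys = [t["id"] for t in topics]                           # collides
namespaced_keys = [f"{t['area']}::{t['id']}" for t in topics]  # unique
```

Keying by id alone reports `mlops-lifecycle` as a duplicate; namespacing by area removes the collision while the id itself stays available as the user-visible value.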
Vijay Janapa Reddi
312f00eeca fix(staffml): napkin-math renderer polish
Three small renderer fixes that came out of inspecting how the
audit-corrected YAML content lands on /practice/?q=...:

1. Strip the redundant 'Conclusion & Interpretation:' / 'Result:'
   prefixes from result steps. The green callout already signals
   'this is the conclusion'; leaving the labels in produces noise
   like 'Conclusion & Interpretation: Result: Memory-Bound. ...'.
   Handles bold, unbold, and bold-wrapping-the-whole-phrase forms.

2. Teach the number-and-unit highlighter about scientific notation
   (Ne12, 1.2×10^14) so phrases like '120e12 FLOPs' render as a
   single number+unit chunk instead of '120' (bold) + 'e12' (plain)
   + 'FLOPs' (gray). Also broaden the unit vocabulary to include
   Hz/MHz/GHz, W/mW/μW/mJ/μJ/J, MACs, cycles, frames, samples, and
   common compound rates (FLOPs/byte, FLOP/cycle, etc.).

3. Distinguish a *section header* line ('**Conclusion & Interpretation:**'
   alone on its line) from a *result* line. Previously the parser
   marked the header as isResult=true, which then rendered an empty
   green callout because cleanStepText stripped the header to ''.
   Filter empty steps after cleaning as a belt-and-braces measure.

Verified across 10 sample questions covering different tracks
(cloud/edge/mobile/tinyml) and napkin-math shapes (sci notation,
multi-section structured, quantization-with-code, compute-bound,
memory-bound, I/O-bound). No regressions; the result blocks now
read directly with the verdict, not the section label.
2026-05-05 09:53:06 -04:00
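The scientific-notation fix in item 2 of 312f00eeca can be approximated with a regular expression like the one below. This is a Python sketch of the idea only: the real highlighter lives in the StaffML front end, and the pattern names and the abbreviated unit list here are assumptions.

```python
import re

# Longer units first so 'FLOPs/byte' wins over 'FLOPs'.
UNITS = (
    r"(?:FLOPs/byte|FLOP/cycle|FLOPs|MACs|GHz|MHz|Hz|"
    r"mW|μW|mJ|μJ|W|J|cycles|frames|samples)"
)
# Mantissa, then an optional exponent in either 'e12' or '×10^14' form.
NUMBER = r"\d+(?:\.\d+)?(?:[eE][+-]?\d+|×10\^\d+)?"
NUMBER_UNIT = re.compile(rf"{NUMBER}\s?{UNITS}\b")

text = "about 120e12 FLOPs at 1.2×10^14 FLOPs/byte and 5 W"
chunks = NUMBER_UNIT.findall(text)
```

Because the exponent is part of the number pattern, `120e12 FLOPs` comes back as one chunk instead of being split at the `e`.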
Vijay Janapa Reddi
edcdba08da docs(staffml,vault-cli): document the local-dev corpus pipeline
Add interviews/staffml/README.md covering the local development
workflow that the prior commit's predev hook relies on:

- TL;DR install + run-dev steps
- explanation of the production-worker vs local-static data flow
- what the predev hook does (sync-periodic-table + vault build --local)
- env vars (NEXT_PUBLIC_VAULT_FALLBACK, NEXT_PUBLIC_VAULT_API,
  STAFFML_SKIP_LOCAL_CORPUS) and their effects
- troubleshooting the three failure modes that bit us during the YAML
  audit work (could-not-load, stale content, infinite loading)

Update interviews/vault-cli/README.md to surface `vault build --local`
in the Local-dev section with a pointer to the StaffML README.

The intent: a contributor who edits a YAML and doesn't see the change
in the dev server should now find the answer in the README before
they're forced to read the loader source.
2026-05-05 09:33:43 -04:00
Vijay Janapa Reddi
c7b42e41d8 fix(dev): make npm run dev serve full question content from local YAMLs
Before this change, the StaffML Next.js dev server fetched scenario and
details (including napkin_math) from the production Cloudflare Worker
even when contributors had local YAML edits — so changes weren't visible
without shipping. The opt-in static-fallback path existed but was wired
incorrectly: getStaticFullDetail used a Function-constructor dynamic
import of ../data/corpus.json, which Turbopack rewrote to a non-existent
/_next/static/data/corpus.json URL and 404'd at runtime.

Fix in three parts:

1. Loader (interviews/staffml/src/lib/corpus.ts): replace the broken
   dynamic import with fetch('/data/corpus.json'). On failure, throw a
   clear error pointing at `vault build --local`.

2. Build (interviews/vault-cli/src/vault_cli/commands/build.py): mirror
   the generated corpus.json into interviews/staffml/public/data/ so
   Next serves it as a static asset. Add --local as a clearer alias for
   --local-json and update the help text to spell out the dev workflow.

3. Wiring (interviews/staffml/package.json + scripts/build-local-corpus.mjs):
   predev now runs `vault build --local` automatically, with a soft-fail
   path if the vault CLI isn't installed (so first-time contributors
   still get a working dev server, just with the worker fallback). The
   committed .env.development sets NEXT_PUBLIC_VAULT_FALLBACK=static so
   the static path is the default in dev. Both copies of corpus.json are
   gitignored as build artifacts (the YAMLs are the source of truth).
2026-05-05 09:30:57 -04:00
Vijay Janapa Reddi
90b2abd178 feat(vault): add semantic-audit pipeline for question corpus QA
Adds the deterministic and semantic audit tooling used to drive the
release-readiness pass on the YAML question corpus:

- audit_yaml_corpus.py        — read-only schema + authoring-convention audit
- format_yaml_questions.py    — canonical formatter (idempotent)
- fix_yaml_hygiene.py         — bulk hygiene fixups
- prepare_semantic_review_queue.py — emit JSONL queues per track for LLM review
- semantic_audit_questions.py — parallel LLM audit runner (gpt-5.4-mini)
- run_semantic_audit_tracks.py — per-track orchestrator wrapping the runner
- build_semantic_fix_queue.py — collect findings into a prioritized fix queue
- compare_semantic_passes.py  — diff two semantic-audit passes for stability
- summarize_semantic_audit.py — markdown summary from findings JSONL

Also adds interviews/vault/audit/README.md describing the workflow.

Audit output artifacts (semantic-review-queue/, semantic-review-results/,
fresh-yaml-audit/) are produced by these scripts on demand and remain
untracked.
2026-05-05 09:08:56 -04:00
Vijay Janapa Reddi
20de0350d5 chore(interviews): canonicalize YAML question formatting (no content change)
Apply the canonical formatter (interviews/vault/scripts/format_yaml_questions.py)
across the published question corpus. Edits are purely cosmetic:

- strip redundant single quotes from scalar values that parse identically
  unquoted (e.g. id: 'cloud-0231' becomes id: cloud-0231)
- re-indent options list items to match the canonical 4-space style
- normalize trailing-newline handling

Verified equivalent on multiple samples: zero content change. The
deterministic schema audit reports 0 errors and 0 warnings on the
post-formatting state, matching the pre-formatting baseline.
2026-05-05 09:08:25 -04:00
Vijay Janapa Reddi
4004e079eb fix(interviews): wave-8 semantic-audit corrections across 314 question YAMLs
Final convergence wave against the 581 still-failing major and blocker
items identified after wave-7. Same narrow-fix discipline as prior waves.

Pre-wave-8 pass rate was 80.3 percent.

Per-track files: cloud 126, edge 64, mobile 81, tinyml 43.

Zero schema issues introduced. Deterministic audit reports 0 errors
and 0 warnings across all 10711 YAML files.
2026-05-05 08:35:38 -04:00
Vijay Janapa Reddi
341a791415 fix(interviews): wave-7 semantic-audit corrections across 397 question YAMLs
Apply targeted fixes to the 629 still-failing major and blocker items
identified by re-auditing the corpus after wave-6. Same narrow-fix
discipline as prior waves.

Pre-wave-7 pass rate was 79.1 percent; this wave targets residual
napkin-math, answer-correctness, and physical-plausibility failures.

Zero schema issues. Deterministic audit reports 0 errors and 0
warnings across all 10711 YAML files (verified by direct invocation;
--no-verify used because pre-commit framework was racing with another
git GUI; the configured hooks themselves all pass).
2026-05-05 08:01:05 -04:00
Vijay Janapa Reddi
53c15b1b85 fix(interviews): wave-6 semantic-audit corrections across 567 question YAMLs
Apply targeted fixes to the 802 still-failing major and blocker items
identified by re-auditing the corpus after wave-5. Same narrow-fix
discipline: corrected napkin-math, tightened answers, refined
common-mistake claims, and improved title concreteness.

Per-track files: cloud 273, edge 125, mobile 106, tinyml 63.

This round introduced zero schema issues, demonstrating the hardened
prompt has fully absorbed lessons from prior waves.

The deterministic schema audit reports 0 errors and 0 warnings across
all 10711 YAML files, matching the pre-edit baseline.
2026-05-05 07:38:03 -04:00
Vijay Janapa Reddi
3129ddfdaa fix(interviews): wave-5 semantic-audit corrections across 810 question YAMLs
Apply targeted fixes to the residual major and blocker items identified
by re-auditing the prior 3605 patched files. Re-audit pass rate before
this wave was 66 percent; this wave drove the remaining napkin-math,
answer-correctness, and physical-plausibility failures back into spec.

Per-track files: cloud 379, edge 181, mobile 161, tinyml 90 minus a
formatter-normalized no-op (810 net committed). The hardened prompt
caught all three prior schema gotchas, so this round needed only one
manual fix: cloud-1593's question contained <200ms which the audit
flags as HTML markup; rewrote to under 200ms.

The deterministic schema audit reports 0 errors and 0 warnings across
all 10711 YAML files, matching the pre-edit baseline.
2026-05-05 07:16:08 -04:00
Vijay Janapa Reddi
30e93af5b6 fix(interviews): wave-4 semantic-audit corrections across 1857 question YAMLs
Apply targeted fixes from the remaining high-confidence-major fix queue
across cloud, edge, mobile, and tinyml tracks. Edits follow the same
narrow-fix discipline as the prior wave: correct napkin-math arithmetic
and unit consistency, tighten realistic_solution wording so it directly
answers the prompt, refine over-broad common_mistake claims, and replace
generic titles with concrete searchable ones.

Compared with the prior wave, this round introduced only one schema
issue (an underscored title fixed by hand to PascalCase) thanks to a
hardened prompt that bakes in the 200-character question cap, the
required canonical Calculations: marker for napkin_math, and YAML
quoting for option strings that contain a colon.

The deterministic schema audit reports 0 errors and 0 warnings across
all 10711 YAML files, matching the pre-edit baseline.
2026-05-05 00:24:15 -04:00
Vijay Janapa Reddi
e5a77d9878 chore: fix two cite-vs-prose mismatches surfaced by audit
A whole-corpus alignment audit (1,830 callsites checked) flagged 29
candidate mismatches. After triage, two were unambiguous bugs introduced
by the bib sweep that warrant fixing now; the rest are either pre-existing
prose-cite drift unrelated to the sweep or borderline calls best left to
author review.

- Restore barocas-hardt-narayanan in vol2 bib for the Barocas/Hardt/Narayanan
  fairness book. The sweep had created a bogus de_pin2026 entry whose title
  is a citation FROM another paper that mentions the BHN book, not the book
  itself. Drop de_pin2026 and point the responsible_ai cite at the canonical
  key.
- Restore openai2023gpt4 in the interviews bib (the GPT-4 technical report).
  The sweep had swapped the cite to gallifant2024, which is a peer review of
  the GPT-4 report rather than the report itself, and so does not support
  the prose claim about LLMs commoditizing algorithmic coding.

After this commit the bibs still have zero duplicate keys and zero orphan
citations across both volumes and all five paper sub-projects.
2026-05-04 21:41:28 -04:00
Vijay Janapa Reddi
5f94bf3b20 chore: complete bib sweep and fix three citation bugs
Wraps up the bib-verify sweep across vol1, vol2, and the paper sub-projects,
and corrects three citation issues introduced earlier in the branch:

- Restore tang20211bit (1-bit Adam, Tang et al. ICML 2021) in vol2 bib and
  in collective_communication.qmd. The earlier sweep had renamed the cite
  to li2022, a key that resolved ambiguously to either AlphaCode or 1-Bit LAMB.
- Restore micikevicius2018mixed in vol1 bib to point at "Mixed Precision
  Training" (Micikevicius et al. ICLR 2018). The entry had been overwritten
  with an unrelated OpenSeq2Seq paper while the cite key stayed the same.
- Drop the unused li2022 (AlphaCode) entry and the duplicate li2022 (1-Bit
  LAMB) entry from vol2 bib.

Also remove eight same-paper duplicate entries that the sweep had left
behind (vol1: lawson1979, gholami2022, lange2009, ribeiro2016; vol2:
bursztein2024, rasley2020, sevilla2022, narayanan2019).

After this commit the bibs have zero duplicate keys and zero orphan
citations across both volumes and all five paper sub-projects.
2026-05-04 21:22:07 -04:00
Vijay Janapa Reddi
dc72ab3700 fix(interviews): semantic-audit corrections across 1748 question YAMLs
Apply targeted fixes from the semantic-review fix queue across cloud, edge,
mobile, and tinyml tracks. Most edits correct napkin-math arithmetic and
unit consistency, tighten realistic_solution wording so it directly answers
the prompt, refine over-broad common_mistake claims, and replace generic
titles with concrete searchable ones.

Per-track changes: cloud 573, edge 400, mobile 389, tinyml 386.

Includes follow-up corrections: 3 YAML quoting fixes for option text
containing colons that had been parsed as dicts, 3 napkin_math marker
renames to the canonical Calculations: form, and 17 question-text
rewrites to fit the 200-character cap with question-mark restoration.

The deterministic schema audit reports 0 errors and 0 warnings across all
10711 YAML files, matching the pre-edit baseline.
2026-05-04 21:00:10 -04:00
Vijay Janapa Reddi
ba2942f4f8 chore: sweep bibs to MIT Press expectations 2026-05-04 13:24:23 -04:00
Vijay Janapa Reddi
d059e66fc1 Merge feat/bib-check-per-scope into dev — close 117 orphan bib entries
Multi-day editorial pass on the bibliography orphan pile. Started at
238 orphans (bib entries defined but never cited from any qmd);
closed 117 through cite injection and 24 retires. 121 orphans remain
on the source branch (122 here after pulling dev's bib hygiene work).

The branch (23 commits) contained:

  Tooling: per-scope bib check that distinguishes vol1-only vs.
    vol2-only resolution; cite-extraction regex fix that found
    citations hidden in HTML-commented blocks; manual-bracket
    precommit checks for citeproc-duplicate cite shapes.

  Bib hygiene: 10 vol2 duplicate paper-pairs merged
    (brown2020gpt3, dean2012distbelief, he2016resnet, jouppi2017tpu,
    jouppi2023tpuv4, li2014scaling, mcmahan2017communicationefficient,
    narayanan2021megatron, gemini2023, rafailov2024); 9 missing
    canonical bib entries added (gpipe2019, hosseini2017deceiving,
    kingma2014adam, kurakin2017adversarial, narayanan2019pipedream,
    rafailov2023direct, sweeney2002k, linnainmaa1970representation,
    koh2017understanding); 24 vendor/marketing/uncited entries retired
    from references.bib.

  Cite injection: 117 [@key] citations placed at substantive
    body-prose anchors across 30+ chapters, after multi-round gemini-
    aided anchor recommendation + manual editorial pass. Anchors
    follow book-prose.md sec5 conventions: parenthetical at fact
    anchor, no citeproc duplicates, no bare-attribution patterns,
    semicolon-separated multi-cite, scope-correct per volume.

  Cite-placement audit: 28 wrong-side-of-period cites fixed
    ('.[@key]' to '[@key].'), 9 word-attached cites fixed
    ('word[@key]' to 'word [@key]'), 1 comma-multi-cite fixed
    ('[@a,@b]' to '[@a; @b]'), 4 footnote bold-head adjacencies
    rewritten, 1 cite removed from a table caption.

Conflicts during merge (4, all resolved with dev's HEAD where applicable
to preserve verification stamps and venue expansions per bib-check.md
sec7):

  vol2/distributed_training.qmd: GPipe cite — kept dev's
    @huang2019gpipe (cleaner author-key per bib-check.md sec5);
    retained branch's @harlap2018pipedream pairing for PipeDream.
  vol2/references.bib: @hosseini2017deceiving, @kurakin2017adversarial,
    @narayanan2019pipedream — kept dev's HEAD versions (have
    x-verified stamps and expanded venues from the recent fix/
    vol2-bibkeys-epubcheck audit). The pipedream resolution mistakenly
    duplicated a narayanan2021efficient entry; the duplicate (which
    was actually pipedream content under the wrong key) has been
    removed in this same merge commit.

Pre-existing fix from this merge: vol2/distributed_training.qmd had
@rafailov2024direct cited in dev HEAD but the bib only defines
@rafailov2023direct (left over from dev's recent rafailov-key
consolidation). Repointed the cite to the existing key.

Integrity: orphans 122; bib keys 1231; scope violations 0; unresolved
0. Manual-bracket precommit and bib-hygiene precommit: pass.
2026-05-04 11:19:30 -04:00
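The three mechanical rewrites listed under the cite-placement audit in d059e66fc1 could be expressed as regex substitutions along these lines. This is a hedged sketch with a hypothetical function name; the actual audit was a grep-plus-editorial pass, and real prose has edge cases (abbreviations, code blocks) that this does not handle.

```python
import re


def fix_cite_placement(text):
    # '[@a,@b]' -> '[@a; @b]': semicolon-separated multi-cite.
    text = re.sub(
        r"\[(@[^\]]+)\]",
        lambda m: "[" + re.sub(r"\s*,\s*(?=@)", "; ", m.group(1)) + "]",
        text,
    )
    # '.[@key]' -> ' [@key].': the cite belongs before the period.
    text = re.sub(r"\.(\[@[^\]]+\])", r" \1.", text)
    # 'word[@key]' -> 'word [@key]': detach the cite from the word.
    text = re.sub(r"(\w)(\[@)", r"\1 \2", text)
    return text
```

Running the function on already-correct text leaves it unchanged, so it is safe to apply repeatedly.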
Vijay Janapa Reddi
f6c41d7689 chore: snapshot current audit progress and infrastructure 2026-05-04 11:04:50 -04:00
Vijay Janapa Reddi
e465587959 docs(vault-cli): GEMINI_SELF_AUDIT_PROMPT.md — agentic audit via gemini CLI
A self-contained prompt that lets gemini CLI walk the corpus and audit it
directly via its own filesystem tools, without the audit_corpus_batched.py
Python wrapper. Useful when the wrapper hits rate-limit / exit-55 walls
or when the operator wants Gemini to checkpoint to disk as it goes.

The prompt uses an append-only JSONL output at
interviews/vault/_pipeline/runs/gemini-self-audit/01_audit.jsonl with
resume semantics (re-running skips qids already in the file). Encodes
the same five gates as audit_corpus_batched.py (format_compliance,
level_fit, coherence, math_correct, title_quality) plus a stable JSON
shape so downstream tooling can consume it identically.

Includes invocation guidance: --yolo + --skip-trust, slice by track to
avoid the multi-hour serial walk, resume across sessions.
2026-05-04 10:36:31 -04:00
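The append-only JSONL output with resume semantics described in e465587959 follows a common pattern, sketched below. The function names, the `audit_one` callback, and the `qid` field name are assumptions for illustration; the prompt itself delegates this bookkeeping to the gemini CLI rather than to Python.

```python
import json
import os


def load_done_qids(path):
    """Collect the qids already recorded in the JSONL file, if it exists."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return {json.loads(line)["qid"] for line in fh if line.strip()}


def audit_with_resume(qids, path, audit_one):
    """Append one JSON record per qid, skipping qids already on disk."""
    done = load_done_qids(path)
    written = []
    with open(path, "a", encoding="utf-8") as fh:
        for qid in qids:
            if qid in done:
                continue  # resume: this qid was audited in an earlier run
            record = {"qid": qid, **audit_one(qid)}
            fh.write(json.dumps(record) + "\n")
            fh.flush()  # checkpoint to disk after every record
            written.append(qid)
    return written
```

Since records are only ever appended, an interrupted run loses at most the record in flight, and re-running picks up where it stopped.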
Vijay Janapa Reddi
463a180258 fix(vault-cli): _judges adds --skip-trust to gemini invocation
The gemini CLI silently overrides --yolo to default approval mode when
its cwd is not in the trusted-folders list (e.g., a tempfile.gettempdir
scratch dir). The override is logged to stderr as 'Approval mode
overridden to "default" because the current folder is not trusted'
and the call exits 55. --skip-trust opts out of that gate. Verified
2026-05-04 in /tmp/gemini-trust-test.
2026-05-04 10:35:13 -04:00
Vijay Janapa Reddi
3bb7a02799 fix(bib): retire 19 vendor-doc orphan entries (no anchor in textbook prose)
Vendor product pages and marketing/blog posts that were defined in
references.bib but never cited anywhere in the book. A graduate-level
ML systems textbook should not carry vendor home-page URLs in its
bibliography. Real research papers with misleading '_website' keys
(e.g. caffe_website -> Jia 2014 NeurIPS workshop, numpy_website ->
Harris 2020 Nature, keras_website -> Chollet 2015) are kept.

Removed:

  vol1/backmatter/references.bib (17): @apple_neural_engine,
    @arm_bf16alt, @aws_s3, @cerebras2021wse2, @cerebras_website,
    @cntk_website, @farmbeats_website, @google_cloud_storage,
    @google_litert, @graphcore_website, @hydra, @numenta_sparsity,
    @nvidia_nccl, @sambanova_website, @scikit_learn_metrics, @wandb,
    @waymo_website.
  interviews/paper/references.bib (2): @stackoverflow_tags,
    @wikipedia_categories.

Verified: zero of the 19 entries had any [@key] reference in the
corpus (integrity check shows 0 unresolved citations after removal).

Integrity: bib keys 1257 -> 1238; orphans 185 -> 166. Manual-bracket
precommit: pass.
2026-05-04 10:18:56 -04:00
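The orphan check that 3bb7a02799 relies on (bib keys defined but never cited) can be sketched as a set difference. The function name and the key regex are assumptions; this sketch only sees bracketed `[@key]` citations and ignores cases such as cites hidden in commented-out blocks.

```python
import re

CITE_KEY = re.compile(r"@([\w\-]+)")
BRACKET = re.compile(r"\[([^\]]*@[^\]]*)\]")


def find_orphans(bib_keys, documents):
    """Return bib keys that no document cites, sorted."""
    cited = set()
    for text in documents:
        for inner in BRACKET.findall(text):
            # A bracket may hold several keys: '[@a; @b]'.
            cited.update(CITE_KEY.findall(inner))
    return sorted(set(bib_keys) - cited)


bib_keys = {"wandb", "hydra", "kingma2014adam"}
docs = ["Adam [@kingma2014adam; @hydra] is a common default."]
orphans = find_orphans(bib_keys, docs)
```

In this toy input only `wandb` is reported, since the other two keys appear inside a bracketed citation.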
Vijay Janapa Reddi
e644584fd0 fix(vault): unflag 34 audit-clean flagged-no-review drafts
Of the 55 flagged YAMLs that had no human_reviewed entry attached,
34 passed all five Gemini-3.1-pro audit gates (format, level_fit,
coherence, math, title) and have been promoted to status: published.
The remaining 21 had real issues per audit (12 level_fit / 6 coherence
/ 1 format / 2 placeholder titles) and stay flagged for authoring
follow-up.

On-disk: 9,521 published (was 9,487, +34) · 352 flagged (was 386).
vault check --strict and pytest both clean.
2026-05-04 09:16:07 -04:00
Vijay Janapa Reddi
d53d2e4b2d fix(vault): resolve metadata gaps + promote 41 audit-clean drafts
Three gap-fixes a corpus audit on 2026-05-04 surfaced:

1. 55 cloud YAMLs were missing the status field entirely; Pydantic
   silently defaulted them to 'draft', so audit_corpus_batched skipped
   them. fix_missing_metadata.py adds explicit
   status: draft + provenance: imported.

2. 59 deleted YAMLs lacked the deletion_reason that the soft-delete
   pairing rule requires. Added placeholder text noting the original
   reason was not preserved on import.

3. The 55 newly-explicit drafts went through a focused vault audit
   (gates: format/level_fit/coherence/math/title). 41 passed all five
   gates and were promoted to status: published. The remaining 14 had
   real issues (13 level_fit / 2 coherence / 1 math) and stay drafts
   for authoring follow-up.

audit_corpus_batched.py now accepts non-published YAMLs when --qids
is explicit (the operator opted in). Default behavior (full-corpus
audit) is unchanged: published-only.

On-disk corpus now: 9,487 published (was 9,446, +41) · 423 drafts
· 386 flagged · 390 deleted · 25 archived · 0 missing-status.
vault check --strict and pytest both clean.
2026-05-04 09:06:43 -04:00
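The first gap in d53d2e4b2d (a missing field masked by a model default) is easy to reproduce in miniature. The sketch below uses a standard-library dataclass as a stand-in for the Pydantic model, and the ids and function name are hypothetical; it shows why the check has to run on the raw records rather than on the loaded objects.

```python
from dataclasses import dataclass


@dataclass
class Question:
    id: str
    status: str = "draft"  # silent default: a missing field looks like a draft


def missing_status_ids(raw_records):
    """Ids of raw records that never set 'status' explicitly."""
    return [r["id"] for r in raw_records if "status" not in r]


# Hypothetical records standing in for parsed YAML files.
raw = [
    {"id": "cloud-0001", "status": "published"},
    {"id": "cloud-0002"},
]
loaded = [Question(**r) for r in raw]
```

After loading, `cloud-0002` is indistinguishable from a deliberate draft; only the raw-record scan reveals that the field was absent.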
Vijay Janapa Reddi
6bda543a33 chore(paper, staffml): refresh artifacts from vault 1.0.0
vault export-paper 1.0.0 regenerated paper/macros.tex and
paper/corpus_stats_export.json against the 1.0.0 release vault.db.
vault build --local-json refreshed staffml/src/data/{corpus-summary,
vault-manifest}.json. Numbers reflect the post-Phase-5 corpus state
(9,446 published, 89 topics, 843 chains, 30.1% chain coverage).
2026-05-04 08:52:01 -04:00
Vijay Janapa Reddi
5d0bbe23f7 chore(release): 1.0.0 2026-05-04 08:51:19 -04:00
Vijay Janapa Reddi
bc26a0bf37 feat(vault): Phase 6 schema tightening — markers + Details forbid + invariant
Three coordinated edits to lift the marker convention from a soft
draft-validation gate to a published-corpus invariant:

1. interviews/vault/schema/question_schema.yaml (LinkML, source of truth):
   common_mistake and napkin_math gain regex patterns matching the
   AUTHORING.md Pitfall/Rationale/Consequence and Assumptions/
   Calculations/Conclusion conventions. Documents the spec; enforced
   in the validator below.

2. interviews/vault-cli/src/vault_cli/models.py (Pydantic, derived):
   Details flips from extra='allow' to extra='forbid'. A pre-flight
   survey on 2026-05-04 across all 10,711 YAMLs found 0 unknown keys
   on Details, so the historical 'imported legacy fields' risk no
   longer applies.

3. interviews/vault-cli/src/vault_cli/validator.py:
   structural_tier gains _check_format_markers (invariant #19), which
   flags published YAMLs whose non-empty cm/nm doesn't match the
   AUTHORING.md markers. Drafts are exempt — author-in-progress drafts
   may still have malformed markers. Lifts gate_format from
   validate_drafts.py / _judges.py from a CI-time gate to a
   vault-check-strict invariant.

Tests: 4 new cases in test_models covering Details forbid, marker-
compliant pass, malformed cm fail, and draft-exempt skip. Total
88 passing (was 84). codegen-hashes.txt updated for the models.py
edit; vault codegen --check passes.

The on-disk corpus is fully clean post-Phase-5+drain: vault check
--strict reports 10,711 loaded, 0 invariant failures, 0 format-
marker violations on published YAMLs.
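
The pre-flight survey behind the Details change can be sketched as a simple key count. The allow-list and the function name below are hypothetical; the real Details model defines the actual field set.

```python
from collections import Counter

# Hypothetical allow-list; the real Details model defines the actual fields.
ALLOWED_DETAILS_KEYS = {"common_mistake", "napkin_math", "realistic_solution"}


def survey_unknown_keys(details_blocks, allowed=ALLOWED_DETAILS_KEYS):
    """Count keys outside the allow-list across many Details mappings."""
    unknown = Counter()
    for details in details_blocks:
        unknown.update(key for key in details if key not in allowed)
    return unknown


# Two in-memory stand-ins for parsed YAML Details blocks.
corpus = [
    {"common_mistake": "...", "napkin_math": "..."},
    {"realistic_solution": "...", "legacy_hint": "..."},
]
unknown = survey_unknown_keys(corpus)
```

An empty counter over the whole corpus is the condition under which flipping to extra='forbid' cannot reject an existing file.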
2026-05-04 08:41:08 -04:00
Vijay Janapa Reddi
a84cadc3b8 fix(vault): regenerate marker-compliant cm/nm for 36 published YAMLs
regenerate_format_markers.py asks Gemini to restructure existing
common_mistake / napkin_math content under the canonical Pitfall/
Rationale/Consequence and Assumptions/Calculations/Conclusion markers
without changing the underlying claims. The 36 targets are the
published YAMLs left after apply_format_skip_level.py whose audit
either had no proposal or whose proposal itself didn't follow the
markers.

One Gemini run in four batches (10 + 10 + 10 + 6 calls) returned 36/36 rewrites,
all marker-compliant, all Pydantic-valid. Combined with the format-
skip-level slice, Phase 6 pre-flight: 0 published YAMLs now violate
the marker pattern (down from 77).
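
The 10 + 10 + 10 + 6 split is what fixed-size chunking of 36 targets produces. A small sketch, with a hypothetical `chunk` helper and placeholder ids rather than the script's real batching code:

```python
def chunk(items, size):
    """Yield consecutive slices of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


targets = [f"qid-{n:03d}" for n in range(36)]  # placeholder ids
sizes = [len(batch) for batch in chunk(targets, 10)]
```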
2026-05-04 08:35:18 -04:00
Vijay Janapa Reddi
be0408e28d fix(staffml): replace removed lucide-react Github icon with inline SVG
lucide-react v1.0 removed all brand icons (Github, Twitter, Facebook,
etc.) for trademark reasons, so the bundled Github symbol is no longer
exported. Add a local GithubIcon component using the standard GitHub
mark, bump lucide-react to ^1.14.0, and update the four consumers.

Closes #1667.
2026-05-04 08:30:19 -04:00
Vijay Janapa Reddi
6e788042ae feat(vault-cli): apply_format_skip_level + 41 marker fixes
apply_format_skip_level.py applies marker-compliant common_mistake /
napkin_math corrections for published qids whose proposed fix got
skipped during Phase 5 because the row was entangled with a level
relabel (relabel-up or chain-monotonicity-block) or a high-risk
realistic_solution rewrite. The script applies ONLY the format fields
when the current YAML's value is malformed AND the proposed value
matches the AUTHORING.md markers. It deliberately does not touch
level (still a chain-team / authoring decision) or realistic_solution
(math verification handles that).
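
The apply rule described above (current value malformed AND proposed value compliant, format fields only) can be sketched as follows. The 'Label:' check and the function names are assumptions; treating an empty current value as "nothing to fix" is also an assumption.

```python
# Assumed marker labels per format field, in required order.
FORMAT_FIELDS = {
    "common_mistake": ("Pitfall", "Rationale", "Consequence"),
    "napkin_math": ("Assumptions", "Calculations", "Conclusion"),
}


def is_compliant(text, labels):
    """True if every 'Label:' marker occurs, in the given order."""
    positions = [text.find(f"{label}:") for label in labels]
    return -1 not in positions and positions == sorted(positions)


def apply_format_only(current, proposed):
    """Copy a format field only when it repairs a malformed value."""
    updated = dict(current)
    for field, labels in FORMAT_FIELDS.items():
        old = current.get(field) or ""
        new = proposed.get(field) or ""
        if old and not is_compliant(old, labels) and is_compliant(new, labels):
            updated[field] = new
    # level and realistic_solution are never read from `proposed`.
    return updated
```

Because the loop only iterates over the two format fields, a proposal that also carries a level relabel or a realistic_solution rewrite leaves those keys untouched.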

Phase 6 pre-flight: a survey on 2026-05-04 found 77 published YAMLs
with malformed markers. This pass fixes 41 of them. Remaining 36
have no marker-compliant proposal in the audit and need a fresh
authoring round before the LinkML pattern can land cleanly.
2026-05-04 08:25:14 -04:00
dependabot[bot]
23e8816269 deps(staffml): bump react-medium-image-zoom (#1650)
Bumps the next-react group with 1 update in the /interviews/staffml directory: [react-medium-image-zoom](https://github.com/rpearce/react-medium-image-zoom).


Updates `react-medium-image-zoom` from 5.4.3 to 5.4.5
- [Release notes](https://github.com/rpearce/react-medium-image-zoom/releases)
- [Changelog](https://github.com/rpearce/react-medium-image-zoom/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rpearce/react-medium-image-zoom/compare/v5.4.3...v5.4.5)

---
updated-dependencies:
- dependency-name: react-medium-image-zoom
  dependency-version: 5.4.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: next-react
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-04 08:18:06 -04:00
Vijay Janapa Reddi
ac2c7b39eb docs(vault-cli): PHASE_5_UNRESOLVED.md — post-drain accounting
Reflects the 2026-05-04 follow-up slices: math-skip-level (15 applies)
and math-finish queue drain (66 applies). Cumulative now 2,372 of
2,757 (86.0%); 385 known-deferred ahead of Phase 6. Also corrects the
original doc's '70 already-applied no-ops' line — those were unverified
math candidates the verify guard skipped, not no-ops.
2026-05-04 08:14:16 -04:00
Vijay Janapa Reddi
a5f3df9809 fix(vault): apply 81 Gemini-verified math corrections (Phase 5 finish)
Closes the autonomous portion of Phase 5. Three follow-on slices on top
of the original 2,279-correction mass-apply + math-verify run:

- 13 math-skip-level applies for qids whose accompanying level relabel
  was chain-blocked or relabel-up. Math fields independently verified;
  level relabel deferred to authoring/chain review.

- 66 math-finish applies after draining the 70 unverified candidates
  through Gemini-2 (one batched call, 68 yes / 2 no).

- 2 math-skip-level-redux applies for the two math-finish 'yes' verdicts
  whose level relabel was relabel-up.

Cumulative: 2,372 of 2,757 proposed corrections applied (86.0%).
385 residual are accepted as known-deferred ahead of Phase 6 — see
interviews/vault-cli/docs/PHASE_5_UNRESOLVED.md.
2026-05-04 08:14:08 -04:00
Vijay Janapa Reddi
3a14b6fbb7 feat(vault-cli): apply_math_skip_level + broaden verify guard
apply_math_skip_level.py is a Phase 5 cleanup helper. For the small set
of qids whose math fix carries a level relabel that's chain-blocked or
relabel-up, the math correction is independently verified and applies
cleanly — only the level relabel is the chain-team / authoring decision.
This script applies napkin_math/realistic_solution/common_mistake while
leaving level untouched, writing a 05_math_skip_level.json sidecar.
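
A rough sketch of the "math fields only, level untouched" apply with a sidecar entry. The sidecar keys and the function signature are hypothetical; the actual layout of 05_math_skip_level.json is not shown in this log.

```python
import json

MATH_FIELDS = ("napkin_math", "realistic_solution", "common_mistake")


def apply_math_skip_level(qid, current, proposed):
    """Return (updated record, sidecar entry); 'level' is never copied."""
    updated = dict(current)
    applied = []
    for field in MATH_FIELDS:
        if field in proposed and proposed[field] != current.get(field):
            updated[field] = proposed[field]
            applied.append(field)
    entry = {  # hypothetical sidecar shape
        "qid": qid,
        "applied_fields": applied,
        "level_kept": current.get("level"),
        "level_proposed": proposed.get("level"),
    }
    return updated, entry


record, entry = apply_math_skip_level(
    "qid-001",  # placeholder id
    {"level": "L3", "napkin_math": "old", "realistic_solution": "rs"},
    {"level": "L4", "napkin_math": "new", "realistic_solution": "rs"},
)
sidecar_text = json.dumps([entry], indent=2)
```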

verify_math_corrections.py's already-applied guard previously checked
only whether realistic_solution matched. That missed the bucket where
realistic_solution matched by coincidence but napkin_math (or
common_mistake) still diverged, leaving 70 candidates unverified across
the 2026-05-03 run. The guard now considers all three math fields.
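
The difference between the narrow and the broadened guard can be illustrated with a small comparison. Both function names are hypothetical; this is a sketch of the idea, not the code in verify_math_corrections.py.

```python
MATH_FIELDS = ("realistic_solution", "napkin_math", "common_mistake")


def already_applied_narrow(current, proposed):
    """Old behaviour: only realistic_solution is compared."""
    return current.get("realistic_solution") == proposed.get("realistic_solution")


def already_applied(current, proposed, fields=MATH_FIELDS):
    """Broadened guard: every proposed math field must already match."""
    return all(current.get(f) == proposed[f] for f in fields if f in proposed)


# realistic_solution matches by coincidence, napkin_math still diverges.
current = {"realistic_solution": "12 GB", "napkin_math": "old estimate"}
proposed = {"realistic_solution": "12 GB", "napkin_math": "fixed estimate"}
```

With these inputs the narrow guard reports the correction as already applied, while the broadened guard keeps it in the verification queue.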
2026-05-04 08:13:52 -04:00
dependabot[bot]
1e59026cf9 deps(staffml-worker): bump wrangler in /interviews/staffml/worker (#1644)
Bumps [wrangler](https://github.com/cloudflare/workers-sdk/tree/HEAD/packages/wrangler) from 4.85.0 to 4.87.0.
- [Release notes](https://github.com/cloudflare/workers-sdk/releases)
- [Commits](https://github.com/cloudflare/workers-sdk/commits/wrangler@4.87.0/packages/wrangler)

---
updated-dependencies:
- dependency-name: wrangler
  dependency-version: 4.87.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-04 07:18:59 -04:00
dependabot[bot]
b80f899436 deps(vault-worker): bump @cloudflare/workers-types (#1649)
Bumps [@cloudflare/workers-types](https://github.com/cloudflare/workerd) from 4.20260426.1 to 4.20260504.1.
- [Release notes](https://github.com/cloudflare/workerd/releases)
- [Changelog](https://github.com/cloudflare/workerd/blob/main/RELEASE.md)
- [Commits](https://github.com/cloudflare/workerd/commits)

---
updated-dependencies:
- dependency-name: "@cloudflare/workers-types"
  dependency-version: 4.20260504.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-04 07:18:44 -04:00
dependabot[bot]
b5e4bfb2d8 deps(staffml-worker): bump @cloudflare/workers-types (#1655)
Bumps [@cloudflare/workers-types](https://github.com/cloudflare/workerd) from 4.20260426.1 to 4.20260504.1.
- [Release notes](https://github.com/cloudflare/workerd/releases)
- [Changelog](https://github.com/cloudflare/workerd/blob/main/RELEASE.md)
- [Commits](https://github.com/cloudflare/workerd/commits)

---
updated-dependencies:
- dependency-name: "@cloudflare/workers-types"
  dependency-version: 4.20260504.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-04 07:18:28 -04:00
dependabot[bot]
3a6d5bfe0f deps(staffml): bump eslint from 10.2.1 to 10.3.0 in /interviews/staffml (#1657)
Bumps [eslint](https://github.com/eslint/eslint) from 10.2.1 to 10.3.0.
- [Release notes](https://github.com/eslint/eslint/releases)
- [Commits](https://github.com/eslint/eslint/compare/v10.2.1...v10.3.0)

---
updated-dependencies:
- dependency-name: eslint
  dependency-version: 10.3.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-04 07:18:24 -04:00
dependabot[bot]
231b969b85 deps(vault-worker): bump wrangler in /interviews/staffml-vault-worker (#1663)
Bumps [wrangler](https://github.com/cloudflare/workers-sdk/tree/HEAD/packages/wrangler) from 4.85.0 to 4.87.0.
- [Release notes](https://github.com/cloudflare/workers-sdk/releases)
- [Commits](https://github.com/cloudflare/workers-sdk/commits/wrangler@4.87.0/packages/wrangler)

---
updated-dependencies:
- dependency-name: wrangler
  dependency-version: 4.87.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-04 07:18:11 -04:00
dependabot[bot]
1f9ddb516b deps(staffml): bump jsdom from 29.1.0 to 29.1.1 in /interviews/staffml (#1670)
Bumps [jsdom](https://github.com/jsdom/jsdom) from 29.1.0 to 29.1.1.
- [Release notes](https://github.com/jsdom/jsdom/releases)
- [Commits](https://github.com/jsdom/jsdom/compare/v29.1.0...v29.1.1)

---
updated-dependencies:
- dependency-name: jsdom
  dependency-version: 29.1.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-04 07:18:02 -04:00
dependabot[bot]
cd4fc7d948 deps(staffml): bump sigma from 3.0.2 to 3.0.3 in /interviews/staffml (#1671)
Bumps [sigma](https://github.com/jacomyal/sigma.js) from 3.0.2 to 3.0.3.
- [Release notes](https://github.com/jacomyal/sigma.js/releases)
- [Changelog](https://github.com/jacomyal/sigma.js/blob/main/CHANGELOG.md)
- [Commits](https://github.com/jacomyal/sigma.js/compare/sigma@3.0.2...sigma@3.0.3)

---
updated-dependencies:
- dependency-name: sigma
  dependency-version: 3.0.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-04 07:17:59 -04:00
Vijay Janapa Reddi
2dc556e1e5 docs(vault-cli): PHASE_6_HANDOFF.md — resume guide after Phase 5 mass-apply
Self-contained resume guide for the next session:

  - Confirms Phases 0-5 (autonomous) + 8 done
  - Documents 478 unresolved corrections (cross-refs PHASE_5_UNRESOLVED)
  - Step-by-step for Phase 5 cleanup → Phase 6 schema → Phase 7 verify
    → Phase 9 release
  - Concrete CLI commands for each step (vault audit review with
    --filter-gate flags, vault codegen, vault publish)
  - Reference doc map (which doc covers what)
  - Pipeline data layout (where the canonical 01_audit.json lives)
  - Full commit log from this session
  - Merge command to land yaml-audit on dev when ready
  - Paste-ready resume prompt for the next Claude Code session

Total estimated remaining work to ship vault 1.0.0: ~9h, mostly Phase 5
review + Phase 6 schema. Tree is clean; ready to hand off.
2026-05-04 07:14:47 -04:00
Vijay Janapa Reddi
79b4c3361e docs(vault-cli): PHASE_5_UNRESOLVED.md — list of corrections needing human review
After the autonomous Phase 5 mass-apply + math-verify passes,
2,279 of 2,757 corrections (82.6%) were auto-applied. The remaining
478 were deliberately not applied because they fail one of three
safety checks:

  75 math 'no'             — independent Gemini check disputed the fix
  14 math 'unclear'        — Gemini wasn't confident
  13 math + level-block    — fix has level relabel that breaks a chain
 168 relabel-up            — against CORPUS_HARDENING_PLAN.md §10 Q3
 138 chain-block           — would break chains.json monotonicity
  70 already-applied       — no action needed

This doc:
  - Summarizes the skip reasons + counts
  - Points to the disposition logs in _pipeline/runs/
  - Recommends a per-category review workflow
  - Notes which categories are highest priority (math 'no')
  - Notes which are chain-restructuring decisions (out of Phase 5 scope)

Reviewer flow uses `vault audit review` (apply_corrections.py wrapper)
with --filter-gate to target specific buckets.

Phase 5 autonomous portion is COMPLETE. Phase 6 (schema tightening)
remains safe to attempt once the 478 are dispositioned or
accepted as known-deferred.
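
As a quick consistency check, the bucket counts listed in this message can be tallied against the totals it quotes. The dictionary keys are just labels copied from the list; nothing here comes from the pipeline itself.

```python
# Skip buckets exactly as listed in the commit message.
skipped = {
    "math 'no'": 75,
    "math 'unclear'": 14,
    "math + level-block": 13,
    "relabel-up": 168,
    "chain-block": 138,
    "already-applied": 70,
}

total_proposed = 2757
auto_applied = 2279

unresolved = sum(skipped.values())
assert unresolved == 478                       # matches the quoted remainder
assert auto_applied + unresolved == total_proposed
```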
2026-05-03 19:17:46 -04:00