2 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
d2621cc9ed feat(vault-cli): merge_audit_runs.py + Phase 4 findings doc
merge_audit_runs.py — merges multiple per-track audit_corpus_batched
output dirs into one canonical run. Per-qid prefer non-error rows,
then rows with suggested_corrections.

AUDIT_FINDINGS_2026-05-03.md — first complete corpus audit.

summarize_audit.py — truncate rationale snippets at word boundaries
(was truncating mid-word, tripping codespell on words like 'claimin').

Phase 4 final stats (9,446 published questions audited):
  format_compliance:   ~960 fail
  level_fit:          ~1,580 fail
  coherence:            ~480 fail
  math_correct:         ~330 fail
  title_quality:        ~250 placeholder + ~25 malformed
  20 error rows in global to retry on next run

1,767 questions have suggested_corrections; ~1,500 more need a
propose-fixes backfill pass (mostly cloud, some edge).

CORPUS_HARDENING_PLAN.md Phase 4 finalization.
2026-05-03 14:26:37 -04:00
Vijay Janapa Reddi
3eaac3ca93 feat(vault-cli): summarize_audit.py — Phase 4 finalization helper
Reads a 01_audit.json and emits a markdown triage doc with:
  - executive summary (per-gate pass/fail/error counts)
  - per-track failure rate matrix
  - coherence failure-mode breakdown (vendor fabrication / physical
    absurdity / mismatch / arithmetic)
  - priority lists for human review (math errors highest, then
    coherence by mode, then level inflation, then placeholder titles)
  - regex-vs-Gemini format-disagreement audit
  - recommended next-step actions per category

Usage:
  python3 interviews/vault-cli/scripts/summarize_audit.py \\
      --input <01_audit.json> \\
      --output interviews/vault-cli/docs/AUDIT_FINDINGS_<date>.md

Smoke-tested on the in-flight Phase 4 audit (2,880 cloud rows so far).
Sample findings on the partial data:
  - 95 math errors caught (real ones — e.g. "200 * 4 * 168 = 134,400,
    not $168k/week")
  - 385 level_fit fails (13.4% — slightly higher than global's 15.3%)
  - 146 coherence fails: 61 mismatch / 47 arithmetic / 37 physical
    absurdity / 1 vendor fabrication
  - 118 placeholder titles in cloud (Gemini flags more than just
    the "Global New NNNN" pattern; Phase 7 scope is bigger than the
    plan estimated)

CORPUS_HARDENING_PLAN.md Phase 4.
2026-05-03 11:06:26 -04:00