merge_audit_runs.py — merges multiple per-track audit_corpus_batched
output dirs into one canonical run. Per-qid prefer non-error rows,
then rows with suggested_corrections.
AUDIT_FINDINGS_2026-05-03.md — first complete corpus audit.
summarize_audit.py — truncate rationale snippets at word boundaries
(was truncating mid-word, tripping codespell on words like 'claimin').
Phase 4 final stats (9,446 published questions audited):
format_compliance: ~960 fail
level_fit: ~1,580 fail
coherence: ~480 fail
math_correct: ~330 fail
title_quality: ~250 placeholder + ~25 malformed
20 error rows in global to retry on next run
1,767 questions have suggested_corrections; ~1,500 more need a
propose-fixes backfill pass (mostly cloud, some edge).
CORPUS_HARDENING_PLAN.md Phase 4 finalization.