mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-06 17:49:07 -05:00


Vijay Janapa Reddi 2dc556e1e5 docs(vault-cli): PHASE_6_HANDOFF.md — resume guide after Phase 5 mass-apply

Self-contained resume guide for the next session:

  - Confirms Phases 0-5 (autonomous) + 8 done
  - Documents 478 unresolved corrections (cross-refs PHASE_5_UNRESOLVED)
  - Step-by-step for Phase 5 cleanup → Phase 6 schema → Phase 7 verify
    → Phase 9 release
  - Concrete CLI commands for each step (vault audit review with
    --filter-gate flags, vault codegen, vault publish)
  - Reference doc map (which doc covers what)
  - Pipeline data layout (where the canonical 01_audit.json lives)
  - Full commit log from this session
  - Merge command to land yaml-audit on dev when ready
  - Paste-ready resume prompt for the next Claude Code session

Total estimated remaining work to ship vault 1.0.0: ~9h, mostly Phase 5
review + Phase 6 schema. Tree is clean; ready to hand off.

2026-05-04 07:14:47 -04:00

12 KiB


Phase 6+ handoff — resume guide for the next session

Status as of 2026-05-04: Phases 0-5 (autonomous portion) complete. 2,279 of 2,757 proposed corrections applied + validated. 478 outliers documented in PHASE_5_UNRESOLVED.md for human review. Tree clean, validators green.

Branch: yaml-audit (106 commits ahead of origin/dev, 0 behind)
Worktree: /Users/VJ/GitHub/MLSysBook-yaml-audit
Active workplan: interviews/vault-cli/docs/CORPUS_HARDENING_PLAN.md

What's done

+ Phase 0  Cleanup deprecated scripts                                ✅
+ Phase 1  Provenance backfill (407 YAMLs)                          ✅
+ Phase 2  AUTHORING.md + vault new scaffold                        ✅
+ Phase 3  audit_corpus_batched.py + _judges.py + _batching.py      ✅
+ Phase 4  Full-corpus audit (9,446) + backfills + merge            ✅
+ Phase 5  autonomous mass-apply: 2,279 corrections applied         ✅
            └─ 2,075 low-risk (format/title/level) via mass_apply
            └─   204 math fixes via verify_math_corrections
+ Phase 8  cron workflow + vault audit CLI subcommand               ✅

What remains

Phase 5 cleanup:        478 unresolved corrections (see PHASE_5_UNRESOLVED.md)
                          - 75 math 'no'      ← highest priority
                          - 14 math 'unclear'
                          -168 relabel-up
                          -138 chain-block
                          - 13 math+level-block
                          - 70 already-applied (no action)
Phase 6:                schema tightening (LinkML pattern + Pydantic forbid + lift gate)
Phase 7:                title-quality verification re-audit
Phase 9:                paper.tech update + tag vault-1.0.0

Total estimated remaining: ~9 hours of work, mostly:

- Phase 5 unresolved review: 4-6h human
- Phase 6 schema work: 2h code
- Phase 7 verification: 30min
- Phase 9 release: 1h
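
The counts and hour estimates quoted above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that only re-adds numbers already stated in this guide; the variable names are illustrative and are not part of the vault tooling.

```python
# Cross-check of the figures quoted in this guide (names are illustrative).
proposed = 2757
applied = {"low_risk": 2075, "math": 204}
unresolved = {
    "math_no": 75,
    "math_unclear": 14,
    "relabel_up": 168,
    "chain_block": 138,
    "math_level_block": 13,
    "already_applied": 70,
}

applied_total = sum(applied.values())
unresolved_total = sum(unresolved.values())

# Every proposed correction is either applied or listed as unresolved.
assert applied_total + unresolved_total == proposed

# Remaining effort in hours, as (low, high) ranges per phase.
hours = {
    "phase5_review": (4.0, 6.0),
    "phase6_schema": (2.0, 2.0),
    "phase7_verify": (0.5, 0.5),
    "phase9_release": (1.0, 1.0),
}
low = sum(lo for lo, _ in hours.values())
high = sum(hi for _, hi in hours.values())
print(applied_total, unresolved_total, low, high)
```

The hour range works out to 7.5 to 9.5, which brackets the ~9 hour estimate.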

How to resume

Step 0 — sanity check the worktree

cd /Users/VJ/GitHub/MLSysBook-yaml-audit
git status                       # should be clean
git log --oneline -10            # confirm Phase 5 commits visible
git branch                       # * yaml-audit
vault check --strict             # 10,711 loaded, 0 invariant failures
pytest interviews/vault-cli/tests/ -q  # 84 passed
ruff check interviews/vault-cli  # clean

Step 1 — disposition the 478 unresolved corrections

Read interviews/vault-cli/docs/PHASE_5_UNRESOLVED.md first for the full breakdown. Suggested order, by priority:

Step 1a — math 'no' verdicts (75 questions, highest priority)

An independent Gemini check disputed Gemini's first proposed math fix. Each question has two possible interpretations:

- The first proposal was wrong (Gemini hallucinated)
- The second pass was overly strict (the fix is fine)

Read the level_fit_rationale, coherence_rationale, and math_errors fields in 01_audit.json to understand the original failure, then compare them with suggested_corrections.napkin_math in the same file, and decide which interpretation holds.

# Walk these interactively:
vault audit review \
    --input interviews/vault/_pipeline/runs/full-corpus-20260503-merged/01_audit.json \
    --filter-gate math_correct \
    --limit 25
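
When reading rows outside the interactive reviewer, it may help to print the relevant fields side by side. The sketch below is hedged: the field names are the ones this guide mentions, but the row layout (one flat dict per question with a `qid` key and a nested `suggested_corrections` dict) is an assumption, and `review_card` and the sample row are hypothetical.

```python
from typing import Any

# Fields this guide says to read; the flat-dict row layout is an assumption.
CONTEXT_FIELDS = ("level_fit_rationale", "coherence_rationale", "math_errors")


def review_card(row: dict[str, Any]) -> str:
    """Render one audit row as a compact text card for manual review."""
    lines = [f"qid: {row.get('qid', '<missing>')}"]
    for field in CONTEXT_FIELDS:
        lines.append(f"{field}: {row.get(field, '<missing>')}")
    corrections = row.get("suggested_corrections") or {}
    lines.append(f"proposed napkin_math: {corrections.get('napkin_math', '<none>')}")
    return "\n".join(lines)


# Hypothetical sample row, not real corpus data.
sample_row = {
    "qid": "demo-0001",
    "level_fit_rationale": "Level matches the claimed label.",
    "coherence_rationale": "Scenario and answer agree.",
    "math_errors": ["Bandwidth uses GB where the text says GiB."],
    "suggested_corrections": {"napkin_math": "**Assumptions:** ..."},
}
print(review_card(sample_row))
```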

Step 1b — relabel-up cases (168 questions)

Each is a question Gemini judged "deserves a HIGHER level than claimed." Per §10 Q3, default policy is relabel-DOWN, but these go the other way. Two paths:

- Accept the relabel-up (the question really IS bigger than its label)
- Rewrite the question DOWN to actually match the claimed level

Triage in batches by track + topic. Open an issue for each chunk that needs authoring follow-up.
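
Batching by track and topic could be sketched as a simple grouping step. This assumes each row carries `qid`, `track`, and `topic` keys, which this guide does not confirm; `batch_by_track_topic`, the sample rows, and the topic names are hypothetical.

```python
from collections import defaultdict


def batch_by_track_topic(rows):
    """Group qids by (track, topic) so each batch can be triaged together."""
    batches = defaultdict(list)
    for row in rows:
        # Rows missing either key fall into an explicit "unknown" bucket.
        key = (row.get("track", "unknown"), row.get("topic", "unknown"))
        batches[key].append(row["qid"])
    return dict(batches)


# Hypothetical sample rows, not real corpus data.
rows = [
    {"qid": "demo-0001", "track": "cloud", "topic": "serving"},
    {"qid": "demo-0002", "track": "cloud", "topic": "serving"},
    {"qid": "demo-0003", "track": "edge", "topic": "quantization"},
    {"qid": "demo-0004"},
]
for key, qids in batch_by_track_topic(rows).items():
    print(key, qids)
```

Each resulting batch maps naturally onto one follow-up issue.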

Step 1c — chain-monotonicity blocks (138 + 13 = 151)

A level relabel was blocked because applying it would break the chains.json non-decreasing-level invariant.

For each, you need to do one of the following:

- Move the question OUT of the chain (vault chain unlink ...), then apply the relabel
- Restructure the chain itself (merge / split / reorder)

These are chain-team decisions, not pure Phase 5 work.
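
The non-decreasing-level invariant can be illustrated with a small check. This is a sketch under stated assumptions: levels are modelled as plain integers and a chain as an ordered list of qids, neither of which is confirmed by this guide, and `breaks_chain` is a hypothetical helper rather than the validator's actual code.

```python
def breaks_chain(chain, levels, qid, new_level):
    """Return True if relabeling `qid` to `new_level` would make the
    levels along `chain` decrease at some step."""
    trial = {**levels, qid: new_level}
    seq = [trial[q] for q in chain]
    return any(a > b for a, b in zip(seq, seq[1:]))


# Hypothetical chain and integer levels, not real corpus data.
chain = ["q1", "q2", "q3"]
levels = {"q1": 2, "q2": 3, "q3": 4}

print(breaks_chain(chain, levels, "q2", 5))         # relabel-up past q3
print(breaks_chain(chain, levels, "q2", 4))         # ties are allowed
print(breaks_chain(["q1", "q3"], levels, "q2", 5))  # after unlinking q2
```

The last call mirrors the unlink option: once the question is out of the chain, the relabel no longer affects the invariant.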

Step 1d — math 'unclear' (14 questions)

Same workflow as 1a, but Gemini was less confident in its verdict. The autonomous pass defaulted to skipping these, so inspect each one manually.

Step 1e — already-applied (70 questions)

No action needed. The YAML's current state already matches the proposed correction.
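
One way to think about "already matches" is a subset comparison between the proposed fields and the current YAML contents. The sketch below assumes both are available as nested dicts; `already_applied` and the sample data are hypothetical and do not reproduce how the vault scripts classify these rows.

```python
def already_applied(current, proposed):
    """True if every field in `proposed` already has that value in `current`."""
    for key, want in proposed.items():
        have = current.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            # Recurse so nested blocks are compared field by field.
            if not already_applied(have, want):
                return False
        elif have != want:
            return False
    return True


# Hypothetical question contents, not real corpus data.
current = {"title": "A", "details": {"napkin_math": "x", "common_mistake": "y"}}
print(already_applied(current, {"details": {"napkin_math": "x"}}))
print(already_applied(current, {"title": "B"}))
```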

Step 2 — Phase 6: tighten schema + lift format gate

Once the corpus is clean (Step 1 is done, or the residuals are accepted as known-deferred):

files to edit:

  interviews/vault/schema/question_schema.yaml (LinkML, source of truth):
    Details.common_mistake:
      pattern: '(?s).*\*\*The Pitfall:\*\*.*\*\*The Rationale:\*\*.*\*\*The Consequence:\*\*.*'
    Details.napkin_math:
      pattern: '(?s).*\*\*Assumptions.*\*\*Calculations:\*\*.*\*\*Conclusion.*'
    Question.provenance:
      required: true   # was implicitly defaulted to "imported"

  interviews/vault-cli/src/vault_cli/models.py (Pydantic, derived):
    Details.model_config: ConfigDict(extra="forbid")   # was "allow"
    # Survey first; we already verified 0 unknown extras on Details
    # across 9,446 published YAMLs (2026-05-03 prep).
    # Question can stay extra="allow" for forward-compat
    # (audit-stamp fields like validation_status, math_status).

  interviews/vault-cli/src/vault_cli/validator.py:
    structural_tier:
      append _format_compliance() check
      (lift gate_format from validate_drafts.py / _judges.py
       to a published-corpus invariant)

run:
  vault codegen          # regenerate Pydantic / SQL DDL / TS types
  pytest                 # add tests covering new invariants
  vault check --strict   # 0 failures expected (corpus is clean)

Test plan: lift the LinkML pattern, run vault codegen, run pytest. If any test fails because a YAML doesn't match the new pattern, that YAML wasn't covered by Phase 5 — fix the YAML, not the schema.
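
Before tightening the schema, the two patterns can be tried against sample text with Python's `re` module. The regexes are copied from the schema fragment in this guide; the sample strings are hypothetical, and using `fullmatch` is an assumption about anchoring (how LinkML or the generated validators anchor `pattern` is not asserted here).

```python
import re

# Patterns as listed for Details.common_mistake and Details.napkin_math.
COMMON_MISTAKE = re.compile(
    r"(?s).*\*\*The Pitfall:\*\*.*\*\*The Rationale:\*\*.*\*\*The Consequence:\*\*.*"
)
NAPKIN_MATH = re.compile(
    r"(?s).*\*\*Assumptions.*\*\*Calculations:\*\*.*\*\*Conclusion.*"
)

# Hypothetical field values, not real corpus data.
good_mistake = "**The Pitfall:** a\n**The Rationale:** b\n**The Consequence:** c"
bad_mistake = "**The Rationale:** b\n**The Pitfall:** a\n**The Consequence:** c"
good_math = "**Assumptions:** a\n**Calculations:** b\n**Conclusion:** c"
bad_math = "**Assumptions:** a\n**Conclusion:** c"

for name, pattern, text in [
    ("good_mistake", COMMON_MISTAKE, good_mistake),
    ("bad_mistake", COMMON_MISTAKE, bad_mistake),
    ("good_math", NAPKIN_MATH, good_math),
    ("bad_math", NAPKIN_MATH, bad_math),
]:
    print(name, bool(pattern.fullmatch(text)))
```

The `(?s)` flag lets `.` span newlines, so the three markers only need to appear in order anywhere in the field; a reordered or missing marker fails the match.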

Step 3 — Phase 7: title verification

Phase 5 already applied 79 title corrections. To verify they were applied correctly, run a small re-audit on those qids:

# Pull qid list from the disposition log:
python3 -c "
import json
path = 'interviews/vault/_pipeline/runs/full-corpus-20260503-merged/02_mass_apply.json'
with open(path) as f:
    log = json.load(f)
# Keep only title-only corrections that were actually applied.
qids = [disp['qid'] for disp in log['dispositions']
        if disp.get('result') == 'applied' and disp.get('category') == 'title-only']
print(','.join(qids))
" > /tmp/title-fixed-qids.txt

# Re-audit those:
QIDS=$(cat /tmp/title-fixed-qids.txt)
vault audit run \
  --qids "$QIDS" \
  --workers 8 \
  --max-calls 5 \
  --output interviews/vault/_pipeline/runs/title-verify-20260504

Expect every title to come back title_quality: good.
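
Checking that expectation could be sketched as a filter over the re-audit rows. The output layout of the re-audit is not shown in this guide, so the row shape here (dicts with `qid` and `title_quality` keys) is an assumption, and `titles_not_good` and the sample rows are hypothetical.

```python
def titles_not_good(rows):
    """Return qids whose re-audited title did not come back as 'good'."""
    return [row["qid"] for row in rows if row.get("title_quality") != "good"]


# Hypothetical re-audit rows, not real corpus data.
reaudit_rows = [
    {"qid": "demo-0001", "title_quality": "good"},
    {"qid": "demo-0002", "title_quality": "vague"},
    {"qid": "demo-0003"},
]
print(titles_not_good(reaudit_rows))
```

An empty list would mean every corrected title passed; rows with a missing verdict are reported too, so a silent audit failure is not mistaken for a pass.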

Step 4 — Phase 9: paper.tech + release

# Update paper.tech with post-Phase-5 corpus stats:
#   - 9,446 published, audit pass rates per gate, per-track tables
#   - Methodology paragraph naming gemini-3.1-pro-preview as audit model
#   - Citation of audit_corpus_batched.py + AUDIT_FINDINGS_<date>.md

vault export-paper
vault build --local-json   # release_hash should roll
vault publish 1.0.0
vault verify 1.0.0 --git-ref v1.0.0   # citation-grade round-trip

git tag vault-1.0.0

Reference docs (in this worktree)

| doc | purpose |
| --- | --- |
| `interviews/vault-cli/docs/CORPUS_HARDENING_PLAN.md` | full 9-phase workplan (the spec) |
| `interviews/vault-cli/docs/PHASE_4_HANDOFF.md` | original Phase 4 handoff (now historical) |
| `interviews/vault-cli/docs/PHASE_5_UNRESOLVED.md` | the 478 unresolved corrections + per-category review workflow |
| `interviews/vault-cli/docs/PHASE_6_HANDOFF.md` | this doc (resume guide) |
| `interviews/vault-cli/docs/AUDIT_FINDINGS_2026-05-03.md` | Phase 4 corpus snapshot before corrections |
| `interviews/vault/AUTHORING.md` | single-source authoring reference |

Gemini-driven scripts (in vault-cli/scripts/)

| script | purpose |
| --- | --- |
| `audit_corpus_batched.py` | full-corpus audit with optional `--propose-fixes` |
| `apply_corrections.py` | INTERACTIVE accept/reject of proposed corrections |
| `mass_apply_corrections.py` | AUTONOMOUS apply of low-risk corrections (already used) |
| `verify_math_corrections.py` | independent Gemini verify-then-apply for math fixes (already used) |
| `summarize_audit.py` | generate AUDIT_FINDINGS markdown |
| `merge_audit_runs.py` | merge per-track audit dirs into one canonical |
| `_judges.py` / `_batching.py` | shared helpers |

Pipeline data on disk (gitignored, ready for use)

interviews/vault/_pipeline/runs/
├── full-corpus-20260503/                  main audit (cloud + edge + global)
├── full-corpus-20260503-mobile/           parallel mobile run
├── full-corpus-20260503-tinyml/           parallel tinyml run
├── full-corpus-20260503-cloud-backfill/   cloud propose-fixes backfill
├── full-corpus-20260503-edge-backfill/    edge propose-fixes backfill
├── full-corpus-20260503-merged/           ← canonical merged dataset
│   ├── 01_audit.json                       9,446 rows, all gates verified
│   ├── 02_mass_apply.json                  per-qid disposition for low-risk apply
│   ├── 03_math_verification.json           per-qid Gemini verification for math fixes
│   └── 04_math_applied.json                per-qid disposition for math apply
└── (older runs from earlier in workflow)

The merged 01_audit.json is the input for ALL Phase 5 review and Phase 7 verification.

Final commit log highlights from this session

79b4c3361  docs(vault-cli): PHASE_5_UNRESOLVED.md — list of corrections needing human review
f4d219ab2  fix(vault): apply 204 Gemini-verified math corrections (Phase 5 math leg)
04c69e6a5  feat(vault-cli): verify_math_corrections.py — Phase 5 math-fix verifier
15811ef4b  feat(vault-cli): mass_apply_corrections.py — Phase 5 low-risk auto-applier
e62e7e27b  fix(vault): apply 2,075 low-risk Gemini-proposed corrections (Phase 5 mass-apply)
9ee3c3430  docs(vault-cli): PHASE_4_HANDOFF — update post-backfill
87481ab6a  docs(vault-cli): refresh AUDIT_FINDINGS_2026-05-03 after Phase 4 backfill
2131696b8  fix(vault/cloud): move stray top-level options/correct_index into details
68012912f  feat(vault-cli): vault audit CLI subcommand — Phase 8
d2621cc9e  feat(vault-cli): merge_audit_runs.py + Phase 4 findings doc
2d9330da6  fix(vault-cli): isolate gemini CLI scratch files in temp dir
e7a2a27bf  feat(ci): staffml-audit-corpus-monthly.yml — recurring corpus audit workflow
3eaac3ca9  feat(vault-cli): summarize_audit.py — Phase 4 finalization helper
1722133fa  feat(vault-cli): apply_corrections.py — interactive accept/reject
1b58a9c50  feat(vault-cli): parallel audit_corpus_batched.py with submit-stagger
69cf6f0a5  feat(vault-cli): audit_corpus_batched.py — full-corpus batched audit
dd71c66ca  feat(vault-cli): _judges.py + _batching.py — shared infra
f691d6c14  feat(vault-cli): vault new scaffolds full Pitfall/Rationale/Consequence stubs
7500b9281  docs(vault): AUTHORING.md — single-source authoring reference
e8f0faa83  chore(vault): explicit provenance: imported on 407 published questions
56d3ed155  chore(vault): remove 18 deprecated scripts per CORPUS_HARDENING_PLAN.md Phase 0
36f2ef592  docs(vault-cli): CORPUS_HARDENING_PLAN.md — supersedes RELEASE_AUDIT_PLAN.md

When ready to land on dev

cd /Users/VJ/GitHub/MLSysBook   # main worktree where dev is checked out
git pull origin dev
git merge --no-ff yaml-audit -m "Merge yaml-audit — Phase 0-5 of corpus hardening"
# (don't push until ready)

Or via PR if you'd rather review on GitHub.

The merge will land 2,279 corrected YAMLs + ~30 new tooling/doc files. No schema changes (Phase 6 is gated on Phase 5 unresolved cleanup).

Resume prompt for the next Claude Code session

Resume yaml-audit branch. Phase 5 autonomous portion is done; 2,279 of
2,757 corrections applied. Read interviews/vault-cli/docs/PHASE_6_HANDOFF.md
top to bottom — it's the self-contained resume guide.

Then start at Step 0 (sanity check), then Step 1 (disposition the 478
unresolved corrections in PHASE_5_UNRESOLVED.md), then Step 2 (Phase 6
schema tightening), Step 3 (Phase 7 title verification), Step 4 (Phase 9
release).

Total estimated remaining: ~9 hours, mostly Phase 5 review + Phase 6
schema work.
