mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-06 09:38:33 -05:00
Phase 8 deliverable: runs audit_corpus_batched.py against the full
corpus on the 1st of each month at 14:00 UTC, plus workflow_dispatch
for ad-hoc operator-triggered runs.
Job:
- Set up Python 3.12, install vault-cli
- vault check --strict + pytest (sanity)
- Run audit_corpus_batched.py --all --workers 4 --max-calls <input>
(cron uses 250; manual lets operator override)
- Run summarize_audit.py to emit AUDIT_FINDINGS.md
- Upload _pipeline/runs/cron-<UTC>/ as a 90-day-retention artifact
Activation requirements (documented in YAML comments):
1. Repository secret GEMINI_API_KEY must be provisioned
2. Runner image must have gemini CLI installed (or script adapted
to Python SDK)
Until those land, the workflow YAML is in place but workflow_dispatch
fails clearly on the gemini-CLI-missing check. Activation is a
one-line change once the auth path is decided.
TODO (deferred to a follow-up): regression-comparison vs. last
month's artifact + gh issue create on threshold breach. Skipped on
the first run (no baseline). The skeleton is documented inline.
CORPUS_HARDENING_PLAN.md Phase 8.