cs249r_book

mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-07 18:18:42 -05:00

Author	SHA1	Message	Date
Vijay Janapa Reddi	a74c98576e	Merge origin/dev into yaml-audit Sync the yaml-audit branch with the latest dev work since the previous sync (`5c5af75ed`). Brings in 73 commits including: - CI security fixes: postcss XSS bump, uuid bounds bump, codeql paths-ignore for vendored bundles, read-only token on staffml-validate-vault workflow - kits/ dark mode polish: code-block readability, dropdown contrast - vault-cli/: pre-commit ruff hook + 20 ruff fixes, all-contributors auto-credit workflow change to pull_request_target - dev's earlier merge of yaml-audit (`836d481b5`) carrying the pre-trailer-strip Phase 1/2/3 history; this merge harmonises that with the current trailer-clean yaml-audit tip - misc bug fixes (tinytorch perceptron seed, infra workflows, socratiq vite dev injector) Conflicts resolved (if any) preserve the yaml-audit-side authoritative state for vault/* files (we own those) and the dev-side authoritative state for .github/workflows/* and other shared infrastructure. # Conflicts: # .github/workflows/all-contributors-auto-credit.yml # .github/workflows/staffml-preview-dev.yml # interviews/staffml/src/data/corpus-summary.json # interviews/staffml/src/data/vault-manifest.json # interviews/staffml/tests/chain-and-vault-smoke.mjs # interviews/vault-cli/README.md # interviews/vault-cli/docs/CHAIN_ROADMAP.md # interviews/vault-cli/scripts/build_chains_with_gemini.py # interviews/vault-cli/scripts/generate_question_for_gap.py # interviews/vault-cli/scripts/merge_chain_passes.py # interviews/vault-cli/scripts/validate_drafts.py # interviews/vault-cli/src/vault_cli/legacy_export.py # interviews/vault-cli/tests/test_chain_validation.py # interviews/vault/.gitignore # interviews/vault/ARCHITECTURE.md # interviews/vault/chains.json # interviews/vault/id-registry.yaml # interviews/vault/questions/edge/optimization/edge-2536.yaml # interviews/vault/questions/mobile/deployment/mobile-2147.yaml # tinytorch/src/03_layers/03_layers.py	2026-05-02 11:06:43 -04:00
Vijay Janapa Reddi	b84691e440	feat(vault-cli): generate_question_for_gap pre-filter for hallucinated gaps The 2026-05-02 audit found ~70% of detected chain gaps are hallucinated: the two anchor questions don't share a scenario thread, so a "bridge" between them is fictional. Without this gate, generating from the existing 407-gap backlog would waste ~75% of the budget (1 generation call + 3 downstream-judge calls per bad gap). Adds a 1-call pre-filter via call_gemini_prefilter. The judge sees the gap entry plus the two anchors in full and returns: { "verdict": "real" | "hallucinated", "anchors_share_scenario": "yes" | "no", "level_makes_sense": "yes" | "no", "rationale": "<one sentence>" } Hallucinated → process_gap returns ok=False with the prefilter verdict captured for review. Real → falls through to generation (unchanged downstream behaviour). Cost analysis at 70% hallucination rate, 30-gap batch: Before: 30 generations + 90 judge calls = 120 calls; ~24 promotable drafts After: 30 prefilter + ~9 generations + 27 judge calls = 66 calls; ~7 promotable drafts (same yield, half the cost) Skip the pre-filter with --skip-prefilter when re-validating an already-filtered gap list or for cost-debugging. Default is filter ON. Smoke checks (mock prefilter responses): - "real" → process_gap returns ok=True, falls through to generation - "hallucinated" → ok=False, why="pre-filter: hallucinated gap (...)" - --skip-prefilter → no pre-filter call, dry_run shows the prompt	2026-05-02 09:49:48 -04:00
Vijay Janapa Reddi	2b3cf5e1da	chore(vault): consolidate AI pipeline artifacts under _pipeline/ Establishes one ignored subdirectory for ALL intermediate outputs of LLM-driven tooling (chain proposals, gap detection, draft scorecards, audit traces). Single gitignore rule: /_pipeline/. Convention is documented in interviews/vault/README.md under "Pipeline artifacts" — it's a real project layout convention, not AI-specific config. Path migration: interviews/vault/chains.proposed.json → _pipeline/chains.proposed.json interviews/vault/gaps.proposed.json → _pipeline/gaps.proposed.json interviews/vault/draft-validation-scorecard.json → _pipeline/draft-validation-scorecard.json interviews/vault/audit-runs/ → _pipeline/runs/ 8 scripts updated to define a PIPELINE_DIR constant and route default outputs through it: build_chains_with_gemini.py, apply_proposed_chains.py, merge_chain_passes.py, validate_drafts.py, audit_chains_with_gemini.py, generate_question_for_gap.py, summarize_proposed_chains.py, promote_drafts.py. Forward-looking docs (README.md chain-pipeline section + CHAIN_ROADMAP.md resume instructions + state snapshot) updated to reference the new paths. Historical Progress Log entries left as-is — they accurately describe what was committed at the time. Drive-by .gitignore fixes (both used full repo-relative paths under package-local .gitignore files, which never matched): interviews/vault-cli/.gitignore: scripts/.calibration_cache/ interviews/vault/.gitignore: /embeddings.npz Validation: - vault check --strict: 10,705 loaded, 0 invariant failures - pytest interviews/vault-cli/tests/: 74/74 - audit --dry-run: paths resolve correctly to _pipeline/runs/<ts>/ No durable corpus content moves. chains.json (live registry), id-registry.yaml, questions/, etc. all stay where they were.	2026-05-02 09:04:55 -04:00
Vijay Janapa Reddi	604869b986	feat(vault-cli): Phase 3.a + 3.b — gap-driven authoring tooling Two new scripts that together close the loop from a gap entry to a reviewable candidate question with a multi-gate scorecard. generate_question_for_gap.py (3.a): - Reads a gap entry, loads between-questions + same-bucket exemplars, prompts gemini-3.1-pro-preview, runs Pydantic Question validation, and writes <track>/<area>/<id>.yaml.draft. The .draft suffix keeps drafts out of vault check / vault build until promotion. - ID allocator scans corpus + existing drafts so a batch run gets distinct fresh IDs without touching id-registry.yaml. - Modes: --gap-index, --gaps-from + --limit, --dry-run. validate_drafts.py (3.b): - Five gates per draft: schema (Pydantic), originality (cosine vs in-bucket neighbours via BAAI/bge-small-en-v1.5; matches the corpus embeddings.npz so values are comparable; cutoff 0.92), level_fit (Gemini-judge against same-level exemplars), coherence (Gemini-judge: scenario/question/solution consistency), and bridge (Gemini-judge: chain-fit between the gap's two anchors). - Final verdict pass iff every non-skipped gate passes. - Skips: --no-originality, --no-llm-judge. - Output: interviews/vault/draft-validation-scorecard.json. Smoke checks: - 3.a --dry-run --gap-index 0: resolves gap, builds prompt, allocates cloud-4579. Synthetic Gemini response Pydantic-validates clean. - 3.b on a synthetic /tmp draft: schema + originality pass (top neighbour cosine 0.73 vs 0.92 threshold). Phase 3.c (pilot run on 30 gaps) deferred: it generates new YAML question content that needs human review before promotion. The tooling ships ready; running it is a user-supervised step. CHAIN_ROADMAP.md Progress Log + Phase 3 status updated.	2026-05-01 11:31:06 -04:00
Vijay Janapa Reddi	4b880ebb1a	feat(vault-cli): Phase 3.a + 3.b — gap-driven authoring tooling Two new scripts that together close the loop from a gap entry to a reviewable candidate question with a multi-gate scorecard. generate_question_for_gap.py (3.a): - Reads a gap entry, loads between-questions + same-bucket exemplars, prompts gemini-3.1-pro-preview, runs Pydantic Question validation, and writes <track>/<area>/<id>.yaml.draft. The .draft suffix keeps drafts out of vault check / vault build until promotion. - ID allocator scans corpus + existing drafts so a batch run gets distinct fresh IDs without touching id-registry.yaml. - Modes: --gap-index, --gaps-from + --limit, --dry-run. validate_drafts.py (3.b): - Five gates per draft: schema (Pydantic), originality (cosine vs in-bucket neighbours via BAAI/bge-small-en-v1.5; matches the corpus embeddings.npz so values are comparable; cutoff 0.92), level_fit (Gemini-judge against same-level exemplars), coherence (Gemini-judge: scenario/question/solution consistency), and bridge (Gemini-judge: chain-fit between the gap's two anchors). - Final verdict pass iff every non-skipped gate passes. - Skips: --no-originality, --no-llm-judge. - Output: interviews/vault/draft-validation-scorecard.json. Smoke checks: - 3.a --dry-run --gap-index 0: resolves gap, builds prompt, allocates cloud-4579. Synthetic Gemini response Pydantic-validates clean. - 3.b on a synthetic /tmp draft: schema + originality pass (top neighbour cosine 0.73 vs 0.92 threshold). Phase 3.c (pilot run on 30 gaps) deferred: it generates new YAML question content that needs human review before promotion. The tooling ships ready; running it is a user-supervised step. CHAIN_ROADMAP.md Progress Log + Phase 3 status updated.	2026-05-01 11:31:06 -04:00
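
The gate logic described in the Phase 3.b entries above (a cosine-similarity originality check against a 0.92 cutoff, and a final verdict that passes iff every non-skipped gate passes) can be sketched roughly as follows. This is an illustration only, not the code in validate_drafts.py: the function names, the use of `None` to mark a skipped gate, the strict `<` comparison against the cutoff, and the plain-list vectors are all assumptions made for the example.

```python
import math
from typing import Optional

ORIGINALITY_CUTOFF = 0.92  # cutoff quoted in the commit message


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def originality_gate(draft_vec: list[float],
                     neighbour_vecs: list[list[float]],
                     cutoff: float = ORIGINALITY_CUTOFF) -> bool:
    """Pass when the closest in-bucket neighbour stays below the cutoff.

    Assumption: "below" means strictly less than; the source only shows
    0.73 passing against the 0.92 threshold.
    """
    top = max((cosine(draft_vec, n) for n in neighbour_vecs), default=0.0)
    return top < cutoff


def final_verdict(gates: dict[str, Optional[bool]]) -> bool:
    """Pass iff every non-skipped gate passed; None marks a skipped gate.

    Note: if every gate is skipped this returns True, which a real tool
    might prefer to treat as an error.
    """
    return all(result for result in gates.values() if result is not None)


if __name__ == "__main__":
    gates = {
        "schema": True,
        "originality": originality_gate([1.0, 0.0], [[0.6, 0.8]]),
        "level_fit": None,   # skipped, as with a --no-llm-judge style run
        "coherence": None,
        "bridge": None,
    }
    print(final_verdict(gates))
```

Representing a skipped gate as `None` rather than `True` keeps the scorecard honest about which checks actually ran, while still letting the verdict ignore them.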

5 Commits
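
The call counts in the cost analysis of commit b84691e440 (30-gap batch, 70% hallucination rate) can be reproduced with a few lines of arithmetic. The sketch below assumes, as that commit message states, one generation call plus three downstream-judge calls per generated draft and one pre-filter call per gap; the function and constant names are hypothetical and do not come from the repository.

```python
JUDGE_CALLS_PER_DRAFT = 3  # "3 downstream-judge calls" per generation, per the commit message


def calls_without_prefilter(gaps: int) -> int:
    """Every gap gets one generation call plus its judge calls."""
    return gaps * (1 + JUDGE_CALLS_PER_DRAFT)


def calls_with_prefilter(gaps: int, real_gaps: int) -> int:
    """Every gap gets one pre-filter call; only real gaps go on to generation and judging."""
    return gaps + real_gaps * (1 + JUDGE_CALLS_PER_DRAFT)


batch = 30
real = 9  # the commit's "~9 generations": 30% of 30 gaps survive the pre-filter

before = calls_without_prefilter(batch)
after = calls_with_prefilter(batch, real)
print(before, after, round(after / before, 2))  # prints: 120 66 0.55
```

The ratio of 0.55 matches the "half the cost" summary in the commit message; the promotable-draft figures quoted there are not modelled here.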