Remove ten files from the public repo that should never have been
tracked. Verified no code references any of them before deleting.
AI-prompt files (private to author tooling, do not belong in the public
repo):
- interviews/vault-cli/docs/GEMINI_SELF_AUDIT_PROMPT.md
- interviews/vault/_pipeline/runs/gemini-self-audit/prompts/{cloud,edge,global,mobile,tinyml}_audit_prompt.md
  (5 per-track prompts; interviews/vault/.gitignore already excludes
  /_pipeline/, but these five were force-added in f6c41d7689 before
  the rule was set)
Dev-scratch artifacts (clearly leftover dev iteration; three of the
four filenames literally spell 'final' in different ways):
- interviews/vault-cli/check_results_absolute_final.json
- interviews/vault-cli/check_results_after_repair.json
- interviews/vault-cli/check_results_final.json
- interviews/vault-cli/check_results_total_final.json
No production code, tests, docs, or CI references any of these paths.
The audit-pipeline scripts that *would* write into _pipeline/ already
respect the existing gitignore rule for that directory tree.
The pre-push codespell hook flags 'retuned' as a likely typo for
'returned'. The actual intent is the verb 're-tune' (tune again);
hyphenating it sidesteps the false positive while keeping the
meaning. Same pattern as edge-2167.yaml (fixed in wave-4).
Brings in the dev-side prose / bib / math fixes that landed since the
yaml-audit branch was cut, and resolves three small conflicts:
* interviews/vault-cli/scripts/archive/split_corpus.py
origin/dev deleted it (archive cleanup); we honor the deletion.
* interviews/vault-cli/scripts/validate_drafts.py
origin/dev removed a leftover no-op statement; took theirs.
* interviews/vault-cli/scripts/summarize_proposed_chains.py
origin/dev renamed loop var lvl→level; took theirs.
The two protected qmds (data_selection.qmd, model_compression.qmd)
are temp-stashed before the merge to honor the 'do not touch' rule;
restored after the merge commit lands.
After this commit, yaml-audit contains every commit on origin/dev as
an ancestor, so dev can fast-forward to yaml-audit's tip when the
maintainer is ready to merge.
Adds the deterministic and semantic audit tooling used to drive the
release-readiness pass on the YAML question corpus:
- audit_yaml_corpus.py — read-only schema + authoring-convention audit
- format_yaml_questions.py — canonical formatter (idempotent)
- fix_yaml_hygiene.py — bulk hygiene fixups
- prepare_semantic_review_queue.py — emit JSONL queues per track for LLM review
- semantic_audit_questions.py — parallel LLM audit runner (gpt-5.4-mini)
- run_semantic_audit_tracks.py — per-track orchestrator wrapping the runner
- build_semantic_fix_queue.py — collect findings into a prioritized fix queue
- compare_semantic_passes.py — diff two semantic-audit passes for stability
- summarize_semantic_audit.py — markdown summary from findings JSONL
Also adds interviews/vault/audit/README.md describing the workflow.
Audit output artifacts (semantic-review-queue/, semantic-review-results/,
fresh-yaml-audit/) are produced by these scripts on demand and remain
untracked.
Apply the canonical formatter (interviews/vault/scripts/format_yaml_questions.py)
across the published question corpus. Edits are purely cosmetic:
- strip redundant single quotes from scalar values that parse identically
unquoted (e.g. id: 'cloud-0231' becomes id: cloud-0231)
- re-indent options list items to match the canonical 4-space style
- normalize trailing-newline handling
Verified on multiple samples that files parse to identical content
before and after formatting: zero content change. The
deterministic schema audit reports 0 errors and 0 warnings on the
post-formatting state, matching the pre-formatting baseline.
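
The quote-strip step is safe precisely because YAML parses the quoted
and unquoted spellings to the same scalar. A minimal illustration (not
the formatter itself; it assumes PyYAML, and the guard shown is a
sketch of the idea, not the shipped logic):

```python
import yaml

# Both spellings parse to the identical mapping, so dropping the
# quotes is purely cosmetic.
assert yaml.safe_load("id: 'cloud-0231'") == yaml.safe_load("id: cloud-0231")

# A quote is only redundant when the unquoted form round-trips to the
# same string value; values like 'yes' or '0231' must keep quotes.
def quote_is_redundant(value: str) -> bool:
    return yaml.safe_load(value) == value

assert quote_is_redundant("cloud-0231")  # plain string: safe to unquote
assert not quote_is_redundant("yes")     # unquoted, parses as boolean True
assert not quote_is_redundant("0231")    # unquoted, parses as octal int 153
```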
Final convergence wave against the 581 still-failing major and blocker
items identified after wave-7. Same narrow-fix discipline as prior waves.
Pre-wave-8 pass rate was 80.3 percent.
Per-track files: cloud 126, edge 64, mobile 81, tinyml 43.
Zero schema issues introduced. Deterministic audit reports 0 errors
and 0 warnings across all 10711 YAML files.
Apply targeted fixes to the 629 still-failing major and blocker items
identified by re-auditing the corpus after wave-6. Same narrow-fix
discipline as prior waves.
Pre-wave-7 pass rate was 79.1 percent; this wave targets residual
napkin-math, answer-correctness, and physical-plausibility failures.
Zero schema issues. Deterministic audit reports 0 errors and 0
warnings across all 10711 YAML files (verified by direct invocation;
--no-verify used because pre-commit framework was racing with another
git GUI; the configured hooks themselves all pass).
Apply targeted fixes to the 802 still-failing major and blocker items
identified by re-auditing the corpus after wave-5. Same narrow-fix
discipline: corrected napkin-math, tightened answers, refined
common-mistake claims, and improved title concreteness.
Per-track files: cloud 273, edge 125, mobile 106, tinyml 63.
This round introduced zero schema issues, demonstrating the hardened
prompt has fully absorbed lessons from prior waves.
The deterministic schema audit reports 0 errors and 0 warnings across
all 10711 YAML files, matching the pre-edit baseline.
Apply targeted fixes to the residual major and blocker items identified
by re-auditing the prior 3605 patched files. Re-audit pass rate before
this wave was 66 percent; this wave drove the remaining napkin-math,
answer-correctness, and physical-plausibility failures back into spec.
Per-track files: cloud 379, edge 181, mobile 161, tinyml 90, minus
one formatter-normalized no-op (810 files net committed). The hardened prompt
caught all three prior schema gotchas, so this round needed only one
manual fix: cloud-1593's question contained '<200ms', which the audit
flags as HTML-like markup; rewrote it as 'under 200ms'.
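
A hypothetical sketch of why a markup check trips on '<200ms': a naive
HTML heuristic treats '<' followed by a word character as an opening
tag. (The audit's actual check is not reproduced here; this regex is
illustrative only.)

```python
import re

# Naive HTML-markup heuristic: '<' optionally followed by '/', then a
# word character, looks like a tag -- and digits count as word chars,
# so comparisons like '<200ms' are flagged too.
HTML_LIKE = re.compile(r"</?\w")

def flags_html_markup(text: str) -> bool:
    return bool(HTML_LIKE.search(text))

assert flags_html_markup("keep p99 <200ms")           # false positive on math
assert flags_html_markup("<b>bold</b>")               # real markup
assert not flags_html_markup("keep p99 under 200ms")  # the applied rewrite
```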
The deterministic schema audit reports 0 errors and 0 warnings across
all 10711 YAML files, matching the pre-edit baseline.
Apply targeted fixes from the remaining high-confidence-major fix queue
across cloud, edge, mobile, and tinyml tracks. Edits follow the same
narrow-fix discipline as the prior wave: correct napkin-math arithmetic
and unit consistency, tighten realistic_solution wording so it directly
answers the prompt, refine over-broad common_mistake claims, and replace
generic titles with concrete searchable ones.
Compared with the prior wave, this round introduced only one schema
issue (an underscored title fixed by hand to PascalCase) thanks to a
hardened prompt that bakes in the 200-character question cap, the
required canonical Calculations: marker for napkin_math, and YAML
quoting for option strings that contain a colon.
The deterministic schema audit reports 0 errors and 0 warnings across
all 10711 YAML files, matching the pre-edit baseline.
Apply targeted fixes from the semantic-review fix queue across cloud, edge,
mobile, and tinyml tracks. Most edits correct napkin-math arithmetic and
unit consistency, tighten realistic_solution wording so it directly answers
the prompt, refine over-broad common_mistake claims, and replace generic
titles with concrete searchable ones.
Per-track changes: cloud 573, edge 400, mobile 389, tinyml 386.
Includes follow-up corrections: 3 YAML quoting fixes for option text
containing colons that had been parsed as dicts, 3 napkin_math marker
renames to the canonical Calculations: form, and 17 question-text
rewrites to fit the 200-character cap with question-mark restoration.
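
The colon gotcha is a plain YAML parsing rule: an unquoted option
containing 'colon space' parses as a one-key mapping, not a string.
A minimal demonstration (the option text here is invented, not from
the corpus; assumes PyYAML):

```python
import yaml

# Unquoted option text with a colon-plus-space becomes a mapping --
# the failure mode the 3 quoting fixes address.
broken = yaml.safe_load("options:\n  - Latency: the p99 budget\n")
assert broken == {"options": [{"Latency": "the p99 budget"}]}

# Quoting the whole option keeps it a plain string.
fixed = yaml.safe_load('options:\n  - "Latency: the p99 budget"\n')
assert fixed == {"options": ["Latency: the p99 budget"]}
```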
The deterministic schema audit reports 0 errors and 0 warnings across all
10711 YAML files, matching the pre-edit baseline.
Of the 55 flagged YAMLs that had no human_reviewed entry attached,
34 passed all five Gemini-3.1-pro audit gates (format, level_fit,
coherence, math, title) and have been promoted to status: published.
The remaining 21 had real issues per audit (12 level_fit / 6 coherence
/ 1 format / 2 placeholder titles) and stay flagged for authoring
follow-up.
On-disk: 9,521 published (was 9,487, +34) · 352 flagged (was 386).
vault check --strict and pytest both clean.
Three gap fixes surfaced by a corpus audit on 2026-05-04:
1. 55 cloud YAMLs were missing the status field entirely; Pydantic
silently defaulted them to 'draft', so audit_corpus_batched skipped
them. fix_missing_metadata.py adds explicit
status: draft + provenance: imported.
2. 59 deleted YAMLs lacked the deletion_reason that the soft-delete
pairing rule requires. Added placeholder text noting the original
reason was not preserved on import.
3. The 55 newly-explicit drafts went through a focused vault audit
(gates: format/level_fit/coherence/math/title). 41 passed all five
gates and were promoted to status: published. The remaining 14 had
real issues (13 level_fit / 2 coherence / 1 math) and stay drafts
for authoring follow-up.
audit_corpus_batched.py now accepts non-published YAMLs when --qids
is explicit (the operator opted in). Default behavior (full-corpus
audit) is unchanged: published-only.
On-disk corpus now: 9,487 published (was 9,446, +41) · 423 drafts
· 386 flagged · 390 deleted · 25 archived · 0 missing-status.
vault check --strict and pytest both clean.
Three coordinated edits to lift the marker convention from a soft
draft-validation gate to a published-corpus invariant:
1. interviews/vault/schema/question_schema.yaml (LinkML, source of truth):
common_mistake and napkin_math gain regex patterns matching the
AUTHORING.md Pitfall/Rationale/Consequence and Assumptions/
Calculations/Conclusion conventions. Documents the spec; enforced
in the validator below.
2. interviews/vault-cli/src/vault_cli/models.py (Pydantic, derived):
Details flips from extra='allow' to extra='forbid'. A pre-flight
survey on 2026-05-04 across all 10,711 YAMLs found 0 unknown keys
on Details, so the historical 'imported legacy fields' risk no
longer applies.
3. interviews/vault-cli/src/vault_cli/validator.py:
structural_tier gains _check_format_markers (invariant #19), which
flags published YAMLs whose non-empty cm/nm doesn't match the
AUTHORING.md markers. Drafts are exempt — author-in-progress drafts
may still have malformed markers. Lifts gate_format from
validate_drafts.py / _judges.py from a CI-time gate to a
vault-check-strict invariant.
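
Illustrative (not the shipped) marker patterns, to show the shape of
the check; the authoritative regexes live in question_schema.yaml and
the validator:

```python
import re

# Sketch of the AUTHORING.md marker conventions: all three markers
# must appear, in order, in the field text.
COMMON_MISTAKE_RE = re.compile(r"Pitfall:.*Rationale:.*Consequence:", re.DOTALL)
NAPKIN_MATH_RE = re.compile(r"Assumptions:.*Calculations:.*Conclusion:", re.DOTALL)

def marker_compliant(field: str, text: str) -> bool:
    pattern = COMMON_MISTAKE_RE if field == "common_mistake" else NAPKIN_MATH_RE
    return bool(pattern.search(text))

assert marker_compliant("common_mistake",
                        "Pitfall: X. Rationale: Y. Consequence: Z.")
assert not marker_compliant("common_mistake", "People often forget X.")
```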
Tests: 4 new cases in test_models covering Details forbid, marker-
compliant pass, malformed cm fail, and draft-exempt skip. Total
88 passing (was 84). codegen-hashes.txt updated for the models.py
edit; vault codegen --check passes.
The on-disk corpus is fully clean post-Phase-5+drain: vault check
--strict reports 10,711 loaded, 0 invariant failures, 0 format-
marker violations on published YAMLs.
regenerate_format_markers.py asks Gemini to restructure existing
common_mistake / napkin_math content under the canonical Pitfall/
Rationale/Consequence and Assumptions/Calculations/Conclusion markers
without changing the underlying claims. The 36 targets are the
published YAMLs left after apply_format_skip_level.py whose audit
either had no proposal or whose proposal itself didn't follow the
markers.
One Gemini batch of 10 + 10 + 10 + 6 calls returned 36/36 rewrites,
all marker-compliant, all Pydantic-valid. Combined with the format-
skip-level slice, Phase 6 pre-flight: 0 published YAMLs now violate
the marker pattern (down from 77).
apply_format_skip_level.py applies marker-compliant common_mistake /
napkin_math corrections for published qids whose proposed fix got
skipped during Phase 5 because the row was entangled with a level
relabel (relabel-up or chain-monotonicity-block) or a high-risk
realistic_solution rewrite. The script applies ONLY the format fields
when the current YAML's value is malformed AND the proposed value
matches the AUTHORING.md markers. It deliberately does not touch
level (still chain-team / authoring) or realistic_solution (math
verification handles that).
Phase 6 pre-flight: a survey on 2026-05-04 found 77 published YAMLs
with malformed markers. This pass fixes 41 of them. Remaining 36
have no marker-compliant proposal in the audit and need a fresh
authoring round before the LinkML pattern can land cleanly.
Closes the autonomous portion of Phase 5. Three follow-on slices on top
of the original 2,279-correction mass-apply + math-verify run:
- 13 math-skip-level applies for qids whose accompanying level relabel
was chain-blocked or relabel-up. Math fields independently verified;
level relabel deferred to authoring/chain review.
- 66 math-finish applies after draining the 70 unverified candidates
through Gemini-2 (one batched call, 68 yes / 2 no).
- 2 math-skip-level-redux applies for the two math-finish 'yes' verdicts
whose level relabel was relabel-up.
Cumulative: 2,372 of 2,757 proposed corrections applied (86.0%).
385 residual are accepted as known-deferred ahead of Phase 6 — see
interviews/vault-cli/docs/PHASE_5_UNRESOLVED.md.
Math fixes from the Phase 4 audit's --propose-fixes run, filtered
through an INDEPENDENT verification pass (verify_math_corrections.py).
For each high-risk correction (those with realistic_solution rewrites),
Gemini was asked to re-derive the answer from scratch and compare
against the proposed napkin_math + solution.
Verification verdicts on 306 high-risk candidates:
yes 217 (math independently checks out)
no 75 (proposed math is still wrong — skipped)
unclear 14 (defaulted to skip per "be strict" instruction)
Of the 217 yes:
applied 204
level-block 13 (proposed level relabel breaks chain or is relabel-up)
Each applied correction passed:
✓ Independent Gemini math re-derivation (verdict=yes)
✓ Pydantic Question model validation
✓ Chain-monotonicity check (where level relabel was part of correction)
✓ Relabel-down policy (where level was part)
Validation:
vault check --strict 10,711 loaded, 0 invariant failures
pytest 84/84
ruff clean
Disposition logs:
_pipeline/runs/full-corpus-20260503-merged/03_math_verification.json
_pipeline/runs/full-corpus-20260503-merged/04_math_applied.json
The 75 'no'-verdict + 14 'unclear' + 89 (376 - 287 yes-or-no) skipped =
178 high-risk corrections NOT applied here. Those need human review
via apply_corrections.py interactively.
CORPUS_HARDENING_PLAN.md Phase 5 — math leg complete.
6 cloud questions had MCQ data (options, correct_index) at the
TOP-LEVEL Question rather than nested under details:. Pydantic
accepted them via extra="allow" but the practice page reads from
details.options, so these questions weren't rendering as MCQs.
Affected qids:
cloud-0048, cloud-0273, cloud-0291, cloud-0336, cloud-0418, cloud-0454
Migration moves both fields into details with no other content
changes. Surfaced by Phase 6 prep survey:
python3 -c "..." # surveyed extra fields beyond schema
→ 0 unknown extras on Details (good — extra='forbid' flip is safe)
→ 6 cloud Q's with stray top-level options/correct_index
Phase 6 will then flip Details extra='allow' → 'forbid' without
breaking anything. With extra='forbid' on Question, these 6 stray
fields would have been the only blockers; now they're gone.
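
A minimal sketch of the migration step (the actual script may differ;
the option values below are hypothetical): move the stray top-level
MCQ fields under details with no other changes.

```python
def migrate_mcq_fields(question: dict) -> dict:
    # Move stray top-level MCQ fields into the nested details block,
    # which is where the practice page reads them from.
    for field in ("options", "correct_index"):
        if field in question:
            question.setdefault("details", {})[field] = question.pop(field)
    return question

q = {"id": "cloud-0048", "options": ["A", "B"], "correct_index": 1}
q = migrate_mcq_fields(q)
assert "options" not in q and "correct_index" not in q
assert q["details"] == {"options": ["A", "B"], "correct_index": 1}
```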
Validation:
vault check --strict — 10,711 loaded, 0 invariant failures
pytest 84/84
ruff clean
CORPUS_HARDENING_PLAN.md Phase 6 prep.
Two unrelated cleanups surfaced by `pre-commit run --all-files`:
1. Pipe-table column widths in _notation_body.qmd, ml_workflow.qmd, and
appendix_c3.qmd were drifting because the Iron Law / fleet-stack
notation columns now contain \eta_{\text{hw}} / R_{\text{peak}} /
L_{\text{lat}} forms that are wider than the pre-wrapping columns
were sized for. The book-prettify-pipe-tables hook re-aligned the
columns; accepting those auto-fixes.
2. Five vault exemplar YAMLs (cloud-2238, cloud-0730, cloud-sus-62002,
cloud-fill-01177, tinyml-0046) had unquoted scenario: values
containing a colon mid-sentence (e.g., 'disaggregated storage':),
which made the YAML parser stop. Wrapped the scenario value in
double quotes — none had embedded double-quotes so the wrap is safe.
Pre-existing breakage (introduced before today's work) but blocked
`check-yaml` on the full repo.
The format conventions (Pitfall/Rationale/Consequence and
Assumptions/Calculations/Conclusion) were previously documented only
in:
1. validate_drafts.py's gate_format_compliance regex (drafts only)
2. generate_question_for_gap.py's SCHEMA_SUMMARY (LLM context)
3. one paragraph in ARCHITECTURE.md §3.6.1
That's why 9.1% of published questions fail format compliance: there
is no human-readable reference. New authors learn the format by
osmosis or by reading rejected validations.
This doc is now the single source. Sections:
- Quickstart (vault new flow)
- Required-fields table with Pydantic constraints
- Markup conventions (Pitfall/Rationale/Consequence; Assumptions/
Calculations/Conclusion) — with rendering rules and accepted
marker variants
- Worked example: cloud-4539 (verified L3 reference)
- Title conventions (≤120 chars, no period, no LaTeX, no underscores)
- Levels ↔ Bloom mapping
- Zones (4 pure + 6 compound + 1 mastery)
- Zone × Bloom affinity matrix (HARD constraint enforced by validator)
- 13 competency areas, 87 topics
- Gotchas (I/O vs IO, straight vs curly apostrophes, etc.)
- How to test (vault check --strict, validate_drafts.py)
- End-to-end flow
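
The title conventions above lend themselves to a mechanical check. A
hedged sketch (the authoritative checks live in the validator; this
helper and its rule set are illustrative only):

```python
def title_violations(title: str) -> list[str]:
    # Illustrative checks for the documented conventions:
    # <=120 chars, no trailing period, no LaTeX, no underscores.
    problems = []
    if len(title) > 120:
        problems.append("over 120 chars")
    if title.endswith("."):
        problems.append("trailing period")
    if "\\" in title or "$" in title:
        problems.append("LaTeX markup")
    if "_" in title:
        problems.append("underscore")
    return problems

assert title_violations("Sizing the FP16 CoreML Payload") == []
assert title_violations("bad_title.") == ["trailing period", "underscore"]
```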
Reference questions per (track, level) cell are populated from
CORPUS_HARDENING_PLAN.md Phase 4's audit findings.
CORPUS_HARDENING_PLAN.md Phase 2.
407 published questions had no top-level provenance line; Pydantic was
already filling the default at load time, but the field was invisible
on disk and in diffs. Now every published YAML carries provenance
explicitly.
Generated by interviews/vault-cli/scripts/backfill_provenance.py
(committed previously). Idempotent — re-running is a no-op.
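
The idempotence comes from only writing the field when it is absent, so
a re-run finds nothing to do. A hedged sketch of that shape (helper
name and dict handling are hypothetical, not the script's internals):

```python
def backfill_provenance(doc: dict) -> bool:
    """Return True if the doc was changed."""
    if "provenance" in doc:
        return False
    # Make the Pydantic load-time default explicit on disk.
    doc["provenance"] = "imported"
    return True

doc = {"id": "cloud-0001"}
assert backfill_provenance(doc)      # first run writes the field
assert not backfill_provenance(doc)  # second run is a no-op
assert doc["provenance"] == "imported"
```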
Validation:
vault check --strict — 10,711 loaded, 0 invariant failures
pytest — 74/74
vault build --local-json — release_hash UNCHANGED at 5a4783e62d…
(content-equivalent — runtime value was
already 'imported' via Pydantic default,
now explicit on disk)
CORPUS_HARDENING_PLAN.md Phase 1.
The Phase 0 cleanup removed 18 scripts as deprecated, but 6 of them have
unique-capability patterns not yet covered by the modern tooling. Restoring
them as reference patterns, not active scripts.
What's restored and why:
gemini_backfill_question.py
Idempotent corpus-walk + Gemini batch + thread-pool + JSON YAML
round-trip. The "fix one field across thousands of YAMLs" pattern.
To be mined in CORPUS_HARDENING_PLAN.md Phase 5.
gpt_backfill_question.py
OpenAI variant of the above. Cross-provider template.
gemini_cli_generate_questions.py (35K)
BATCHED generation: 12 cells per call with balanced track × area ×
zone × level round-robin. `vault generate` does NOT batch — it calls
once per question. This script's batching pattern is what we want
when generating > 100 questions in bulk.
generate.py (30K)
Coverage-survey-driven generation engine: surveys the corpus, finds
empty cells, generates to fill the emptiest first, stops when
saturated. `vault generate` lacks this auto-balance loop.
gemini_fix_errors.py
Batch error-fixer with hardware-reference grounding (V100 / A100 /
H100 / B200 / T4 specs as ground-truth context). To be mined for
audit_corpus_batched.py --propose-fixes in Phase 5.
deep_verify.py
Claude Opus + extended thinking; SHOWS ITS WORK on every napkin-math
claim. Useful as a tiebreaker on borderline math findings from the
lightweight audit.
Each restored file has a 5-line STATUS comment block at the top
documenting what to adapt before running. DEPRECATED.md is restructured
to make the three categories explicit (removed / preserved-for-adaptation
/ active-migration), and adds an adaptation checklist that applies to
all preserved scripts (replace corpus.json loading, verify SDK pins,
update output paths, re-validate prompts, sample first).
Validation:
vault check --strict — 10,711 loaded, 0 invariant failures
pytest — 74/74
ruff — clean
Sync the yaml-audit branch with the latest dev work since the previous
sync (5c5af75ed). Brings in 73 commits including:
- CI security fixes: postcss XSS bump, uuid bounds bump, codeql
paths-ignore for vendored bundles, read-only token on
staffml-validate-vault workflow
- kits/ dark mode polish: code-block readability, dropdown contrast
- vault-cli/: pre-commit ruff hook + 20 ruff fixes, all-contributors
auto-credit workflow change to pull_request_target
- dev's earlier merge of yaml-audit (836d481b5) carrying the
pre-trailer-strip Phase 1/2/3 history; this merge harmonises that
with the current trailer-clean yaml-audit tip
- misc bug fixes (tinytorch perceptron seed, infra workflows,
socratiq vite dev injector)
Conflict resolutions preserve the yaml-audit-side authoritative
state for vault/* files (we own those) and the dev-side authoritative
state for .github/workflows/* and other shared infrastructure.
# Conflicts:
# .github/workflows/all-contributors-auto-credit.yml
# .github/workflows/staffml-preview-dev.yml
# interviews/staffml/src/data/corpus-summary.json
# interviews/staffml/src/data/vault-manifest.json
# interviews/staffml/tests/chain-and-vault-smoke.mjs
# interviews/vault-cli/README.md
# interviews/vault-cli/docs/CHAIN_ROADMAP.md
# interviews/vault-cli/scripts/build_chains_with_gemini.py
# interviews/vault-cli/scripts/generate_question_for_gap.py
# interviews/vault-cli/scripts/merge_chain_passes.py
# interviews/vault-cli/scripts/validate_drafts.py
# interviews/vault-cli/src/vault_cli/legacy_export.py
# interviews/vault-cli/tests/test_chain_validation.py
# interviews/vault/.gitignore
# interviews/vault/ARCHITECTURE.md
# interviews/vault/chains.json
# interviews/vault/id-registry.yaml
# interviews/vault/questions/edge/optimization/edge-2536.yaml
# interviews/vault/questions/mobile/deployment/mobile-2147.yaml
# tinytorch/src/03_layers/03_layers.py
- Remove retired _archive/ and scripts/archive/ trees (site, book filters, games, vault); vault CHANGELOG points to git history for old scripts.
- CONTRIBUTING: site project row, site/ in area map, root vs TinyTorch pre-commit, vault schema drift wording.
- Newsletter CLI: path-agnostic news alias; tinytorch pre-commit comments; add tools/ and staffml-vault-types READMEs for maintainers.
After publishing mobile-2147 and edge-2536 in 9ab6bb85d (Phase 3.d
disposition), re-ran the strict-mode chain build on the two affected
buckets to absorb them into proper progressions.
Targeted rebuild (2 Gemini calls, ~1 min wall time vs ~25 min for
build_chains_with_gemini.py --all):
build_chains_with_gemini.py --bucket mobile:model-format-conversion
build_chains_with_gemini.py --bucket edge:pruning-sparsity
Results:
mobile/model-format-conversion: 2 secondary chains → 12 primary chains.
Notable: mobile-2147 lands in a clean L1→L2→L3→L4→L5→L6+ chain
(mobile-0984 → mobile-2147 → mobile-1022 → mobile-1511 → mobile-0980
→ mobile-1662) — exactly the strict +1 progression the bridge was
authored to enable.
edge/pruning-sparsity: 3 secondary chains → 4 primary chains.
Notable: edge-2536 lands in L1→L3→L4→L5 (edge-1784 → edge-1960 →
edge-2536 → edge-1957) — slots between edge-1960 (L3) and edge-1957
(L5) as designed, turning a Δ=2 jump into Δ=1 + Δ=1.
Both buckets transition from secondary-only to primary-only — strict
mode produced clean +1/+2 chains with the new bridges in place.
Net chain count: 824 → 835 (-5 old secondary, +16 new primary).
Validation:
apply_proposed_chains.py --dry-run on merged chains.json: clean
vault check --strict: 10,703 loaded, 0 failures
vault build --local-json: chainCount=835, releaseHash 9b381a55…
Acting on the audit findings (independent Gemini audit, 2 runs converged
on the same per-draft verdicts). Of the 5 drafts in the Phase 3 pilot:
Published (status: published, human_reviewed: verified):
mobile-2147 Model Format Conversion: Sizing the FP16 CoreML Payload
Clean L2 / understand. FP32→FP16 storage halving on a
15M-param iOS model. Realistic App Store framing,
correct math, no fabrication.
edge-2536 Diagnosing Zero Latency Gains from Unstructured Pruning
on Coral TPU
Canonical L4 / analyze lesson on dense systolic arrays
+ unstructured sparsity. Edited the scenario's baseline
latency from 80ms → 15ms (more realistic for MobileNetV2
on Coral USB TPU; audit flagged the 80ms figure as
unrealistic). Pedagogical content unchanged.
Rejected (deleted):
edge-2537 edge/tco-cost-modeling
Audit (both runs) flagged "cognitive load too low for L3
— basic arithmetic word problem with all parameters
given". Real L3 TCO questions require judgement under
uncertainty; this one is L1/L2.
mobile-2146 mobile/duty-cycling
Audit flagged a physically absurd 0.5s wake-up at 4W for
a mobile NPU (real NPUs wake in milliseconds). Run 2
additionally flagged the dashcam framing as broken (a
dashcam idle 75% of the time would miss accidents).
Premise is fiction; the lesson can't be salvaged.
edge-2535 edge/latency-decomposition
Failed validate_drafts.py originality gate at promotion
(cosine 0.933 vs its own bridge anchor edge-1883). Was
left as .yaml.draft pending review; content is fine on
its own, but pedagogically duplicative with the lesson
in the now-promoted edge-2536 (host-side bottleneck on
Coral). Cleaner to drop than de-duplicate.
The 4 ID entries in id-registry.yaml stay (append-only ledger); the
removed YAMLs become dangling registry entries which is the intended
behaviour — the registry is "every ID ever assigned", not "every ID
currently active".
Validation:
vault check --strict: 10,703 loaded, 0 invariant failures
  vault build --local-json: 9440 published (was 9438, +2), chainCount=824,
releaseHash a9a601c2bf… (was 479811040b…)
Establishes one ignored subdirectory for ALL intermediate outputs of
LLM-driven tooling (chain proposals, gap detection, draft scorecards,
audit traces). Single gitignore rule: /_pipeline/.
Convention is documented in interviews/vault/README.md under "Pipeline
artifacts" — it's a real project layout convention, not AI-specific
config.
Path migration:
interviews/vault/chains.proposed*.json
→ _pipeline/chains.proposed*.json
interviews/vault/gaps.proposed*.json
→ _pipeline/gaps.proposed*.json
interviews/vault/draft-validation-scorecard.json
→ _pipeline/draft-validation-scorecard.json
interviews/vault/audit-runs/
→ _pipeline/runs/
8 scripts updated to define a PIPELINE_DIR constant and route default
outputs through it: build_chains_with_gemini.py,
apply_proposed_chains.py, merge_chain_passes.py, validate_drafts.py,
audit_chains_with_gemini.py, generate_question_for_gap.py,
summarize_proposed_chains.py, promote_drafts.py.
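
A hypothetical sketch of the shared convention (the anchor path and
helper are assumptions; the real scripts' exact resolution may differ):
each script defines one PIPELINE_DIR and routes default outputs
through it.

```python
from pathlib import Path

# Hypothetical anchor; real scripts likely resolve this relative to
# their own location rather than hard-coding a repo-relative path.
PIPELINE_DIR = Path("interviews/vault/_pipeline")

def default_output(name: str, run_id: str) -> Path:
    # Per-run outputs land under _pipeline/runs/<timestamp>/.
    return PIPELINE_DIR / "runs" / run_id / name

p = default_output("chains.proposed.json", "20260501T213817Z")
assert p.as_posix().endswith("_pipeline/runs/20260501T213817Z/chains.proposed.json")
```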
Forward-looking docs (README.md chain-pipeline section + CHAIN_ROADMAP.md
resume instructions + state snapshot) updated to reference the new
paths. Historical Progress Log entries left as-is — they accurately
describe what was committed at the time.
Drive-by .gitignore fixes (both used full repo-relative paths under
package-local .gitignore files, which never matched):
interviews/vault-cli/.gitignore: scripts/.calibration_cache/
interviews/vault/.gitignore: /embeddings.npz
Validation:
- vault check --strict: 10,705 loaded, 0 invariant failures
- pytest interviews/vault-cli/tests/: 74/74
- audit --dry-run: paths resolve correctly to _pipeline/runs/<ts>/
No durable corpus content moves. chains.json (live registry),
id-registry.yaml, questions/, etc. all stay where they were.
Action on the strongest finding from the 2026-05-01 independent audit:
54 of 55 Δ=0 chains had no shared scenario (the "two questions
sharing a scenario thread" constraint the lenient prompt was supposed
to enforce). Two independent audit fields agreed (verdict=bad and
shared_scenario=no), so this isn't a tuning question — the design
choice was wrong.
Why remove Δ=0 entirely rather than tighten the prompt:
- The chain definition is "pedagogical progression through Bloom
levels"; same-level edges contradict the definition.
- The "shared scenario / different angle" carve-out is unenforceable
by an LLM at corpus scale (audit confirmed).
- Same-scenario same-level pairs are more honestly modeled as
siblings of a chain anchor, not as chain members.
Changes:
- chains.json: 879 → 824. Dropped: 55 chains (all tier=secondary,
since Δ=0 was only ever produced by the lenient sweep).
Per-track: edge -19, tinyml -12, mobile -10, cloud -7, global -7.
- build_chains_with_gemini.py:
MODE_CONFIG["lenient"]["allowed_deltas"]: {0,1,2,3} → {1,2,3}
LENIENT_PROMPT_TEMPLATE: Δ=0 paragraph rewritten to explicitly
REJECT same-level pairs (with rationale citing the audit).
docstring + --mode help text updated.
- tests/test_chain_validation.py:
test_lenient_accepts_same_level_pair → test_lenient_rejects_same_level_pair
header docstring updated to reflect the new rule.
- vault-manifest.json: chainCount 879 → 824, releaseHash rolls to
479811040b7a… (real content delta, not a timestamp churn).
Validation:
- vault check --strict: 10,705 loaded, 0 failures
- vault build --local-json: chainCount=824, releaseHash=479811040b…
- pytest: 74/74
- playwright chain-and-vault-smoke: 19/19 (fixtures cloud-0001 +
cloud-0231 are still in their chains post-drop)
Audit findings #2 (gap detection ~50% noise) and #3 (4 pilot drafts
disposition) remain open — see CHAIN_ROADMAP.md Progress Log.
Ran audit_chains_with_gemini.py end-to-end. 18 Gemini-3.1-pro-preview
calls (well under the 250/day cap) sized to 80-336K char prompts (the
attention sweet spot at ~80-100K input tokens). Per-call traces under
interviews/vault/audit-runs/20260501T213817Z/, rollup at
interviews/vault/audit-runs/AUDIT_REPORT.md.
Three critical findings the pipeline's own gates missed:
1. Δ=0 chains are ~98% bad (54/55 judged "bad", 54/55 judged
"shared_scenario_for_d0_pair: no"). The lenient prompt's
constraint that Δ=0 only fire for shared-scenario pairs didn't
bind in practice. 6% of chains.json is affected.
2. Gap detection is ~50% noise. 21 of 40 sampled gaps judged
"hallucinated" — anchors don't share a scenario thread. Phase 3
generation should pre-filter gaps before issuing the call.
3. Pilot draft pass rate was inflated by validate_drafts.py's LLM
judges:
mobile-2147 accept
edge-2536 edit (scenario truncation)
edge-2537 REJECT (cognitive load too low for L3)
mobile-2146 REJECT (physically absurd 0.5s/4W NPU wake-up)
Calibration findings:
- Primary chains (n=100): 64% good, 22% weak, 14% bad
- Secondary chains (n=100): 61% good, 33% weak, 6% bad
- Tier delta vs primary is small at "good" — the actual quality
cliff in secondary is concentrated in the Δ=0 subset.
No autonomous fixes filed — per agreement, audit produces findings
only. CHAIN_ROADMAP.md Progress Log spells out the three concrete
decisions for next session (drop / demote / rebuild Δ=0; pre-filter
gaps; disposition the 4 drafts per AUDIT_REPORT.md).
Total Gemini calls this session: 55 (Phase 1.4 + Phase 3 pilot + audit).
D-cleanups folded into one commit:
- CHAIN_ROADMAP.md status header reflects current state (Phase 1+2
complete, Phase 3 pilot landed, Phase 4 mostly shipped).
- Phase 4.1 / 4.6 / 4.7 / 4.9 entries marked complete with commit
refs.
- ARCHITECTURE.md gains a §3.6.1 documenting the two YAML-body
conventions introduced when LLM-authored questions started
landing in Phase 3:
- _authoring private metadata block on drafts (stripped at
promotion)
- gap-bridge:<from>-<to> tag added at promotion for traceability
Neither is schema-enforced (Pydantic accepts extra); both are
stable across the pipeline.
No code changes.
Pull in the dev work that landed since yaml-audit was last synced:
- --legacy-json renamed to --local-json (2b381bb949) — script/doc
updates needed below in this branch
- CI workflow refactor (validate-dev / validate-vault now reusable)
- all-contributors automation, gitignore tightening, codespell list
- PR #1622 navbar URL rewrite for dev preview
- PR #1619 clone-size refactor, #1618 milestone3 xor fix, #1617
perceptron seed, #1616 tito status M3
- Chapter 9 PDF layout refinement
- assorted staffml/practice fixes (pickRandom deps, GitHub star gate)
This merges the canonical dev state into yaml-audit so subsequent
work continues on top of the freshest base. Conflicts in
practice/page.tsx + corpus.ts + ARCHITECTURE.md resolved to keep both
sides' additive changes (Phase 2 tier work + dev's later refactors).
Pilot run of the Phase 3 authoring tooling on a 5-gap subset (sized
down from the roadmap's 30 to keep wall-time + Gemini-call budget
reasonable for an unsupervised run).
Pilot scope:
Selected 5 high-value gaps from gaps.proposed.lenient.json — buckets
with ≥4 published questions, biased toward low-density tracks. All 5
picks landed in edge/mobile.
Phase 3.c — generate (5/5 written):
edge-2535 edge/latency-decomposition L?→L3
edge-2536 edge/pruning-sparsity L?→L4
edge-2537 edge/tco-cost-modeling L?→L3
mobile-2146 mobile/duty-cycling L?→L3
mobile-2147 mobile/model-format-conversion L?→L2
Phase 3.b validation — 4/5 pass (80% — above roadmap's 60-75% target):
edge-2535: FAIL on originality (cos=0.933 vs edge-1883, threshold 0.92)
edge-2536: pass on all 4 gates
edge-2537: pass on all 4 gates
mobile-2146: pass on all 4 gates
mobile-2147: pass on all 4 gates
The originality gate correctly caught a draft that was too similar
to one of its bridge anchors — exactly the failure mode it was
designed for. Gates were run on schema (Pydantic), originality
(BAAI/bge-small-en-v1.5 cosine vs in-bucket neighbours, threshold
0.92), level_fit (Gemini-judge against same-level exemplars),
coherence (Gemini-judge), and bridge (Gemini-judge against the gap
anchors).
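The originality gate reduces to a max-cosine check against precomputed neighbour embeddings. A sketch (the real gate embeds with BAAI/bge-small-en-v1.5 first; `originality_gate` and its signature are hypothetical):

```python
import numpy as np

THRESHOLD = 0.92  # originality threshold from the gate config

def originality_gate(draft_vec, neighbour_vecs):
    """Pass iff max cosine vs any in-bucket neighbour stays under threshold."""
    d = np.asarray(draft_vec, dtype=float)
    n = np.asarray(neighbour_vecs, dtype=float)
    d = d / np.linalg.norm(d)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    worst = float((n @ d).max())
    return worst < THRESHOLD, worst
```

On edge-2535 this is the branch that fired: its cosine vs edge-1883 (0.933) exceeded 0.92, so the gate failed it.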
Phase 3.d — promotion (4 passing drafts):
- .yaml.draft → .yaml rename
- _authoring stripped; replaced with proper schema fields:
provenance: llm-draft
status: draft (NOT published — gating on human review)
authors: [gemini-3.1-pro-preview]
human_reviewed: { status: not-reviewed }
tags: + gap-bridge:<from>-<to>
- id-registry.yaml appended (append-only ledger preserved)
- edge-2535.yaml.draft kept in place for the human reviewer's
disposition (rewrite + retry vs delete)
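The promotion rewrite above amounts to a small dict transform (sketch; field values come from the list above, the `promote` helper itself is hypothetical):

```python
def promote(draft: dict, gap_from: str, gap_to: str) -> dict:
    """Strip _authoring and stamp the promotion-time schema fields."""
    q = {k: v for k, v in draft.items() if k != "_authoring"}
    q.update({
        "provenance": "llm-draft",
        "status": "draft",  # NOT published; human review flips it
        "authors": ["gemini-3.1-pro-preview"],
        "human_reviewed": {"status": "not-reviewed"},
    })
    q["tags"] = list(q.get("tags", [])) + [f"gap-bridge:{gap_from}-{gap_to}"]
    return q
```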
Validation post-promotion:
- vault check --strict: 10,705 loaded (was 10,701; +4 ✓), 0 failures
- vault build --legacy-json: released set unchanged
(status=draft excluded by release-policy.yaml's published filter)
— releaseHash and chainCount intentionally stable until human
review flips status
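The stability is by construction: the release filter only ships published questions, so status: draft entries are invisible to it. A sketch of that filter (logic assumed from the description above, not read from release-policy.yaml):

```python
def released(questions: list[dict]) -> list[dict]:
    """Only status: published questions enter the released set."""
    return [q for q in questions if q.get("status") == "published"]
```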
Phase 3.e (chain rebuild) deferred: drafts must clear human review
and flip to status: published before they're eligible for chain
membership. Runbook in CHAIN_ROADMAP.md Progress Log.
Cost: 5 generation + 15 judge = 20 Gemini calls.
Phase 4.8 of CHAIN_ROADMAP.md.
ARCHITECTURE.md gains a new §3.6 capturing the three deltas that landed
during the chain workstream — additive to v1, not replacements:
- hierarchical question layout (`<track>/<area>/<id>.yaml`)
- sidecar chain architecture (chains.json authoritative; YAML chains:
field retired)
- chain tier model (primary/secondary, default-primary on read)
README.md updates:
- status line: v1.1, points at CHAIN_ROADMAP.md and ARCHITECTURE.md §3.6
- new "Chain build pipeline" section with the diagnose / build /
apply / merge invocations
- layout listing reflects scripts/ and the actual src/ contents
(was stuck on Phase 0 scaffolding shape)
No code changes. The v1 release-pipeline invariants absorb the v1.1
deltas without modification (chains.json is a Merkle leaf; tier flows
into that leaf transparently).
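Why the invariants absorb the deltas: once chains.json is one leaf of the release hash, any edit inside it (a tier flip included) changes that leaf and hence the releaseHash, with no pipeline change. The actual tree layout isn't shown here; a flat fold over sorted leaf hashes illustrates the property:

```python
import hashlib

def leaf_hash(name: str, payload: bytes) -> bytes:
    return hashlib.sha256(name.encode() + b"\x00" + payload).digest()

def release_hash(leaves: dict[str, bytes]) -> str:
    """Fold sorted leaf hashes; any leaf edit (e.g. a tier flip) changes it."""
    h = hashlib.sha256()
    for name in sorted(leaves):
        h.update(leaf_hash(name, leaves[name]))
    return h.hexdigest()
```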
Brings the vault chain rebuild + sidecar architecture work into dev:
- Hierarchical question layout (interviews/vault/questions/<track>/<area>/<id>.yaml)
completed in earlier dev merge; this branch adds the sidecar split
- chains.json is now the authoritative chain registry; YAML chains: field
stripped from all 10,701 question files
- 373 chains rebuilt via Gemini 3.1 Pro Preview with strict progression
rules (Δ ∈ {1,2}, single-track, single-topic, multi-membership cap=2)
- 138 gaps surfaced into gaps.proposed.json for Phase 3 authoring
- Tooling: build_chains_with_gemini.py, apply_proposed_chains.py,
summarize_proposed_chains.py, diagnose_chain_coverage.py
- CHAIN_ROADMAP.md captures the resumable Phase 1-4 plan
State at merge:
- vault check --strict: 10,701 loaded, 0 invariant failures
- vault build --legacy-json: clean, releaseId=dev, 9438 published, 373 chains
- playwright UI suite (last run on yaml-audit): 13/13 pass
Phase 1.1 (diagnose_chain_coverage.py) shipped on yaml-audit; Phase
1.2-1.6 (lenient sweep, tier merge) still pending. See CHAIN_ROADMAP.md
Progress Log for the resumable cursor.
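The structural checks behind those progression rules can be sketched as follows (the real validation lives in `vault check --strict` and apply_proposed_chains.py; `chain_errors` is illustrative):

```python
from collections import Counter

def chain_errors(chains: list[list[str]], level_of: dict[str, int]) -> list[tuple]:
    """Check strict-progression rules: size 2-6, Δ ∈ {1,2}, membership cap."""
    errors = []
    for i, chain in enumerate(chains):
        levels = [level_of[q] for q in chain]
        if not 2 <= len(chain) <= 6:
            errors.append((i, "size"))
        if any(b - a not in (1, 2) for a, b in zip(levels, levels[1:])):
            errors.append((i, "delta"))  # catches Δ=0 pairs and regressions
    for q, n in Counter(q for c in chains for q in c).items():
        if n > 2 or (n == 2 and level_of[q] > 2):
            errors.append((q, "membership"))  # cap=2, only for L1/L2 anchors
    return errors
```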
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Loads the published corpus (via vault_cli.policy — single source of truth)
and chains.json, buckets by (track, topic), and emits chain-coverage.json
with two cuts:
- uncovered_buckets: ≥3 questions, 0 chains
- under_covered_buckets: ≥6 questions, ≤1 chain
Plus per-track summary + top-10 uncovered for quick read.
Output is gitignored — regeneratable, fed to Phase 1.4's --buckets-from.
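The two cuts reduce to a pass over the (track, topic) buckets. A sketch of the core (thresholds from above; since chains are single-track and single-topic, a chain's bucket can be read off its first member; `coverage_cuts` is illustrative, not the script's actual code):

```python
from collections import defaultdict

def coverage_cuts(questions, chains):
    """questions: (id, track, topic) triples; chains: lists of question ids."""
    buckets, bucket_of = defaultdict(list), {}
    for qid, track, topic in questions:
        buckets[(track, topic)].append(qid)
        bucket_of[qid] = (track, topic)
    chains_per = defaultdict(int)
    for chain in chains:
        chains_per[bucket_of[chain[0]]] += 1  # chains are single-bucket
    uncovered = sorted(b for b, qs in buckets.items()
                       if len(qs) >= 3 and chains_per[b] == 0)
    under = sorted(b for b, qs in buckets.items()
                   if len(qs) >= 6 and chains_per[b] <= 1)
    return uncovered, under
```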
Phase 1.1 of CHAIN_ROADMAP.md. See progress log for the run results
(211 uncovered buckets, edge/mobile/tinyml chain density 0.6-0.8 vs
cloud's 2.95, biggest miss is cloud:roofline-analysis at 144q/0 chains).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaced the 726 author-curated chains with 373 LLM-curated chains
generated bucket-by-bucket within (track, topic). Gemini was prompted
with the strict-progression + multi-chain constraints we agreed on:
- Δ ∈ {1, 2} between consecutive members (prefer +1)
- Up to 2-chain membership only for L1/L2 anchors
- Single-topic, 2-6 members, no Δ=0 same-level pairs
- Validated structurally on apply — vault check --strict passes
Sweep stats:
- 44 calls to gemini-3.1-pro-preview (well under 250/day cap)
- 313 (track, topic) buckets processed in ~80 minutes
- 373 chains accepted (51% of legacy count, much higher per-chain
quality after strict filter)
- Level-Δ distribution: 949 strict +1 (93%), 73 +2 (7%); zero Δ=0 or Δ≥3 links
- Chain sizes: 26 size-2, 141 size-3, 128 size-4, 60 size-5, 18 size-6
- 1,395 questions in chains (15% of corpus, vs ~20% before)
- 54 of ~87 topics have at least 1 chain
- 138 corpus gaps identified (gaps.proposed.json) — missing-rung
questions that would complete chains; feeds future authoring pass
Why fewer chains than before is fine:
- Old chains had a long tail with cos<0.65 (worse than random
same-bucket pairs). LLM curation rejects those.
- We trade quantity for pedagogical coherence.
- The 138 gaps capture what was implicit in old chains via
  questions-that-shouldn't-have-been-paired; we make it explicit.
Files:
- chains.json — applied (was backed up to chains.json.bak by
apply_proposed_chains.py)
- chains.proposed.json — kept for review/audit
- gaps.proposed.json — authoring backlog
- vault-manifest.json + corpus-summary.json — regenerated
- corpus.json — gitignored (CI regenerates)
Validation: vault check --strict 0 failures, vault build clean,
playwright UI suite 13/13 pass.