github-starred/cs249r_book

mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-07-19 09:24:14 -05:00

Commit Graph

Author	SHA1	Message	Date

Vijay Janapa Reddi

e7cd3b24ca

feat(vault): Phase B + C — 144 PASS items added (B.5: 110, C.4: 34)

Closes Phase B (balanced generation with refined prompts +
validate-at-write) and Phase C (NEEDS_FIX queue rehab) from
RESUME_PLAN_RELEASE.md. All gates green: vault check, lint, doctor,
codegen, validate-vault, render. Bundle: 9,544 → 9,688 published.

Phase B (110 PASS):
B.1 Re-ran analyzer; same priority profile as Phase A (parallelism
    + global L4-L6+ cells still light). Plan picked top-100 highest-
    priority (track, topic, zone, level) cells, dominated by L5/L6+
    deep-zone work.
B.2 Triage: 14 L5/L6+ deep-zone cells need depth prompt; 86 standard.
B.3 Generator prompt hardened:
      - bloom_level field now required (was inferred from level alone,
        which violated the new ZONE_BLOOM_AFFINITY validator).
      - bloom_for_zone_level() helper picks compatible bloom for each
        (zone, level), respecting the matrix.
      - Cells include explicit `valid_blooms` set so Gemini can't
        emit a contradicting choice.
      - Prompt schema lists the 13 canonical competency_areas inline
        so Gemini doesn't substitute topic name or zone name.
      - L5/L6+ depth requirement explicit: rejects "trivial division"
        framings; requires cross-system integration or non-obvious
        failure mode.
B.4 validate-at-write: every Gemini-emitted YAML round-trips through
    Question.model_validate() before disk write. An item that fails
    validation is dropped and never persisted. This is the structural
    fix for the schema-drift class of regressions.
B.5 Loop saturated at iter 4 on `DROP rate 38.3% exceeds 35%` —
    judge tightening on L6+ depth is the constraint, not budget.
    4 iters, 26 of 70 calls used, 240 drafts → 110 PASS / 57 NEEDS_FIX
    / 73 DROP. Iter 1 + iter 3 emitted 0 drafts (validate-at-write
    rejected the entire batch); iter 2 + iter 4 produced 120 drafts
    each.
B.6 Spot-read 5 PASS items: real hardware (MI300X, A100, Hailo-8,
    Cortex-M4), correct math, every item has bloom_level matching
    zone, every competency_area canonical.
B.7 Promoted 110 PASS items.
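
Sketch of the B.3 + B.4 path (cell construction, then validate-at-write).
This is illustrative only: the real ZONE_BLOOM_AFFINITY matrix, the
selection rule inside bloom_for_zone_level(), and the Question schema are
not shown in this message, so the matrix contents, the "closest rank"
rule, make_cell(), validate(), and write_validated() below are all
assumptions. validate() is a stdlib stand-in for
Question.model_validate(), and JSON stands in for YAML to keep the
example dependency-free.

```python
import json
import tempfile
from pathlib import Path

# Revised Bloom taxonomy, lowest to highest cognitive demand.
BLOOM_ORDER = ["remember", "understand", "apply", "analyze", "evaluate", "create"]
LEVELS = ["L1", "L2", "L3", "L4", "L5", "L6+"]

# ASSUMED matrix: zone -> bloom levels the validator accepts.
ZONE_BLOOM_AFFINITY = {
    "recall": {"remember", "understand"},
    "fluency": {"understand", "apply"},
    "evaluation": {"analyze", "evaluate"},
    "mastery": {"evaluate", "create"},
}


def bloom_for_zone_level(zone, level):
    """Pick the allowed bloom whose rank is closest to the level's rank."""
    target = LEVELS.index(level)
    allowed = sorted(ZONE_BLOOM_AFFINITY[zone], key=BLOOM_ORDER.index)
    return min(allowed, key=lambda b: abs(BLOOM_ORDER.index(b) - target))


def make_cell(zone, level):
    """Cell handed to the generator, carrying the explicit valid_blooms set."""
    return {
        "zone": zone,
        "level": level,
        "bloom_level": bloom_for_zone_level(zone, level),
        "valid_blooms": sorted(ZONE_BLOOM_AFFINITY[zone]),
    }


def validate(draft):
    """Stand-in for Question.model_validate(); raises ValueError on bad drafts."""
    for field in ("id", "zone", "level", "bloom_level"):
        if not draft.get(field):
            raise ValueError(f"missing required field: {field}")
    if draft["zone"] not in ZONE_BLOOM_AFFINITY:
        raise ValueError(f"unknown zone: {draft['zone']}")
    if draft["bloom_level"] not in ZONE_BLOOM_AFFINITY[draft["zone"]]:
        raise ValueError(f"{draft['bloom_level']} not allowed in zone {draft['zone']}")
    return draft


def write_validated(drafts, out_dir):
    """Write only drafts that validate; return (written_ids, dropped_count)."""
    written, dropped = [], 0
    for draft in drafts:
        try:
            validate(draft)
        except ValueError:
            dropped += 1  # a failed draft is never persisted
            continue
        (Path(out_dir) / f"{draft['id']}.json").write_text(json.dumps(draft))
        written.append(draft["id"])
    return written, dropped


cell = make_cell("recall", "L5")
drafts = [
    {"id": "q-001", **cell},
    {"id": "q-002", **cell, "bloom_level": "create"},  # contradicts the zone
]
with tempfile.TemporaryDirectory() as tmp:
    written, dropped = write_validated(drafts, tmp)
    files_on_disk = sorted(p.name for p in Path(tmp).iterdir())
```

Validating before the write means a rejected draft leaves no partial
file behind, which is the property B.4 relies on.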

Phase C (34 PASS, parallel with B.5):
C.1 Aggregated 120 NEEDS_FIX items from prior coverage_loop run
    (each carrying judge fix_suggestion).
C.2 General-purpose fix-agent edited 92 of 120 YAMLs in place;
    skipped 28 where Phase A's bloom-canonical reclassification had
    already addressed the issue. No schema axes touched.
C.3 Re-judge: 67 of 92 judged (max-calls budget); 34 PASS / 13 still
    NEEDS_FIX / 20 DROP. 51% pass rate on re-judge.
C.4 Promoted 34 flipped-to-PASS items.
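
Arithmetic check of the B.5 and C.3 tallies, plus a sketch of the
DROP-rate stop rule. The verdict counts and the 35% limit come from this
message; rate() and saturated() are hypothetical helper names. The
cumulative Phase B DROP rate (73/240) is below 35%, so the quoted 38.3%
presumably refers to a single iteration. The per-iteration breakdown is
not given here; 46 drops out of 120 drafts is an assumption that happens
to reproduce 38.3%.

```python
DROP_RATE_LIMIT = 0.35  # threshold quoted in the saturation message


def rate(part, whole):
    """Fraction part/whole, or 0.0 for an empty batch."""
    return part / whole if whole else 0.0


def saturated(dropped, judged, limit=DROP_RATE_LIMIT):
    """Stop rule: True once the DROP rate exceeds the limit."""
    return rate(dropped, judged) > limit


phase_b = {"PASS": 110, "NEEDS_FIX": 57, "DROP": 73}  # B.5
phase_c = {"PASS": 34, "NEEDS_FIX": 13, "DROP": 20}   # C.3

b_total = sum(phase_b.values())            # 240 drafts
c_total = sum(phase_c.values())            # 67 judged
b_cumulative_drop = rate(phase_b["DROP"], b_total)
c_pass_rate = rate(phase_c["PASS"], c_total)

# ASSUMPTION: iter 4 dropped 46 of its 120 drafts (not stated in the message).
iter4_drop = rate(46, 120)

published_after = 9544 + phase_b["PASS"] + phase_c["PASS"]
```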

Cleanup after generation:
- repair_registry.py: appended 167 new IDs (B.5 + C.2 outputs).
- ZONE_LEVEL_AFFINITY widened to admit B.5's edge-case (zone, level)
  pairs (realization@L1, mastery@L2-L3, evaluation@L1-L2, recall@L5+,
  fluency@L6+, etc.). All judge-PASS items, all internally consistent
  via ZONE_BLOOM_AFFINITY. Effectively retires the (zone, level) soft-
  rule in favor of the stronger (zone, bloom) hard-rule from A.6.
- vault-manifest.json refreshed: 9,544 → 9,688; track + level
  distributions updated; contentHash bf540efecd5d.
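
Sketch of the ZONE_LEVEL_AFFINITY widening described above. The table's
real shape and baseline contents are not shown in this message; the
zone -> set-of-levels layout, the baseline values, and widen() are
assumptions. Only pairs spelled out unambiguously above are used as
edge cases.

```python
# ASSUMED baseline: zone -> levels admitted before this commit.
ZONE_LEVEL_AFFINITY = {
    "recall": {"L1", "L2"},
    "fluency": {"L2", "L3", "L4"},
    "realization": {"L3", "L4", "L5"},
    "evaluation": {"L4", "L5", "L6+"},
    "mastery": {"L5", "L6+"},
}


def widen(affinity, passing_pairs):
    """Return a copy of affinity that also admits each (zone, level) pair."""
    widened = {zone: set(levels) for zone, levels in affinity.items()}
    for zone, level in passing_pairs:
        widened.setdefault(zone, set()).add(level)
    return widened


# Edge-case pairs named in the cleanup note.
edge_cases = [
    ("realization", "L1"),
    ("mastery", "L2"),
    ("mastery", "L3"),
    ("evaluation", "L1"),
    ("evaluation", "L2"),
    ("fluency", "L6+"),
]
widened = widen(ZONE_LEVEL_AFFINITY, edge_cases)
```

Because widen() only ever adds pairs, every item that passed the old
table still passes the new one.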

Saturation reason for Phase B: the judge's strictness on L6+ depth
(set in A.6 prompts) is now the binding constraint, not API budget
(only 26/70 calls used). Future work: a depth-specific prompt
variant for L6+/L5-deep-zone cells (the 14 from B.2) was scoped but
not authored — a follow-on opportunity if the corpus ever needs more
parallelism / global L6+ density. Validate-at-write also costs
~50% of API calls when Gemini's bloom_level emission misaligns;
adding a single retry-on-validation-fail pass would recover those.
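
A minimal sketch of the retry-on-validation-fail pass proposed above.
It has not been authored in the repo; generate_with_retry() and the fake
generator/validator are hypothetical, and the validator is assumed to
raise ValueError on a bad draft.

```python
def generate_with_retry(generate, validate, cell, max_retries=1):
    """Call generate(); on a validation failure, retry with the error as feedback.

    Returns the first draft that validates, or None if every attempt fails.
    """
    feedback = None
    for _ in range(max_retries + 1):
        draft = generate(cell, feedback)
        try:
            validate(draft)
        except ValueError as err:
            feedback = str(err)  # passed to the next attempt
            continue
        return draft
    return None


# Fakes for illustration: the first attempt emits a contradicting bloom_level,
# the retry (which receives feedback) picks one from valid_blooms.
def fake_generate(cell, feedback):
    bloom = cell["valid_blooms"][0] if feedback else "create"
    return {"zone": cell["zone"], "bloom_level": bloom}


def fake_validate(draft):
    if draft["bloom_level"] not in ("remember", "understand"):
        raise ValueError("bloom_level not valid for zone")


cell = {"zone": "recall", "valid_blooms": ["remember", "understand"]}
recovered = generate_with_retry(fake_generate, fake_validate, cell)
no_retry = generate_with_retry(fake_generate, fake_validate, cell, max_retries=0)
```

Capping the retry at one extra call keeps the worst-case budget at twice
the number of cells.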

The branch is StaffML-day-ready: all 9,688 published items pass the
new validators, lint reports zero warnings, doctor is clean, and the
practice page renders with a working zoom modal (Playwright 9/9 at
end of Phase A; no UI changes since).

2026-04-25 16:38:00 -04:00