cs249r_book

github-starred/cs249r_book

mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-07 02:03:55 -05:00

Commit Graph

Author	SHA1	Message	Date

Vijay Janapa Reddi

20ea20005c

feat(vault): release-readiness final pass — E.2 + E.3 + F.4/F.5 + CHANGELOG

Closes the release-readiness push. All 8 gates green: vault check,
lint, doctor, codegen, validate-vault, render, tsc, Playwright.
Bundle: 9,775 → 9,781 published.

E.2 — Auto-emit vault-manifest.json from `vault build --legacy-json`:
    Added `emit_manifest()` to `legacy_export.py` and wired it into
    `commands/build.py` after the legacy corpus emission. The manifest
    is now derived deterministically from the same `loaded` set that
    produced corpus.json — track + level distributions, contentHash,
    counts. Eliminates the recurring stale-manifest pre-commit failure
    that had to be patched by hand twice during this push.
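
A minimal sketch of what a deterministic manifest derivation like the one
described in E.2 could look like. The item fields (`id`, `track`, `level`),
the SHA-256 hash, and the 12-character truncation are assumptions for
illustration; the actual `emit_manifest()` in `legacy_export.py` may differ.

```python
import hashlib
import json
from collections import Counter


def emit_manifest(loaded):
    """Hypothetical sketch: build a manifest from the loaded item set."""
    # Sort by id so the result does not depend on load order.
    items = sorted(loaded, key=lambda q: q["id"])
    # Canonical JSON (sorted keys, fixed separators) gives a stable hash input.
    canonical = json.dumps(items, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        "count": len(items),
        "tracks": dict(sorted(Counter(q["track"] for q in items).items())),
        "levels": dict(sorted(Counter(q["level"] for q in items).items())),
        "contentHash": digest[:12],  # assumed truncation length
    }


# Illustrative items only; the fields are assumptions.
sample = [
    {"id": "tinyml-1817", "track": "tinyml", "level": "L5"},
    {"id": "edge-2401", "track": "edge", "level": "L4"},
]
manifest = emit_manifest(sample)
```

Because the manifest is a pure function of the loaded set, rebuilding from
the same inputs yields the same file, which is what removes the
stale-manifest failure mode.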

E.3 — `--include-areas` flag in analyze_coverage_gaps.py:
    Injects forced area-targeted cells into the recommended_plan for
    each listed competency_area (parallelism, networking, etc.). For
    each (track, area) where area is in the include list, adds 1 cell
    per (canonical-topic × {L4, L5, L6+}) zone. Closes the structural
    mismatch where topic-priority ranking misses area-level gaps.
    Tested with `--include-areas parallelism`: plan now includes 21
    parallelism-topic cells (was 0 in stock plan).
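
The injection described in E.3 can be sketched roughly as follows. The
function names, the cell dictionary shape, and the topic names are
hypothetical; only the (track, area) × canonical-topic × {L4, L5, L6+}
expansion comes from the description above.

```python
from itertools import product

LEVELS = ("L4", "L5", "L6+")


def forced_area_cells(tracks, include_areas, topics_by_area):
    """Hypothetical sketch: one cell per (track, area, topic, level)."""
    cells = []
    for track, area in product(tracks, include_areas):
        for topic, level in product(topics_by_area.get(area, ()), LEVELS):
            cells.append(
                {"track": track, "area": area, "topic": topic, "level": level}
            )
    return cells


def merge_plan(recommended_plan, forced):
    """Append forced cells that the priority-ranked plan does not already hold."""
    seen = {(c["track"], c["topic"], c["level"]) for c in recommended_plan}
    extra = [
        c for c in forced if (c["track"], c["topic"], c["level"]) not in seen
    ]
    return recommended_plan + extra


# Illustrative topic names; the real canonical topics are not listed here.
topics = {"parallelism": ["ring-allreduce", "pipeline-bubbles"]}
forced = forced_area_cells(["edge"], ["parallelism"], topics)
plan = merge_plan([], forced)
```

Forcing cells by area sidesteps the ranking entirely, so an area with low
topic priority still appears in the plan.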

F.4 — Third-pass fix-agent on 10 residuals (4 NEEDS_FIX + 6 DROP from
    F.1). Substantial rewrites; 0 archived. Major math corrections:
    - mobile-1948: KV cache reconstructed (96 MB / 2048 = 48 KB/token)
    - tinyml-1681: cycle-model with proper register spill (5912 → 7912)
    - tinyml-1716: serialization on single-core M4 (12 ms not 10 ms)
    - tinyml-1634: Young/Daly hours-conversion (139 s, not 2.31 s)
    - tinyml-1723: triple-buffer SRAM (43.5 KB → 19.5 KB)
    - edge-2401: log2(18) = 4.17 (was 3.6)
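
Two of the corrected figures above can be checked directly. This is only a
sanity check on the arithmetic quoted in the list, assuming binary units
(1 MB = 1024 KB) for the KV-cache figure.

```python
import math

# mobile-1948: 96 MB of KV cache spread over 2048 tokens.
kv_kb_per_token = 96 * 1024 / 2048  # KB per token, binary units assumed

# edge-2401: log2(18), rounded to two decimals.
log2_18 = round(math.log2(18), 2)

print(kv_kb_per_token, log2_18)
```

With decimal units the first figure would come out near 46.9 KB, so the
48 KB/token value implies binary megabytes.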

F.5 — Re-judge: 6 PASS / 2 NEEDS_FIX / 2 DROP (60% pass rate). 6 more
    promoted. The 2 still-NEEDS_FIX + 2 DROP after THREE rewrite
    passes are documented as genuinely-stubborn carry-forwards.

G.1 — Cloud parallelism spot-check: 12 stratified items reviewed,
    0 issues. Cloud's 326 parallelism items are still high-quality.

G.2 — CHANGELOG.md updated with comprehensive [0.1.2-dev] entry:
    schema changes, new validators, tooling additions, content
    additions, three documented lessons (validate-at-data-boundary,
    prompt-specificity-beats-budget, topic-priority-misses-area-gaps).

Cumulative recovery rate of NEEDS_FIX/DROP items via layered fix-
agents (Phase C + F.2 + F.4): 63 of 120 = 53%. The remaining 57 split
between DROP (genuinely unrecoverable) and items still in NEEDS_FIX
state (deferred to future passes).

Final cumulative state of branch:
- Bundle: 9,224 → 9,781 published (+557 net)
- Lint warnings: 1,308+ → 0
- Doctor fails: 1 → 0
- Pydantic validators: 1 → 4
- Playwright tests: 8 → 9
- Repair scripts: 0 → 5
- Generator features: basic → bloom-aware + topic-area mapping +
  parallelism prompt + retry-on-validate-fail + targets-from +
  validate-at-write
- Build pipeline: manual manifest → auto-emit
- Analyzer: topic-priority only → topic-priority + area-include flag
- Parallelism gap (the original mission): closed across all tracks

2026-04-25 18:55:31 -04:00

Vijay Janapa Reddi

6b2b3e0542

feat(vault): Phase D + F — parallelism gap closure (+87 PASS items)

Closes the parallelism + global L4-L6+ gaps that have been open across
three prior pushes. All gates green: vault check, lint, doctor, codegen,
validate-vault, render. Bundle: 9,688 → 9,775 published.

PARALLELISM GAP — finally closed:
  tinyml/parallelism:  1 → 8
  mobile/parallelism:  0 → 6
  edge/parallelism:   13 → 18
  global/parallelism:  0 → 19
  cloud/parallelism:  326 (unchanged; was already dense)

Phase D — parallelism + global generation (64 PASS: 58 from D.3 + 6 from D.5):
D.1 Hand-authored 72 parallelism cells (track × parallelism-topic ×
    zone × level for edge/mobile/tinyml at L4-L6+) + 10 global L4-L6+
    cells. Bypasses the analyzer's topic-priority ranking which never
    surfaced parallelism cells in the top-100. Saved to
    tools/phase_d/{parallelism_targets.txt,global_targets.txt}.
D.2 PARALLELISM_RULES prompt variant in gemini_cli_generate_questions.py
    + --prompt-variant {default,parallelism} CLI flag. Adds rules:
      - FORBID single-step bandwidth division ("payload / bandwidth")
      - REQUIRE concrete interconnect (NVLink/IB/PCIe/RoCE/LoRa/SPI/BLE
        appropriate to track)
      - REQUIRE quantified synchronization or pipeline-bubble cost
      - REQUIRE non-obvious failure mode in common_mistake
      - For tinyml: ground in real numbers (Cortex-M4 SPI 5-25 MHz,
        LoRa 5-50 kbps)
    + --targets-from <file> CLI flag for hand-authored target lists.
    + parse_target() now sets competency_area from TOPIC_TO_AREA
      mapping (was hardcoded to "cross-cutting").
D.3 Generator: 72/72 written, **0 validate-at-write failures**, 3 API
    calls (no retries needed). Judge: 58 PASS / 12 NEEDS_FIX / 2 DROP
    = **80.6% pass rate** (vs B.5's 51% on standard cells). PARALLELISM
    prompt + validate-at-write together drove the rate up by 30pts.
D.4 Spot-read: 16 stratified PASS items (ran out at 16, no cloud since
    D.1 skipped that track). 0% rejection rate, all show real topology
    + quantified sync cost + correct math.
D.5 Global generator: 10/10 written, 0 validate failures, 1 API call.
    Judge: 6 PASS / 3 NEEDS_FIX / 1 DROP = 60% pass rate. Filled
    global cells (global-0432..0441).
D.6 Promote, rebuild bundle, repair registry, update manifest.
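
The `parse_target()` change in D.2 amounts to a lookup instead of a
constant. A minimal sketch, assuming a pipe-separated target line and
made-up mapping entries; the real target-file format and the contents of
`TOPIC_TO_AREA` are not shown in this commit message.

```python
# Hypothetical entries; the real TOPIC_TO_AREA mapping is not shown here.
TOPIC_TO_AREA = {
    "ring-allreduce": "parallelism",
    "pipeline-bubbles": "parallelism",
}


def parse_target(line):
    """Parse 'track | topic | zone | level' (assumed format) into a cell."""
    track, topic, zone, level = (part.strip() for part in line.split("|"))
    return {
        "track": track,
        "topic": topic,
        "zone": zone,
        "level": level,
        # Look the area up from the topic; falling back to the previously
        # hardcoded value for unmapped topics is an assumption.
        "competency_area": TOPIC_TO_AREA.get(topic, "cross-cutting"),
    }


cell = parse_target("tinyml | ring-allreduce | diagnosis | L5")
```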

Phase E.1 — retry-on-validation-fail in generator:
  Single retry with structured error context for validate-at-write
  rejections. Cap at 1 retry per batch. NOT triggered in this run
  (D.3 + D.5 had 0 failures), but in place for future runs that
  might face the iter-1/iter-3 zero-draft pattern from B.5.
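
A rough sketch of the retry behaviour described for E.1: one extra
generation attempt, with the validation errors passed back as context. The
`generate` and `validate` callables and their signatures are hypothetical
stand-ins, not the generator's actual interface.

```python
def generate_batch(generate, validate, max_retries=1):
    """Hypothetical sketch: retry rejected drafts at most `max_retries` times.

    generate(error_context) returns a list of drafts; error_context is None
    on the first call and a list of rejection records on a retry.
    validate(draft) returns a list of problem strings (empty means valid).
    """
    accepted = []
    error_context = None
    for _ in range(max_retries + 1):
        rejected = []
        for draft in generate(error_context):
            problems = validate(draft)
            if problems:
                rejected.append({"draft": draft, "problems": problems})
            else:
                accepted.append(draft)
        if not rejected:
            break  # nothing to retry
        # Feed structured errors into the next (and last) attempt.
        error_context = rejected
    return accepted
```

Capping at one retry keeps the API-call count bounded even when a batch
keeps failing validation.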

Phase F — second-pass NEEDS_FIX/DROP rehab (23 PASS):
F.2 Spawned general-purpose fix-agent on 33 items (13 NEEDS_FIX + 20
    DROP from C.3's first re-judge). 33/33 rewritten with deeper
    revisions: visual-aligned reframings, math corrections, real
    track-specific toolchains (Hailo-8 DFC, TensorRT 8.6 calibrators,
    Cortex-X4 NEON SDOT vs Hexagon NPU), unrealistic-premise fixes
    (KV cache in NPU SRAM → tiered LPDDR5/TCM scheme).
F.1 Re-judge: 23 PASS / 4 NEEDS_FIX / 6 DROP = **69.7% pass rate** on
    items previously rated NEEDS_FIX or DROP. The fix-agent's deeper
    rewrites recovered 70% of the carry-forward queue.
F.3 Stratified spot-read of 16 PASS items (parallel-safe with F.1):
    0% rejection rate. Standout: tinyml-1817 correctly diagnoses 2x
    half-duplex UART penalty by comparing observed to theoretical Ring
    AllReduce time.

Cleanup:
- repair_registry.py: appended 87 new IDs (D.3 + D.5 + F.1 outputs).
- vault-manifest.json refreshed: 9,688 → 9,775; track + level
  distributions updated; contentHash dccd3073672c.

API budget: 11 calls used of 70 allotted (3 D.3 gen + 3 D.3 judge
+ 1 D.5 gen + 1 D.5 judge + 2 F.1 judge + 1 sample = 11). Far under
budget thanks to validate-at-write driving 0 retry calls.

The corpus is StaffML-day-ready with the parallelism gap genuinely
closed for the first time. The remaining 4 NEEDS_FIX + 6 DROP from
F.1 are deferred to a future cleanup; they don't block release.

2026-04-25 18:31:58 -04:00

2 Commits