cs249r_book/tools/phase_d/f2_second_pass_manifest.json
Vijay Janapa Reddi 6b2b3e0542 feat(vault): Phase D + F — parallelism gap closure (+87 PASS items)
Closes the parallelism + global L4-L6+ gaps that have been open across
three prior pushes. All gates green: vault check, lint, doctor, codegen,
validate-vault, render. Bundle: 9,688 → 9,775 published.

PARALLELISM GAP — finally closed:
  tinyml/parallelism:  1 → 8
  mobile/parallelism:  0 → 6
  edge/parallelism:   13 → 18
  global/parallelism:  0 → 19
  cloud/parallelism:  326 (unchanged; was already dense)

Phase D — parallelism + global generation (64 PASS):
D.1 Hand-authored 72 parallelism cells (track × parallelism-topic ×
    zone × level for edge/mobile/tinyml at L4-L6+) + 10 global L4-L6+
    cells. Bypasses the analyzer's topic-priority ranking which never
    surfaced parallelism cells in the top-100. Saved to
    tools/phase_d/{parallelism_targets.txt,global_targets.txt}.
D.2 PARALLELISM_RULES prompt variant in gemini_cli_generate_questions.py
    + --prompt-variant {default,parallelism} CLI flag. Adds rules:
      - FORBID single-step bandwidth division ("payload / bandwidth")
      - REQUIRE concrete interconnect (NVLink/IB/PCIe/RoCE/LoRa/SPI/BLE
        appropriate to track)
      - REQUIRE quantified synchronization or pipeline-bubble cost
      - REQUIRE non-obvious failure mode in common_mistake
      - For tinyml: ground in real numbers (Cortex-M4 SPI 5-25 MHz,
        LoRa 5-50 kbps)
    + --targets-from <file> CLI flag for hand-authored target lists.
    + parse_target() now sets competency_area from TOPIC_TO_AREA
      mapping (was hardcoded to "cross-cutting").
D.3 Generator: 72/72 written, **0 validate-at-write failures**, 3 API
    calls (no retries needed). Judge: 58 PASS / 12 NEEDS_FIX / 2 DROP
    = **80.6% pass rate** (vs B.5's 51% on standard cells). The PARALLELISM_RULES
    prompt and validate-at-write together raised the rate by about 30 points.
D.4 Spot-read: 16 stratified PASS items (ran out at 16, no cloud since
    D.1 skipped that track). 0% rejection rate, all show real topology
    + quantified sync cost + correct math.
D.5 Global generator: 10/10 written, 0 validate failures, 1 API call.
    Judge: 6 PASS / 3 NEEDS_FIX / 1 DROP = 60% pass rate. Filled
    global cells (global-0432..0441).
D.6 Promote, rebuild bundle, repair registry, update manifest.
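
The two CLI flags described in D.2 could be wired up roughly as follows. This is a minimal sketch, not the code in gemini_cli_generate_questions.py: `build_parser`, `read_targets`, the one-target-per-line file layout, and the `#` comment rule are all assumptions made for illustration. Only the flag names and the `default`/`parallelism` choices come from the notes above.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the two flags named in D.2."""
    parser = argparse.ArgumentParser(description="Generate vault questions (sketch).")
    parser.add_argument(
        "--prompt-variant",
        choices=["default", "parallelism"],
        default="default",
        help="Which rule set to add to the generation prompt.",
    )
    parser.add_argument(
        "--targets-from",
        type=Path,
        default=None,
        help="File of hand-authored targets, one per line (assumed format).",
    )
    return parser


def read_targets(path: Path) -> list[str]:
    """Return stripped, non-empty lines, skipping '#' comments (assumed rule)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]
```

With `choices` set, argparse rejects any variant other than the two listed, so a typo in the flag value fails before any API call is spent.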

Phase E.1 — retry-on-validation-fail in generator:
  Single retry with structured error context for validate-at-write
  rejections. Cap at 1 retry per batch. NOT triggered in this run
  (D.3 + D.5 had 0 failures), but in place for future runs that
  might face the iter-1/iter-3 zero-draft pattern from B.5.
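
The retry policy above (one retry, with the validation errors fed back) might look something like the sketch below. The callables `generate` and `validate`, the shape of the error context, and the function name are hypothetical. The real generator's interfaces are not shown in this commit.

```python
from typing import Callable, Optional


def generate_with_single_retry(
    generate: Callable[[Optional[dict]], list[dict]],
    validate: Callable[[dict], list[str]],
    max_retries: int = 1,
) -> tuple[list[dict], list[dict]]:
    """Run a batch, retrying rejected drafts at most `max_retries` times.

    `generate(error_context)` returns draft items; `validate(draft)` returns
    a list of error strings (an empty list means the draft is accepted).
    Both are hypothetical stand-ins for the real generator and validator.
    """
    accepted: list[dict] = []
    rejected: list[dict] = []
    error_context: Optional[dict] = None
    for attempt in range(max_retries + 1):
        rejected = []
        for draft in generate(error_context):
            errors = validate(draft)
            if errors:
                rejected.append({"draft": draft, "errors": errors})
            else:
                accepted.append(draft)
        if not rejected:
            break  # nothing failed validation, so no retry is needed
        # Structured context so the next attempt can see what failed and why.
        error_context = {"attempt": attempt + 1, "failures": rejected}
    return accepted, rejected
```

Capping `max_retries` at 1 keeps the worst case at two generation calls per batch, which matches the "cap at 1 retry per batch" rule stated above.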

Phase F — second-pass NEEDS_FIX/DROP rehab (23 PASS):
F.2 Spawned general-purpose fix-agent on 33 items (13 NEEDS_FIX + 20
    DROP from C.3's first re-judge). 33/33 rewritten with deeper
    revisions: visual-aligned reframings, math corrections, real
    track-specific toolchains (Hailo-8 DFC, TensorRT 8.6 calibrators,
    Cortex-X4 NEON SDOT vs Hexagon NPU), unrealistic-premise fixes
    (KV cache in NPU SRAM → tiered LPDDR5/TCM scheme).
F.1 Re-judge: 23 PASS / 4 NEEDS_FIX / 6 DROP = **69.7% pass rate** on
    items previously rated NEEDS_FIX or DROP. The fix-agent's deeper
    rewrites recovered 70% of the carry-forward queue.
F.3 Stratified spot-read of 16 PASS items (parallel-safe with F.1):
    0% rejection rate. Standout: tinyml-1817 correctly diagnoses 2x
    half-duplex UART penalty by comparing observed to theoretical Ring
    AllReduce time.
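
The diagnosis credited to tinyml-1817 compares an observed AllReduce time against the textbook ring estimate. A rough sketch of that comparison follows, assuming the standard bandwidth-only ring model (2(N-1) steps, each moving payload/N bytes) and 8N1 UART framing (10 bits on the wire per byte). The node count, payload, baud rate, and observed time are illustrative values, not taken from tinyml-1817.

```python
def uart_bytes_per_second(baud: int, bits_per_frame: int = 10) -> float:
    """Payload rate of a UART; 8N1 framing sends 10 bits per data byte."""
    return baud / bits_per_frame


def ring_allreduce_seconds(payload_bytes: float, nodes: int, link_bytes_per_s: float) -> float:
    """Bandwidth-only ring AllReduce time, assuming full-duplex links.

    Reduce-scatter and all-gather each take (nodes - 1) steps, and every
    step moves one chunk of payload_bytes / nodes over each link.
    """
    steps = 2 * (nodes - 1)
    chunk = payload_bytes / nodes
    return steps * chunk / link_bytes_per_s


def slowdown_vs_theory(observed_s: float, theoretical_s: float) -> float:
    """Ratio of observed to theoretical time; about 2 suggests half-duplex."""
    return observed_s / theoretical_s


# Illustrative numbers only: 3 nodes, 12 kB of gradients, 1 Mbaud UART.
link = uart_bytes_per_second(1_000_000)
theory = ring_allreduce_seconds(12_000, nodes=3, link_bytes_per_s=link)
ratio = slowdown_vs_theory(observed_s=0.32, theoretical_s=theory)
```

In a ring step every node sends one chunk and receives one chunk. On a half-duplex line those two transfers cannot overlap, so each step takes roughly twice as long, which is why a ratio near 2 points at duplexing rather than at a slow link.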

Cleanup:
- repair_registry.py: appended 87 new IDs (D.3 + D.5 + F.1 outputs).
- vault-manifest.json refreshed: 9,688 → 9,775; track + level
  distributions updated; contentHash dccd3073672c.

API budget: 11 calls used of 70 allotted (3 D.3 gen + 3 D.3 judge
+ 1 D.5 gen + 1 D.5 judge + 2 F.1 judge + 1 sample = 11). Far under
budget thanks to validate-at-write driving 0 retry calls.
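
The headline figures in this message can be re-derived from the per-phase counts. The short check below uses only numbers quoted above; the helper name `pass_rate` and the dictionary keys are just labels chosen for this sketch.

```python
def pass_rate(passed: int, needs_fix: int, dropped: int) -> float:
    """Percentage of judged items that received a PASS verdict."""
    total = passed + needs_fix + dropped
    return 100.0 * passed / total


# Judge verdicts quoted in D.3, D.5, and F.1 (PASS, NEEDS_FIX, DROP).
d3_rate = pass_rate(58, 12, 2)
d5_rate = pass_rate(6, 3, 1)
f1_rate = pass_rate(23, 4, 6)

# New PASS items across the three judged batches, and the resulting bundle size.
new_pass_items = 58 + 6 + 23
bundle_after = 9_688 + new_pass_items

# Itemized API calls from the budget note.
api_calls = {
    "D.3 gen": 3,
    "D.3 judge": 3,
    "D.5 gen": 1,
    "D.5 judge": 1,
    "F.1 judge": 2,
    "sample": 1,
}
total_api_calls = sum(api_calls.values())
```

Phase D contributes 64 of the new PASS items (58 + 6) and Phase F the remaining 23, and the itemized API calls sum to 11.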

The corpus is StaffML-day-ready with the parallelism gap genuinely
closed for the first time. The remaining 4 NEEDS_FIX + 6 DROP from
F.1 are deferred to a future cleanup; they don't block release.
2026-04-25 18:31:58 -04:00


[
{
"id": "edge-2357",
"track": "edge",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/edge/edge-2357.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "WARN"
},
"issues": [
"Visual alt text focuses on internal fragmentation, which the question explicitly excludes."
],
"fix_suggestion": "Update visual alt text to contrast TLB entry count or page walk overhead instead of internal fragmentation."
},
{
"id": "edge-2364",
"track": "edge",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/edge/edge-2364.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "WARN"
},
"issues": [
"Visual alt text incorrectly shows a saturated network switch, contradicting the scenario's conclusion that host memory is the bottleneck."
],
"fix_suggestion": "Change visual to show host-memory/L3 cache as the saturated bottleneck, not the network switch."
},
{
"id": "edge-2367",
"track": "edge",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/edge/edge-2367.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "ERROR"
},
"issues": [
"Visual alt text shows stages overlapping in a Gantt chart, but the solution explicitly states the multi-context execution is serial and cannot overlap."
],
"fix_suggestion": ""
},
{
"id": "edge-2390",
"track": "edge",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/edge/edge-2390.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "ERROR",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"Scenario contains a template injection glitch ('4K @interviews/vault/questions/edge/edge-2460.yaml FPS')."
],
"fix_suggestion": ""
},
{
"id": "edge-2401",
"track": "edge",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/edge/edge-2401.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "WARN",
"visual_alignment": "N/A"
},
"issues": [
"Generic PTQ vs QAT question; lacks strong uniqueness specific to Hailo-8 hardware."
],
"fix_suggestion": "Make the scenario more specific to the Hailo-8's quantization toolchain rather than generic Min-Max vs Entropy."
},
{
"id": "edge-2402",
"track": "edge",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/edge/edge-2402.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "WARN",
"visual_alignment": "N/A"
},
"issues": [
"Too similar to edge-2370 (testing Jetson power modes vs bandwidth/compute requirements)."
],
"fix_suggestion": "Differentiate more clearly from edge-2370, perhaps focusing on a different resource constraint like thermal throttling."
},
{
"id": "edge-2406",
"track": "edge",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/edge/edge-2406.yaml",
"criteria": {
"math_correct": "ERROR",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"Math error: Computes available FPS by dividing GOPs by utilization (26000/50/0.7 = 743) instead of multiplying by utilization (26000*0.7/50 = 364), fundamentally misunderstanding resource utilization."
],
"fix_suggestion": ""
},
{
"id": "edge-2416",
"track": "edge",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/edge/edge-2416.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "ERROR",
"visual_alignment": "N/A"
},
"issues": [
"Exact duplicate scenario of edge-2363, just scaled lambda and mu by 0.5."
],
"fix_suggestion": ""
},
{
"id": "edge-2424",
"track": "edge",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/edge/edge-2424.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "WARN",
"visual_alignment": "N/A"
},
"issues": [
"Standard QAT vs PTQ trade-off question that could apply to any hardware, lacks uniqueness."
],
"fix_suggestion": "Focus more specifically on the Orin deployment tooling (TensorRT) and how to apply PTQ with limited calibration data instead of generic QAT vs PTQ."
},
{
"id": "mobile-1870",
"track": "mobile",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/mobile/mobile-1870.yaml",
"criteria": {
"math_correct": "ERROR",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "WARN"
},
"issues": [
"Math error: Claims K=12 is the minimum queue depth for P_K < 0.01, but K=11 yields P_11 = 0.0077, which is < 0.01."
],
"fix_suggestion": ""
},
{
"id": "mobile-1881",
"track": "mobile",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/mobile/mobile-1881.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "WARN"
},
"issues": [
"Visual alt text includes an NPU, which is not part of the problem's described pipeline (Cloud -> Modem -> CPU -> UFS)."
],
"fix_suggestion": "Change visual alt text to show the actual pipeline: Cloud -> 5G Modem -> Crypto Core -> UFS Storage, omitting the NPU."
},
{
"id": "mobile-1891",
"track": "mobile",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/mobile/mobile-1891.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "ERROR"
},
"issues": [
"Visual alt text claims a 75ms duration, but the correct overlapped mathematical solution is 400ms."
],
"fix_suggestion": ""
},
{
"id": "mobile-1896",
"track": "mobile",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/mobile/mobile-1896.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "WARN"
},
"issues": [
"Visual stacked bar chart is physically misleading for visualizing harmonic mean of bandwidth."
],
"fix_suggestion": "Change visual alt text to describe a time-based chart or throughput gauge rather than stacked bandwidth bars."
},
{
"id": "mobile-1897",
"track": "mobile",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/mobile/mobile-1897.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "WARN"
},
"issues": [
"Visual description only specifies incoming edges, missing the downlink traffic which is critical to the scenario."
],
"fix_suggestion": "Update visual alt text to include bidirectional edges reflecting both uplink and downlink traffic."
},
{
"id": "mobile-1903",
"track": "mobile",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/mobile/mobile-1903.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "ERROR"
},
"issues": [
"The visual describes a sequential Gantt chart, which contradicts the core concept of a double-buffered overlapping pipeline."
],
"fix_suggestion": ""
},
{
"id": "mobile-1918",
"track": "mobile",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/mobile/mobile-1918.yaml",
"criteria": {
"math_correct": "ERROR",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"Energy units are wrong. 5ms at 1W is 5 mJ, not 5.95 mWh. 1 mWh is 3600 mJ."
],
"fix_suggestion": ""
},
{
"id": "mobile-1929",
"track": "mobile",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/mobile/mobile-1929.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "WARN",
"scenario_realism": "PASS",
"uniqueness": "WARN",
"visual_alignment": "N/A"
},
"issues": [
"Math is too simple for an L5 evaluation level.",
"Scenario heavily duplicates the duty-cycling concept from mobile-1918."
],
"fix_suggestion": ""
},
{
"id": "mobile-1948",
"track": "mobile",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/mobile/mobile-1948.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "ERROR",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"It is physically impossible to place an LLM KV cache entirely in NPU-local SRAM on mobile SoCs due to extreme size constraints (hundreds of MBs vs <32MB SRAM)."
],
"fix_suggestion": ""
},
{
"id": "mobile-1949",
"track": "mobile",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/mobile/mobile-1949.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "WARN",
"uniqueness": "WARN",
"visual_alignment": "N/A"
},
"issues": [
"Highly generic template question lacking specific, hard constraints.",
"Duplicates standard CPU vs NPU marketing talking points without depth."
],
"fix_suggestion": ""
},
{
"id": "mobile-1982",
"track": "mobile",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/mobile/mobile-1982.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "ERROR"
},
"issues": [
"The visual alt text claims the queue drains over 300ms, heavily contradicting the mathematical solution of 2720ms."
],
"fix_suggestion": ""
},
{
"id": "mobile-1995",
"track": "mobile",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/mobile/mobile-1995.yaml",
"criteria": {
"math_correct": "ERROR",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"The solution completely ignores the 800 us OS preemption cost and the 18% thermal throttling penalty introduced in the prompt."
],
"fix_suggestion": ""
},
{
"id": "mobile-2025",
"track": "mobile",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/mobile/mobile-2025.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "WARN",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"L6+ level is a massive mismatch for a simple parameter count capacity subtraction problem."
],
"fix_suggestion": "Lower the level to L3 or add advanced system-level architectural constraints to match L6+."
},
{
"id": "mobile-2028",
"track": "mobile",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/mobile/mobile-2028.yaml",
"criteria": {
"math_correct": "ERROR",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"The solution entirely fails to compute the budget for the mixed-precision fallback mode explicitly requested in the prompt."
],
"fix_suggestion": ""
},
{
"id": "tinyml-1562",
"track": "tinyml",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/tinyml/tinyml-1562.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "WARN"
},
"issues": [
"Visual describes a 1000 MB/s link, ignoring the encoding/TLP overhead explicitly calculated in the math."
],
"fix_suggestion": "Update the visual alt text to reflect the effective 886 MB/s link capacity instead of the raw line rate."
},
{
"id": "tinyml-1634",
"track": "tinyml",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/tinyml/tinyml-1634.yaml",
"criteria": {
"math_correct": "WARN",
"cell_fit": "PASS",
"scenario_realism": "WARN",
"uniqueness": "PASS",
"visual_alignment": "PASS"
},
"issues": [
"Flash write speed of 100KB in 5ms (~20MB/s) is unrealistically fast for a Cortex-M4",
"Multiple WARNs result in DROP"
],
"fix_suggestion": ""
},
{
"id": "tinyml-1652",
"track": "tinyml",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/tinyml/tinyml-1652.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "PASS",
"scenario_realism": "WARN",
"uniqueness": "WARN",
"visual_alignment": "PASS"
},
"issues": [
"Ring AllReduce on 3 bare-metal Cortex-M4 nodes is a highly synthetic scenario",
"Duplicates standard GPU Ring AllReduce template (uniqueness warning)"
],
"fix_suggestion": ""
},
{
"id": "tinyml-1661",
"track": "tinyml",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/tinyml/tinyml-1661.yaml",
"criteria": {
"math_correct": "WARN",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "PASS"
},
"issues": [
"Solution incorrectly assumes the capacitor can fully discharge to 0V; it should account for the MCU brownout voltage threshold (e.g., ~1.8V)"
],
"fix_suggestion": "Update the math to use E = 0.5 * C * (V_initial^2 - V_brownout^2) to reflect realistic usable energy."
},
{
"id": "tinyml-1681",
"track": "tinyml",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/tinyml/tinyml-1681.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "WARN",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"Level mismatch: solution provides a rough 1.2x heuristic instead of the explicit L6+ cycle-accurate cost model requested"
],
"fix_suggestion": "Upgrade the solution's mathematical complexity to match L6+ cycle-accurate modeling expectations, or downgrade the level."
},
{
"id": "tinyml-1716",
"track": "tinyml",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/tinyml/tinyml-1716.yaml",
"criteria": {
"math_correct": "ERROR",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"Solution completely ignores the 8% DMA contention factor it was explicitly asked to account for"
],
"fix_suggestion": ""
},
{
"id": "tinyml-1721",
"track": "tinyml",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/tinyml/tinyml-1721.yaml",
"criteria": {
"math_correct": "ERROR",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"Solution fails to include the 3uA leakage in the calculation and omits the battery lifetime estimation entirely"
],
"fix_suggestion": ""
},
{
"id": "tinyml-1723",
"track": "tinyml",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/tinyml/tinyml-1723.yaml",
"criteria": {
"math_correct": "ERROR",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"Math merely adds single buffer sizes (14.5KB) instead of calculating the SRAM footprint for a true overlapped buffering scheme"
],
"fix_suggestion": ""
},
{
"id": "tinyml-1724",
"track": "tinyml",
"verdict_at_c3": "DROP",
"yaml_path": "interviews/vault/questions/tinyml/tinyml-1724.yaml",
"criteria": {
"math_correct": "ERROR",
"cell_fit": "PASS",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"Solution ignores half the prompt and completely fails to answer which retention mode wins"
],
"fix_suggestion": ""
},
{
"id": "tinyml-1732",
"track": "tinyml",
"verdict_at_c3": "NEEDS_FIX",
"yaml_path": "interviews/vault/questions/tinyml/tinyml-1732.yaml",
"criteria": {
"math_correct": "PASS",
"cell_fit": "WARN",
"scenario_realism": "PASS",
"uniqueness": "PASS",
"visual_alignment": "N/A"
},
"issues": [
"Solution fails to address the specific L2 mechanics of half-transfer interrupts and pointer swapping asked in the prompt"
],
"fix_suggestion": "Update the solution to explicitly explain the half-transfer and full-transfer interrupts and the pointer swap logic requested."
}
]
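
A manifest in this shape is easy to sanity-check against the commit message. The sketch below tallies verdicts and flags NEEDS_FIX entries that lack a fix suggestion. `load_manifest` and `tally_manifest` are hypothetical helper names, not part of the repository's tooling, and the field names are the ones visible in the entries above.

```python
import json
from collections import Counter


def load_manifest(path: str) -> list[dict]:
    """Read the manifest, which is a JSON array of per-question records."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def tally_manifest(entries: list[dict]) -> tuple[Counter, Counter, list[str]]:
    """Count verdicts overall and per track, and list NEEDS_FIX items
    whose fix_suggestion is empty."""
    by_verdict = Counter(e["verdict_at_c3"] for e in entries)
    by_track = Counter((e["track"], e["verdict_at_c3"]) for e in entries)
    missing_fix = [
        e["id"]
        for e in entries
        if e["verdict_at_c3"] == "NEEDS_FIX" and not e.get("fix_suggestion")
    ]
    return by_verdict, by_track, missing_fix


# Tiny inline sample in the same shape as the records above.
sample = [
    {"id": "edge-2357", "track": "edge", "verdict_at_c3": "NEEDS_FIX",
     "fix_suggestion": "Update visual alt text."},
    {"id": "edge-2367", "track": "edge", "verdict_at_c3": "DROP",
     "fix_suggestion": ""},
]
verdicts, per_track, missing = tally_manifest(sample)
```

Run on the 33 entries in this file, the verdict tally should reproduce the 13 NEEDS_FIX + 20 DROP split quoted in F.2.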