feat(vault): Phase 3 pilot disposition — 2 published, 3 rejected
This acts on the findings of the independent Gemini audit (two runs
converged on the same per-draft verdicts). Of the 5 drafts in the Phase 3
pilot:
Published (status: published, human_reviewed: verified):
mobile-2147 Model Format Conversion: Sizing the FP16 CoreML Payload
Clean L2 / understand. FP32→FP16 storage halving on a
15M-param iOS model. Realistic App Store framing, correct
math (sanity-checked in the sketch after this list), no
fabrication.
edge-2536 Diagnosing Zero Latency Gains from Unstructured Pruning
on Coral TPU
Canonical L4 / analyze lesson on dense systolic arrays
+ unstructured sparsity. Edited the scenario's baseline
latency from 80ms → 15ms (more realistic for MobileNetV2
on Coral USB TPU; audit flagged the 80ms figure as
unrealistic). Pedagogical content unchanged.
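Quick sanity check of the mobile-2147 sizing claim (a rough sketch; the
15M-param count comes from the draft's scenario, and the 4-byte/2-byte
widths are the standard FP32/FP16 sizes; everything else is illustrative):

    # FP32 -> FP16 payload halving for a 15M-parameter model.
    params = 15_000_000
    fp32_mb = params * 4 / 1e6   # 4 bytes per FP32 weight -> ~60 MB
    fp16_mb = params * 2 / 1e6   # 2 bytes per FP16 weight -> ~30 MB
    assert fp16_mb / fp32_mb == 0.5  # the 50% / 30 MB figure in the draft
    print(fp32_mb, fp16_mb)          # 60.0 30.0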
Rejected (deleted):
edge-2537 edge/tco-cost-modeling
Audit (both runs) flagged "cognitive load too low for L3
— basic arithmetic word problem with all parameters
given". Real L3 TCO questions require judgement under
uncertainty; this one is L1/L2.
mobile-2146 mobile/duty-cycling
Audit flagged a physically absurd 0.5s wake-up at 4W for
a mobile NPU (real NPUs wake in milliseconds). Run 2
additionally flagged the dashcam framing as broken (a
dashcam idle 75% of the time would miss accidents).
Premise is fiction; the lesson can't be salvaged.
edge-2535 edge/latency-decomposition
Failed the validate_drafts.py originality gate at promotion
(cosine 0.933 vs its own bridge anchor edge-1883; the gate
is sketched after this list). Was left as .yaml.draft
pending review; the content is fine on its own, but it is
pedagogically duplicative of the lesson in the now-promoted
edge-2536 (host-side bottleneck on Coral). Cleaner to drop
than to de-duplicate.
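For reference, the originality gate that edge-2535 failed is, in essence,
a cosine-similarity ceiling on a draft's embedding versus its bridge
anchors. A minimal sketch assuming a numpy embedding pipeline (the
function names and the 0.90 threshold are hypothetical, not the actual
validate_drafts.py code):

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two embedding vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def passes_originality_gate(draft_vec, anchor_vecs, threshold=0.90):
        # Reject drafts that sit too close to any anchor they bridge;
        # edge-2535 scored 0.933 against edge-1883 and was held back.
        return all(cosine(draft_vec, v) < threshold for v in anchor_vecs)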
The 4 ID entries in id-registry.yaml stay (append-only ledger); the
removed YAMLs become dangling registry entries, which is the intended
behaviour — the registry records "every ID ever assigned", not "every ID
currently active". (Invariant sketched below.)
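The invariant, roughly (a sketch; the field and function names are
hypothetical, not the actual vault check code):

    # Registry IDs are a superset of active question IDs.
    # Dangling registry entries are fine; an active question whose ID
    # was never registered is the real violation.
    def check_registry(registry_ids: set[str], active_ids: set[str]) -> None:
        missing = active_ids - registry_ids
        assert not missing, f"unregistered active IDs: {missing}"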
Validation:
  vault check --strict: 10,703 loaded, 0 invariant failures
  vault build --local-json: 9440 published (was 9438, +2), chainCount=824,
    releaseHash a9a601c2bf… (was 479811040b…)
@@ -1,23 +1,23 @@
 {
   "releaseId": "dev",
-  "releaseHash": "479811040b7a9f89571c68816a719c99c8b65d0d35aa8f0afd46889140e5911f",
+  "releaseHash": "a9a601c2bf8710a5c9b96dc0ba9349fc6f7b8a7a4bd114f007d4f88e6cf6a7d7",
   "schemaVersion": "1",
   "policyVersion": "1",
-  "buildDate": "2026-05-02T12:49:52Z",
-  "questionCount": 9438,
+  "buildDate": "2026-05-02T13:39:18Z",
+  "questionCount": 9440,
   "chainCount": 824,
   "conceptCount": 87,
   "trackDistribution": {
     "cloud": 4028,
-    "edge": 2077,
+    "edge": 2078,
     "global": 313,
-    "mobile": 1818,
+    "mobile": 1819,
     "tinyml": 1202
   },
   "levelDistribution": {
-    "L4": 2493,
+    "L4": 2494,
     "L1": 463,
-    "L2": 761,
+    "L2": 762,
     "L3": 2228,
     "L5": 2421,
     "L6+": 1072
@@ -1,72 +0,0 @@
-schema_version: '1.0'
-id: edge-2537
-track: edge
-level: L3
-zone: fluency
-topic: tco-cost-modeling
-competency_area: cross-cutting
-bloom_level: apply
-phase: inference
-title: 'Edge TCO Fluency: Monthly Cellular Data Cost Calculation'
-scenario: A fleet of 5,000 edge traffic monitors uses a cellular plan costing $5/GB. Each unit logs 100
-  events per day. Option A transmits a 100KB image per event to the cloud for processing. Option B runs
-  the model locally and transmits a 2KB JSON metadata payload.
-question: What is the total monthly cellular data cost for the fleet under both options, and what are
-  the annual operational savings of choosing Option B?
-details:
-  realistic_solution: Option A costs $7,500 per month, while Option B costs $150 per month. By choosing
-    local processing and sending only metadata (Option B), the fleet saves $88,200 annually in connectivity
-    operations costs.
-  common_mistake: '**The Pitfall:** Overlooking the scale multiplier when calculating edge OpEx.
-
-    **The Rationale:** Candidates might correctly calculate the data cost for a single device or a single
-    day but fail to multiply by the 5,000-unit fleet size and 30-day month for the monthly total, or the
-    12 months for the annual savings.
-
-    **The Consequence:** Proposing architectures that appear cheap on a per-unit basis but become prohibitively
-    expensive at scale, blowing out the OpEx budget.'
-  napkin_math: '**Assumptions & Constraints:**
-
-    Assume standard base-10 networking prefixes (1 GB = 1,000,000 KB). Assume 1 month = 30 days.
-
-
-    **Calculations:**
-
-    Events per month per unit: 100 events/day × 30 days = 3,000 events/month.
-
-    Total fleet events per month: 3,000 × 5,000 = 15,000,000 events.
-
-    Option A data: 15,000,000 × 100KB = 1,500,000,000 KB = 1,500 GB.
-
-    Option A cost: 1,500 GB × $5/GB = $7,500/month.
-
-    Option B data: 15,000,000 × 2KB = 30,000,000 KB = 30 GB.
-
-    Option B cost: 30 GB × $5/GB = $150/month.
-
-    Monthly savings: $7,500 - $150 = $7,350.
-
-    Annual savings: $7,350 × 12 = $88,200.
-
-
-    **Conclusion:**
-
-    Option B provides an $88,200 annual savings, illustrating how edge compute dramatically reduces connectivity
-    OpEx at scale.'
-status: draft
-provenance: llm-draft
-requires_explanation: false
-expected_time_minutes: 10
-tags:
-- tco
-- cellular
-- opex
-- bandwidth
-- gap-bridge:edge-0731-edge-1154
-authors:
-- gemini-3.1-pro-preview
-human_reviewed:
-  status: not-reviewed
-  by: null
-  date: null
-created_at: '2026-05-01T17:27:13+00:00'
@@ -1,61 +0,0 @@
-schema_version: '1.0'
-id: edge-2535
-track: edge
-level: L3
-zone: diagnosis
-topic: latency-decomposition
-competency_area: latency
-bloom_level: apply
-phase: inference
-title: Theoretical vs. Measured Latency on the Coral USB TPU
-scenario: A team deploys an INT8 quantized MobileNetV2-SSD model requiring 0.6 GOPS to a Coral Edge TPU
-  USB Accelerator, which is rated at 4 TOPS. The team measures an end-to-end frame processing latency
-  of 60ms and concludes the TPU compute is too slow for their workload.
-question: Calculate the theoretical minimum inference latency for this model on the Coral TPU. Based on
-  this calculation, what is the most likely cause of the 60ms measured latency?
-details:
-  realistic_solution: The theoretical compute latency is a fraction of a millisecond. The massive discrepancy
-    between the sub-millisecond compute time and the 60ms measured latency indicates that the bottleneck
-    is not the TPU hardware, but rather host-side operations such as image resizing, color space conversion,
-    USB transfer overhead, or CPU-bound post-processing.
-  common_mistake: '**The Pitfall:** Assuming the TPU compute is the bottleneck because the overall system
-    is slow, or miscalculating the unit conversion between GOPS and TOPS.
-
-    **The Rationale:** Engineers often conflate raw ''inference time'' with ''end-to-end latency'', forgetting
-    that a USB accelerator requires significant host CPU coordination, data marshaling, and PCI/USB bus
-    transfer times.
-
-    **The Consequence:** The team might waste weeks pruning or re-training the neural network architecture
-    when the actual fix involves optimizing host CPU preprocessing pipelines or leveraging hardware video
-    decoders.'
-  napkin_math: '**Assumptions & Constraints:** The model requires 0.6 Giga Operations (GOPS). The Coral
-    TPU provides 4 Tera Operations Per Second (TOPS), which is 4,000 GOPS.
-
-
-    **Calculations:** Theoretical latency = Model Operations / Hardware Throughput = 0.6 GOPS / 4000 GOPS
-    = 0.00015 seconds, or 0.15 milliseconds. Even factoring in a highly conservative 20% hardware utilization
-    rate due to memory bandwidth constraints, compute time is ~0.75ms.
-
-
-    **Conclusion:** The TPU computation accounts for roughly 1% of the 60ms end-to-end latency. The remaining
-    ~59ms is consumed by host-to-device I/O and CPU-bound pre/post-processing tasks.'
-status: draft
-provenance: llm-draft
-requires_explanation: false
-expected_time_minutes: 10
-tags:
-- coral-tpu
-- bottlenecks
-- latency-decomposition
-- napkin-math
-_authoring:
-  origin: gemini-3.1-pro-preview
-  tool: generate_question_for_gap.py
-  generated_at: '2026-05-01T17:25:51+00:00'
-  gap:
-    between:
-    - edge-1883
-    - edge-1701
-    missing_level: L3
-    rationale: Calculating expected inference latency versus actual measured pipeline latency on the Coral
-      TPU.
@@ -8,10 +8,10 @@ competency_area: optimization
 bloom_level: analyze
 phase: inference
 title: Diagnosing Zero Latency Gains from Unstructured Pruning on Coral TPU
-scenario: A team deployed a MobileNetV2 model onto a Google Coral Edge TPU. To reduce the baseline 80ms
+scenario: A team deployed a MobileNetV2 model onto a Google Coral Edge TPU. To reduce the baseline 15ms
   inference latency, they applied magnitude-based unstructured pruning, achieving 75% sparsity while maintaining
   acceptable accuracy. After compiling with the Edge TPU Compiler and deploying the quantized INT8 model,
-  the inference latency remains stubbornly stuck at 80ms.
+  the inference latency remains stubbornly stuck at 15ms.
 question: Why did the 75% unstructured sparsity fail to yield any latency improvements on the Coral Edge
   TPU, and what architectural characteristic of the accelerator dictates this outcome?
 details:
@@ -39,7 +39,7 @@ details:
 
     **Conclusion:** Without structured pruning to physically reduce the number of channels or filters,
     the tensor shapes remain identical, and the systolic array offers exactly 0% speedup.'
-status: draft
+status: published
 provenance: llm-draft
 requires_explanation: false
 expected_time_minutes: 10
@@ -52,7 +52,11 @@ tags:
 authors:
 - gemini-3.1-pro-preview
 human_reviewed:
-  status: not-reviewed
-  by: null
-  date: null
+  status: verified
+  by: vj
+  date: '2026-05-02'
+  notes: Reviewed against Gemini independent audit; accepted for publication.
 created_at: '2026-05-01T17:26:16+00:00'
+validation_status: OK
+validation_date: '2026-05-02'
+validation_model: gemini-3.1-pro-preview
@@ -35,7 +35,7 @@ details:
 
    **Conclusion:** The conversion process yields a 50% reduction in disk footprint, resulting in a 30
    MB file.'
-status: draft
+status: published
 provenance: llm-draft
 requires_explanation: false
 expected_time_minutes: 5
@@ -48,7 +48,11 @@ tags:
 authors:
 - gemini-3.1-pro-preview
 human_reviewed:
-  status: not-reviewed
-  by: null
-  date: null
+  status: verified
+  by: vj
+  date: '2026-05-02'
+  notes: Reviewed against Gemini independent audit; accepted for publication.
 created_at: '2026-05-01T17:28:18+00:00'
+validation_status: OK
+validation_date: '2026-05-02'
+validation_model: gemini-3.1-pro-preview
@@ -1,67 +0,0 @@
-schema_version: '1.0'
-id: mobile-2146
-track: mobile
-level: L3
-zone: realization
-topic: duty-cycling
-competency_area: power
-bloom_level: apply
-phase: inference
-title: 'The Hidden Cost of Waking Up: Dashcam Duty Cycling'
-scenario: 'You are optimizing a smartphone dashcam app that duty-cycles the NPU. Every 10 seconds, the
-  device executes a cycle: it wakes up the NPU (taking 0.5 seconds at a peak 4W power draw), runs inference
-  for 2 seconds at 3W, and then idles the SoC for the remaining 7.5 seconds at 0.5W.'
-question: Calculate the total energy consumed by the dashcam feature over a 1-hour driving session, explicitly
-  factoring in the transient wake-up overhead.
-details:
-  realistic_solution: 'First, calculate the energy per 10-second cycle by summing the energy of each phase:
-    Wake-up (0.5s * 4W = 2J), Active (2s * 3W = 6J), and Idle (7.5s * 0.5W = 3.75J), totaling 11.75 Joules.
-    Then, multiply this by the 360 cycles in a 1-hour period (3600 seconds) to determine the total consumption
-    of 4,230 Joules.'
-  common_mistake: '**The Pitfall:** Ignoring the wake-up transition time and power overhead when calculating
-    the duty cycle energy.
-
-    **The Rationale:** Developers often assume duty cycling only involves binary active and idle states,
-    overlooking the transient hardware costs like powering up the NPU and loading initial weights into
-    SRAM.
-
-    **The Consequence:** The theoretical energy budget is severely underestimated (in this case by ~17%),
-    leading to unexpected battery drain and missed power targets in production.'
-  napkin_math: '**Assumptions & Constraints:** 1 hour = 3600 seconds. A 10-second cycle occurs 360 times
-    per hour. Energy (Joules) = Power (Watts) * Time (Seconds).
-
-
-    **Calculations:**
-
-    - Energy_wakeup = 4W * 0.5s = 2J
-
-    - Energy_active = 3W * 2s = 6J
-
-    - Energy_idle = 0.5W * 7.5s = 3.75J
-
-    - Energy_cycle = 2J + 6J + 3.75J = 11.75J
-
-    - Total_Energy = 11.75 J/cycle * 360 cycles = 4230 J
-
-
-    **Conclusion:** The total energy consumed over 1 hour is 4,230 Joules. Notably, the wake-up overhead
-    accounts for a significant portion of the total energy despite occupying only 5% of the physical time,
-    demonstrating the limit of rapid duty-cycling.'
-status: draft
-provenance: llm-draft
-requires_explanation: false
-expected_time_minutes: 10
-tags:
-- duty-cycling
-- power-optimization
-- mobile
-- npu
-- energy-profiling
-- gap-bridge:mobile-0367-mobile-2034
-authors:
-- gemini-3.1-pro-preview
-human_reviewed:
-  status: not-reviewed
-  by: null
-  date: null
-created_at: '2026-05-01T17:27:47+00:00'