feat(vault): Phase 3 pilot disposition — 2 published, 3 rejected

This acts on the audit findings (independent Gemini audit; 2 runs converged
on the same per-draft verdicts). Of the 5 drafts in the Phase 3 pilot:

Published (status: published, human_reviewed: verified):
  mobile-2147  Model Format Conversion: Sizing the FP16 CoreML Payload
               Clean L2 / understand. FP32→FP16 storage halving on a
               15M-param iOS model. Realistic App Store framing,
               correct math, no fabrication.

  edge-2536    Diagnosing Zero Latency Gains from Unstructured Pruning
               on Coral TPU
               Canonical L4 / analyze lesson on dense systolic arrays
               + unstructured sparsity. Edited the scenario's baseline
               latency from 80ms → 15ms (more realistic for MobileNetV2
               on Coral USB TPU; audit flagged the 80ms figure as
               unrealistic). Pedagogical content unchanged.

Rejected (deleted):
  edge-2537    edge/tco-cost-modeling
               Audit (both runs) flagged "cognitive load too low for L3
               — basic arithmetic word problem with all parameters
               given". Real L3 TCO questions require judgement under
               uncertainty; this one is L1/L2.

  mobile-2146  mobile/duty-cycling
               Audit flagged a physically absurd 0.5s wake-up at 4W for
               a mobile NPU (real NPUs wake in milliseconds). Run 2
               additionally flagged the dashcam framing as broken (a
               dashcam idle 75% of the time would miss accidents).
               Premise is fiction; the lesson can't be salvaged.

  edge-2535    edge/latency-decomposition
               Failed the validate_drafts.py originality gate at promotion
               (cosine 0.933 vs its own bridge anchor edge-1883). Was
               left as .yaml.draft pending review; content is fine on
               its own, but pedagogically duplicative with the lesson
               in the now-promoted edge-2536 (host-side bottleneck on
               Coral). Cleaner to drop than de-duplicate.

The 4 ID entries in id-registry.yaml stay (append-only ledger); the
removed YAMLs become dangling registry entries, which is the intended
behaviour — the registry is "every ID ever assigned", not "every ID
currently active".
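
A minimal sketch of that invariant as the strict check treats it
(hypothetical names; the real logic lives in the vault tooling):

  # Sketch only: dangling ledger entries are legal; unregistered actives are not.
  def check_id_registry(registry_ids: set, active_ids: set) -> None:
      unregistered = active_ids - registry_ids
      assert not unregistered, f"IDs never assigned via the ledger: {unregistered}"
      # registry_ids - active_ids (dangling entries) is deliberately not checked:
      # the ledger records every ID ever assigned, not every ID currently active.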

Validation:
  vault check --strict:    10,703 loaded, 0 invariant failures
  vault build --local-json: 9440 published (was 9438; net +2), chainCount=824,
                           releaseHash a9a601c2bf… (was 479811040b…)
Vijay Janapa Reddi
2026-05-02 09:39:52 -04:00
parent 2b3cf5e1da
commit 9ab6bb85d0
7 changed files with 26 additions and 218 deletions

File diff suppressed because one or more lines are too long


@@ -1,23 +1,23 @@
 {
 "releaseId": "dev",
-"releaseHash": "479811040b7a9f89571c68816a719c99c8b65d0d35aa8f0afd46889140e5911f",
+"releaseHash": "a9a601c2bf8710a5c9b96dc0ba9349fc6f7b8a7a4bd114f007d4f88e6cf6a7d7",
 "schemaVersion": "1",
 "policyVersion": "1",
-"buildDate": "2026-05-02T12:49:52Z",
-"questionCount": 9438,
+"buildDate": "2026-05-02T13:39:18Z",
+"questionCount": 9440,
 "chainCount": 824,
 "conceptCount": 87,
 "trackDistribution": {
 "cloud": 4028,
-"edge": 2077,
+"edge": 2078,
 "global": 313,
-"mobile": 1818,
+"mobile": 1819,
 "tinyml": 1202
 },
 "levelDistribution": {
-"L4": 2493,
+"L4": 2494,
 "L1": 463,
-"L2": 761,
+"L2": 762,
 "L3": 2228,
 "L5": 2421,
 "L6+": 1072


@@ -1,72 +0,0 @@
schema_version: '1.0'
id: edge-2537
track: edge
level: L3
zone: fluency
topic: tco-cost-modeling
competency_area: cross-cutting
bloom_level: apply
phase: inference
title: 'Edge TCO Fluency: Monthly Cellular Data Cost Calculation'
scenario: A fleet of 5,000 edge traffic monitors uses a cellular plan costing $5/GB. Each unit logs 100
events per day. Option A transmits a 100KB image per event to the cloud for processing. Option B runs
the model locally and transmits a 2KB JSON metadata payload.
question: What is the total monthly cellular data cost for the fleet under both options, and what are
the annual operational savings of choosing Option B?
details:
realistic_solution: Option A costs $7,500 per month, while Option B costs $150 per month. By choosing
local processing and sending only metadata (Option B), the fleet saves $88,200 annually in connectivity
operations costs.
common_mistake: '**The Pitfall:** Overlooking the scale multiplier when calculating edge OpEx.
**The Rationale:** Candidates might correctly calculate the data cost for a single device or a single
day but fail to multiply by the 5,000-unit fleet size and 30-day month for the monthly total, or the
12 months for the annual savings.
**The Consequence:** Proposing architectures that appear cheap on a per-unit basis but become prohibitively
expensive at scale, blowing out the OpEx budget.'
napkin_math: '**Assumptions & Constraints:**
Assume standard base-10 networking prefixes (1 GB = 1,000,000 KB). Assume 1 month = 30 days.
**Calculations:**
Events per month per unit: 100 events/day × 30 days = 3,000 events/month.
Total fleet events per month: 3,000 × 5,000 = 15,000,000 events.
Option A data: 15,000,000 × 100KB = 1,500,000,000 KB = 1,500 GB.
Option A cost: 1,500 GB × $5/GB = $7,500/month.
Option B data: 15,000,000 × 2KB = 30,000,000 KB = 30 GB.
Option B cost: 30 GB × $5/GB = $150/month.
Monthly savings: $7,500 - $150 = $7,350.
Annual savings: $7,350 × 12 = $88,200.
**Conclusion:**
Option B provides an $88,200 annual savings, illustrating how edge compute dramatically reduces connectivity
OpEx at scale.'
status: draft
provenance: llm-draft
requires_explanation: false
expected_time_minutes: 10
tags:
- tco
- cellular
- opex
- bandwidth
- gap-bridge:edge-0731-edge-1154
authors:
- gemini-3.1-pro-preview
human_reviewed:
status: not-reviewed
by: null
date: null
created_at: '2026-05-01T17:27:13+00:00'
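
The deleted draft's arithmetic was sound; the rejection is about cognitive
load, not correctness. A quick sanity check (plain Python, all figures
taken from the scenario above):

  fleet, events_per_day, days = 5_000, 100, 30
  price_per_gb, kb_per_gb = 5, 1_000_000            # base-10 prefixes, per the draft
  events = fleet * events_per_day * days            # 15,000,000 events/month
  cost_a = events * 100 / kb_per_gb * price_per_gb  # 100KB images  -> 7500.0 $/month
  cost_b = events * 2 / kb_per_gb * price_per_gb    # 2KB metadata  -> 150.0 $/month
  print((cost_a - cost_b) * 12)                     # 88200.0 $/year, matching the draft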


@@ -1,61 +0,0 @@
schema_version: '1.0'
id: edge-2535
track: edge
level: L3
zone: diagnosis
topic: latency-decomposition
competency_area: latency
bloom_level: apply
phase: inference
title: Theoretical vs. Measured Latency on the Coral USB TPU
scenario: A team deploys an INT8 quantized MobileNetV2-SSD model requiring 0.6 GOPS to a Coral Edge TPU
USB Accelerator, which is rated at 4 TOPS. The team measures an end-to-end frame processing latency
of 60ms and concludes the TPU compute is too slow for their workload.
question: Calculate the theoretical minimum inference latency for this model on the Coral TPU. Based on
this calculation, what is the most likely cause of the 60ms measured latency?
details:
realistic_solution: The theoretical compute latency is a fraction of a millisecond. The massive discrepancy
between the sub-millisecond compute time and the 60ms measured latency indicates that the bottleneck
is not the TPU hardware, but rather host-side operations such as image resizing, color space conversion,
USB transfer overhead, or CPU-bound post-processing.
common_mistake: '**The Pitfall:** Assuming the TPU compute is the bottleneck because the overall system
is slow, or miscalculating the unit conversion between GOPS and TOPS.
**The Rationale:** Engineers often conflate raw ''inference time'' with ''end-to-end latency'', forgetting
that a USB accelerator requires significant host CPU coordination, data marshaling, and PCI/USB bus
transfer times.
**The Consequence:** The team might waste weeks pruning or re-training the neural network architecture
when the actual fix involves optimizing host CPU preprocessing pipelines or leveraging hardware video
decoders.'
napkin_math: '**Assumptions & Constraints:** The model requires 0.6 Giga Operations (GOPS). The Coral
TPU provides 4 Tera Operations Per Second (TOPS), which is 4,000 GOPS.
**Calculations:** Theoretical latency = Model Operations / Hardware Throughput = 0.6 GOPS / 4000 GOPS
= 0.00015 seconds, or 0.15 milliseconds. Even factoring in a highly conservative 20% hardware utilization
rate due to memory bandwidth constraints, compute time is ~0.75ms.
**Conclusion:** The TPU computation accounts for roughly 1% of the 60ms end-to-end latency. The remaining
~59ms is consumed by host-to-device I/O and CPU-bound pre/post-processing tasks.'
status: draft
provenance: llm-draft
requires_explanation: false
expected_time_minutes: 10
tags:
- coral-tpu
- bottlenecks
- latency-decomposition
- napkin-math
_authoring:
origin: gemini-3.1-pro-preview
tool: generate_question_for_gap.py
generated_at: '2026-05-01T17:25:51+00:00'
gap:
between:
- edge-1883
- edge-1701
missing_level: L3
rationale: Calculating expected inference latency versus actual measured pipeline latency on the Coral
TPU.
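
edge-2535's core calculation also checks out; the drop is purely about
duplication with edge-2536. The napkin math in plain Python:

  ops = 0.6e9                 # 0.6 GOPS per inference, per the scenario
  throughput = 4e12           # Coral USB TPU rated at 4 TOPS
  t_ideal = ops / throughput  # 0.00015 s = 0.15 ms
  t_cons = t_ideal / 0.20     # at a conservative 20% utilization: 0.75 ms
  print(t_ideal * 1e3, t_cons * 1e3)  # 0.15 0.75
  print(t_cons / 0.060)               # 0.0125: ~1% of the 60ms measured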


@@ -8,10 +8,10 @@ competency_area: optimization
 bloom_level: analyze
 phase: inference
 title: Diagnosing Zero Latency Gains from Unstructured Pruning on Coral TPU
-scenario: A team deployed a MobileNetV2 model onto a Google Coral Edge TPU. To reduce the baseline 80ms
+scenario: A team deployed a MobileNetV2 model onto a Google Coral Edge TPU. To reduce the baseline 15ms
 inference latency, they applied magnitude-based unstructured pruning, achieving 75% sparsity while maintaining
 acceptable accuracy. After compiling with the Edge TPU Compiler and deploying the quantized INT8 model,
-the inference latency remains stubbornly stuck at 80ms.
+the inference latency remains stubbornly stuck at 15ms.
 question: Why did the 75% unstructured sparsity fail to yield any latency improvements on the Coral Edge
 TPU, and what architectural characteristic of the accelerator dictates this outcome?
 details:
@@ -39,7 +39,7 @@ details:
 **Conclusion:** Without structured pruning to physically reduce the number of channels or filters,
 the tensor shapes remain identical, and the systolic array offers exactly 0% speedup.'
-status: draft
+status: published
 provenance: llm-draft
 requires_explanation: false
 expected_time_minutes: 10
@@ -52,7 +52,11 @@ tags:
 authors:
 - gemini-3.1-pro-preview
 human_reviewed:
-status: not-reviewed
-by: null
-date: null
+status: verified
+by: vj
+date: '2026-05-02'
+notes: Reviewed against Gemini independent audit; accepted for publication.
 created_at: '2026-05-01T17:26:16+00:00'
+validation_status: OK
+validation_date: '2026-05-02'
+validation_model: gemini-3.1-pro-preview
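
The mechanism edge-2536 teaches is easy to demonstrate: magnitude pruning
zeroes weight values but leaves tensor shapes intact, so a dense systolic
array schedules exactly the same MACs. A minimal numpy illustration (not
Coral-specific; shapes chosen arbitrarily):

  import numpy as np

  w = np.random.randn(64, 64).astype(np.float32)  # one dense layer's weights
  k = int(0.75 * w.size)                          # prune 75% by magnitude
  smallest = np.argsort(np.abs(w), axis=None)[:k]
  w_pruned = w.copy()
  w_pruned.flat[smallest] = 0.0

  assert w_pruned.shape == w.shape  # same shape -> same compiled schedule
  print((w_pruned == 0).mean())     # ~0.75 sparsity, yet the array still
                                    # multiplies every zero: 0% speedup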


@@ -35,7 +35,7 @@ details:
 **Conclusion:** The conversion process yields a 50% reduction in disk footprint, resulting in a 30
 MB file.'
-status: draft
+status: published
 provenance: llm-draft
 requires_explanation: false
 expected_time_minutes: 5
@@ -48,7 +48,11 @@ tags:
 authors:
 - gemini-3.1-pro-preview
 human_reviewed:
-status: not-reviewed
-by: null
-date: null
+status: verified
+by: vj
+date: '2026-05-02'
+notes: Reviewed against Gemini independent audit; accepted for publication.
 created_at: '2026-05-01T17:28:18+00:00'
+validation_status: OK
+validation_date: '2026-05-02'
+validation_model: gemini-3.1-pro-preview
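
mobile-2147's core claim in three lines (plain Python; assuming base-10 MB,
the prefix convention the vault's napkin-math sections state elsewhere):

  params = 15_000_000      # the 15M-param iOS model from the scenario
  print(params * 4 / 1e6)  # FP32: 60.0 MB on disk
  print(params * 2 / 1e6)  # FP16: 30.0 MB, the 50% reduction cited above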


@@ -1,67 +0,0 @@
schema_version: '1.0'
id: mobile-2146
track: mobile
level: L3
zone: realization
topic: duty-cycling
competency_area: power
bloom_level: apply
phase: inference
title: 'The Hidden Cost of Waking Up: Dashcam Duty Cycling'
scenario: 'You are optimizing a smartphone dashcam app that duty-cycles the NPU. Every 10 seconds, the
device executes a cycle: it wakes up the NPU (taking 0.5 seconds at a peak 4W power draw), runs inference
for 2 seconds at 3W, and then idles the SoC for the remaining 7.5 seconds at 0.5W.'
question: Calculate the total energy consumed by the dashcam feature over a 1-hour driving session, explicitly
factoring in the transient wake-up overhead.
details:
realistic_solution: 'First, calculate the energy per 10-second cycle by summing the energy of each phase:
Wake-up (0.5s * 4W = 2J), Active (2s * 3W = 6J), and Idle (7.5s * 0.5W = 3.75J), totaling 11.75 Joules.
Then, multiply this by the 360 cycles in a 1-hour period (3600 seconds) to determine the total consumption
of 4,230 Joules.'
common_mistake: '**The Pitfall:** Ignoring the wake-up transition time and power overhead when calculating
the duty cycle energy.
**The Rationale:** Developers often assume duty cycling only involves binary active and idle states,
overlooking the transient hardware costs like powering up the NPU and loading initial weights into
SRAM.
**The Consequence:** The theoretical energy budget is severely underestimated (in this case by ~17%),
leading to unexpected battery drain and missed power targets in production.'
napkin_math: '**Assumptions & Constraints:** 1 hour = 3600 seconds. A 10-second cycle occurs 360 times
per hour. Energy (Joules) = Power (Watts) * Time (Seconds).
**Calculations:**
- Energy_wakeup = 4W * 0.5s = 2J
- Energy_active = 3W * 2s = 6J
- Energy_idle = 0.5W * 7.5s = 3.75J
- Energy_cycle = 2J + 6J + 3.75J = 11.75J
- Total_Energy = 11.75 J/cycle * 360 cycles = 4230 J
**Conclusion:** The total energy consumed over 1 hour is 4,230 Joules. Notably, the wake-up overhead
accounts for a significant portion of the total energy despite occupying only 5% of the physical time,
demonstrating the limit of rapid duty-cycling.'
status: draft
provenance: llm-draft
requires_explanation: false
expected_time_minutes: 10
tags:
- duty-cycling
- power-optimization
- mobile
- npu
- energy-profiling
- gap-bridge:mobile-0367-mobile-2034
authors:
- gemini-3.1-pro-preview
human_reviewed:
status: not-reviewed
by: null
date: null
created_at: '2026-05-01T17:27:47+00:00'
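
For completeness, mobile-2146's arithmetic was internally consistent; the
fatal flaw was the 0.5s/4W wake-up premise, not the math. Its energy model
in plain Python:

  e_wake   = 4.0 * 0.5                   # 2.0 J  (the physically absurd part)
  e_active = 3.0 * 2.0                   # 6.0 J
  e_idle   = 0.5 * 7.5                   # 3.75 J
  e_cycle  = e_wake + e_active + e_idle  # 11.75 J per 10s cycle
  print(e_cycle * 360)                   # 4230.0 J over a 1-hour session
  print(e_wake / e_cycle)                # ~0.17: the "~17% overhead" cited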