feat(vault): Phase 3 pilot disposition — 2 published, 3 rejected
This acts on the findings of the independent Gemini audit (two runs
converged on the same per-draft verdicts). Of the 5 drafts in the Phase 3
pilot:
Published (status: published, human_reviewed: verified):
mobile-2147 Model Format Conversion: Sizing the FP16 CoreML Payload
Clean L2 / understand. FP32→FP16 storage halving on a
15M-param iOS model. Realistic App Store framing, correct
math (sanity-checked in the sketch after this list), no
fabrication.
edge-2536 Diagnosing Zero Latency Gains from Unstructured Pruning
on Coral TPU
Canonical L4 / analyze lesson on dense systolic arrays
+ unstructured sparsity. Edited the scenario's baseline
latency from 80ms → 15ms (more realistic for MobileNetV2
on Coral USB TPU; audit flagged the 80ms figure as
unrealistic). Pedagogical content unchanged.
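Quick sanity check of the mobile-2147 sizing claim (a rough sketch; the
15M-param count comes from the draft's scenario, and the 4-byte/2-byte
widths are the standard FP32/FP16 sizes; everything else is illustrative):

    # FP32 -> FP16 payload halving for a 15M-parameter model.
    params = 15_000_000
    fp32_mb = params * 4 / 1e6   # 4 bytes per FP32 weight -> ~60 MB
    fp16_mb = params * 2 / 1e6   # 2 bytes per FP16 weight -> ~30 MB
    assert fp16_mb / fp32_mb == 0.5  # the 50% / 30 MB figure in the draft
    print(fp32_mb, fp16_mb)          # 60.0 30.0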
Rejected (deleted):
edge-2537 edge/tco-cost-modeling
Audit (both runs) flagged "cognitive load too low for L3
— basic arithmetic word problem with all parameters
given". Real L3 TCO questions require judgement under
uncertainty; this one is L1/L2.
mobile-2146 mobile/duty-cycling
Audit flagged a physically absurd 0.5s wake-up at 4W for
a mobile NPU (real NPUs wake in milliseconds). Run 2
additionally flagged the dashcam framing as broken (a
dashcam idle 75% of the time would miss accidents).
Premise is fiction; the lesson can't be salvaged.
edge-2535 edge/latency-decomposition
Failed the validate_drafts.py originality gate at promotion
(cosine 0.933 vs its own bridge anchor edge-1883; the gate
is sketched after this list). Was left as .yaml.draft
pending review; the content is fine on its own, but it is
pedagogically duplicative of the lesson in the now-promoted
edge-2536 (host-side bottleneck on Coral). Cleaner to drop
than to de-duplicate.
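For reference, the originality gate that edge-2535 failed is, in essence,
a cosine-similarity ceiling on a draft's embedding versus its bridge
anchors. A minimal sketch assuming a numpy embedding pipeline (the
function names and the 0.90 threshold are hypothetical, not the actual
validate_drafts.py code):

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two embedding vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def passes_originality_gate(draft_vec, anchor_vecs, threshold=0.90):
        # Reject drafts that sit too close to any anchor they bridge;
        # edge-2535 scored 0.933 against edge-1883 and was held back.
        return all(cosine(draft_vec, v) < threshold for v in anchor_vecs)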
The 4 ID entries in id-registry.yaml stay (append-only ledger); the
removed YAMLs become dangling registry entries, which is the intended
behaviour — the registry records "every ID ever assigned", not "every ID
currently active". (Invariant sketched below.)
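The invariant, roughly (a sketch; the field and function names are
hypothetical, not the actual vault check code):

    # Registry IDs are a superset of active question IDs.
    # Dangling registry entries are fine; an active question whose ID
    # was never registered is the real violation.
    def check_registry(registry_ids: set[str], active_ids: set[str]) -> None:
        missing = active_ids - registry_ids
        assert not missing, f"unregistered active IDs: {missing}"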
Validation:
  vault check --strict: 10,703 loaded, 0 invariant failures
  vault build --local-json: 9440 published (was 9438, +2), chainCount=824,
    releaseHash a9a601c2bf… (was 479811040b…)
@@ -1,23 +1,23 @@
 {
   "releaseId": "dev",
-  "releaseHash": "479811040b7a9f89571c68816a719c99c8b65d0d35aa8f0afd46889140e5911f",
+  "releaseHash": "a9a601c2bf8710a5c9b96dc0ba9349fc6f7b8a7a4bd114f007d4f88e6cf6a7d7",
   "schemaVersion": "1",
   "policyVersion": "1",
-  "buildDate": "2026-05-02T12:49:52Z",
-  "questionCount": 9438,
+  "buildDate": "2026-05-02T13:39:18Z",
+  "questionCount": 9440,
   "chainCount": 824,
   "conceptCount": 87,
   "trackDistribution": {
     "cloud": 4028,
-    "edge": 2077,
+    "edge": 2078,
     "global": 313,
-    "mobile": 1818,
+    "mobile": 1819,
     "tinyml": 1202
   },
   "levelDistribution": {
-    "L4": 2493,
+    "L4": 2494,
     "L1": 463,
-    "L2": 761,
+    "L2": 762,
     "L3": 2228,
     "L5": 2421,
     "L6+": 1072
@@ -1,72 +0,0 @@
-schema_version: '1.0'
-id: edge-2537
-track: edge
-level: L3
-zone: fluency
-topic: tco-cost-modeling
-competency_area: cross-cutting
-bloom_level: apply
-phase: inference
-title: 'Edge TCO Fluency: Monthly Cellular Data Cost Calculation'
-scenario: A fleet of 5,000 edge traffic monitors uses a cellular plan costing $5/GB. Each unit logs 100
-  events per day. Option A transmits a 100KB image per event to the cloud for processing. Option B runs
-  the model locally and transmits a 2KB JSON metadata payload.
-question: What is the total monthly cellular data cost for the fleet under both options, and what are
-  the annual operational savings of choosing Option B?
-details:
-  realistic_solution: Option A costs $7,500 per month, while Option B costs $150 per month. By choosing
-    local processing and sending only metadata (Option B), the fleet saves $88,200 annually in connectivity
-    operations costs.
-  common_mistake: '**The Pitfall:** Overlooking the scale multiplier when calculating edge OpEx.
-
-    **The Rationale:** Candidates might correctly calculate the data cost for a single device or a single
-    day but fail to multiply by the 5,000-unit fleet size and 30-day month for the monthly total, or the
-    12 months for the annual savings.
-
-    **The Consequence:** Proposing architectures that appear cheap on a per-unit basis but become prohibitively
-    expensive at scale, blowing out the OpEx budget.'
-  napkin_math: '**Assumptions & Constraints:**
-
-    Assume standard base-10 networking prefixes (1 GB = 1,000,000 KB). Assume 1 month = 30 days.
-
-
-    **Calculations:**
-
-    Events per month per unit: 100 events/day × 30 days = 3,000 events/month.
-
-    Total fleet events per month: 3,000 × 5,000 = 15,000,000 events.
-
-    Option A data: 15,000,000 × 100KB = 1,500,000,000 KB = 1,500 GB.
-
-    Option A cost: 1,500 GB × $5/GB = $7,500/month.
-
-    Option B data: 15,000,000 × 2KB = 30,000,000 KB = 30 GB.
-
-    Option B cost: 30 GB × $5/GB = $150/month.
-
-    Monthly savings: $7,500 - $150 = $7,350.
-
-    Annual savings: $7,350 × 12 = $88,200.
-
-
-    **Conclusion:**
-
-    Option B provides an $88,200 annual savings, illustrating how edge compute dramatically reduces connectivity
-    OpEx at scale.'
-status: draft
-provenance: llm-draft
-requires_explanation: false
-expected_time_minutes: 10
-tags:
-- tco
-- cellular
-- opex
-- bandwidth
-- gap-bridge:edge-0731-edge-1154
-authors:
-- gemini-3.1-pro-preview
-human_reviewed:
-  status: not-reviewed
-  by: null
-  date: null
-created_at: '2026-05-01T17:27:13+00:00'
@@ -1,61 +0,0 @@
-schema_version: '1.0'
-id: edge-2535
-track: edge
-level: L3
-zone: diagnosis
-topic: latency-decomposition
-competency_area: latency
-bloom_level: apply
-phase: inference
-title: Theoretical vs. Measured Latency on the Coral USB TPU
-scenario: A team deploys an INT8 quantized MobileNetV2-SSD model requiring 0.6 GOPS to a Coral Edge TPU
-  USB Accelerator, which is rated at 4 TOPS. The team measures an end-to-end frame processing latency
-  of 60ms and concludes the TPU compute is too slow for their workload.
-question: Calculate the theoretical minimum inference latency for this model on the Coral TPU. Based on
-  this calculation, what is the most likely cause of the 60ms measured latency?
-details:
-  realistic_solution: The theoretical compute latency is a fraction of a millisecond. The massive discrepancy
-    between the sub-millisecond compute time and the 60ms measured latency indicates that the bottleneck
-    is not the TPU hardware, but rather host-side operations such as image resizing, color space conversion,
-    USB transfer overhead, or CPU-bound post-processing.
-  common_mistake: '**The Pitfall:** Assuming the TPU compute is the bottleneck because the overall system
-    is slow, or miscalculating the unit conversion between GOPS and TOPS.
-
-    **The Rationale:** Engineers often conflate raw ''inference time'' with ''end-to-end latency'', forgetting
-    that a USB accelerator requires significant host CPU coordination, data marshaling, and PCI/USB bus
-    transfer times.
-
-    **The Consequence:** The team might waste weeks pruning or re-training the neural network architecture
-    when the actual fix involves optimizing host CPU preprocessing pipelines or leveraging hardware video
-    decoders.'
-  napkin_math: '**Assumptions & Constraints:** The model requires 0.6 Giga Operations (GOPS). The Coral
-    TPU provides 4 Tera Operations Per Second (TOPS), which is 4,000 GOPS.
-
-
-    **Calculations:** Theoretical latency = Model Operations / Hardware Throughput = 0.6 GOPS / 4000 GOPS
-    = 0.00015 seconds, or 0.15 milliseconds. Even factoring in a highly conservative 20% hardware utilization
-    rate due to memory bandwidth constraints, compute time is ~0.75ms.
-
-
-    **Conclusion:** The TPU computation accounts for roughly 1% of the 60ms end-to-end latency. The remaining
-    ~59ms is consumed by host-to-device I/O and CPU-bound pre/post-processing tasks.'
-status: draft
-provenance: llm-draft
-requires_explanation: false
-expected_time_minutes: 10
-tags:
-- coral-tpu
-- bottlenecks
-- latency-decomposition
-- napkin-math
-_authoring:
-  origin: gemini-3.1-pro-preview
-  tool: generate_question_for_gap.py
-  generated_at: '2026-05-01T17:25:51+00:00'
-  gap:
-    between:
-    - edge-1883
-    - edge-1701
-    missing_level: L3
-    rationale: Calculating expected inference latency versus actual measured pipeline latency on the Coral
-      TPU.
@@ -8,10 +8,10 @@ competency_area: optimization
 bloom_level: analyze
 phase: inference
 title: Diagnosing Zero Latency Gains from Unstructured Pruning on Coral TPU
-scenario: A team deployed a MobileNetV2 model onto a Google Coral Edge TPU. To reduce the baseline 80ms
+scenario: A team deployed a MobileNetV2 model onto a Google Coral Edge TPU. To reduce the baseline 15ms
   inference latency, they applied magnitude-based unstructured pruning, achieving 75% sparsity while maintaining
   acceptable accuracy. After compiling with the Edge TPU Compiler and deploying the quantized INT8 model,
-  the inference latency remains stubbornly stuck at 80ms.
+  the inference latency remains stubbornly stuck at 15ms.
 question: Why did the 75% unstructured sparsity fail to yield any latency improvements on the Coral Edge
   TPU, and what architectural characteristic of the accelerator dictates this outcome?
 details:
@@ -39,7 +39,7 @@ details:
 
     **Conclusion:** Without structured pruning to physically reduce the number of channels or filters,
     the tensor shapes remain identical, and the systolic array offers exactly 0% speedup.'
-status: draft
+status: published
 provenance: llm-draft
 requires_explanation: false
 expected_time_minutes: 10
@@ -52,7 +52,11 @@ tags:
 authors:
 - gemini-3.1-pro-preview
 human_reviewed:
-  status: not-reviewed
-  by: null
-  date: null
+  status: verified
+  by: vj
+  date: '2026-05-02'
+  notes: Reviewed against Gemini independent audit; accepted for publication.
 created_at: '2026-05-01T17:26:16+00:00'
+validation_status: OK
+validation_date: '2026-05-02'
+validation_model: gemini-3.1-pro-preview
@@ -35,7 +35,7 @@ details:
 
    **Conclusion:** The conversion process yields a 50% reduction in disk footprint, resulting in a 30
    MB file.'
-status: draft
+status: published
 provenance: llm-draft
 requires_explanation: false
 expected_time_minutes: 5
@@ -48,7 +48,11 @@ tags:
 authors:
 - gemini-3.1-pro-preview
 human_reviewed:
-  status: not-reviewed
-  by: null
-  date: null
+  status: verified
+  by: vj
+  date: '2026-05-02'
+  notes: Reviewed against Gemini independent audit; accepted for publication.
 created_at: '2026-05-01T17:28:18+00:00'
+validation_status: OK
+validation_date: '2026-05-02'
+validation_model: gemini-3.1-pro-preview
@@ -1,67 +0,0 @@
-schema_version: '1.0'
-id: mobile-2146
-track: mobile
-level: L3
-zone: realization
-topic: duty-cycling
-competency_area: power
-bloom_level: apply
-phase: inference
-title: 'The Hidden Cost of Waking Up: Dashcam Duty Cycling'
-scenario: 'You are optimizing a smartphone dashcam app that duty-cycles the NPU. Every 10 seconds, the
-  device executes a cycle: it wakes up the NPU (taking 0.5 seconds at a peak 4W power draw), runs inference
-  for 2 seconds at 3W, and then idles the SoC for the remaining 7.5 seconds at 0.5W.'
-question: Calculate the total energy consumed by the dashcam feature over a 1-hour driving session, explicitly
-  factoring in the transient wake-up overhead.
-details:
-  realistic_solution: 'First, calculate the energy per 10-second cycle by summing the energy of each phase:
-    Wake-up (0.5s * 4W = 2J), Active (2s * 3W = 6J), and Idle (7.5s * 0.5W = 3.75J), totaling 11.75 Joules.
-    Then, multiply this by the 360 cycles in a 1-hour period (3600 seconds) to determine the total consumption
-    of 4,230 Joules.'
-  common_mistake: '**The Pitfall:** Ignoring the wake-up transition time and power overhead when calculating
-    the duty cycle energy.
-
-    **The Rationale:** Developers often assume duty cycling only involves binary active and idle states,
-    overlooking the transient hardware costs like powering up the NPU and loading initial weights into
-    SRAM.
-
-    **The Consequence:** The theoretical energy budget is severely underestimated (in this case by ~17%),
-    leading to unexpected battery drain and missed power targets in production.'
-  napkin_math: '**Assumptions & Constraints:** 1 hour = 3600 seconds. A 10-second cycle occurs 360 times
-    per hour. Energy (Joules) = Power (Watts) * Time (Seconds).
-
-
-    **Calculations:**
-
-    - Energy_wakeup = 4W * 0.5s = 2J
-
-    - Energy_active = 3W * 2s = 6J
-
-    - Energy_idle = 0.5W * 7.5s = 3.75J
-
-    - Energy_cycle = 2J + 6J + 3.75J = 11.75J
-
-    - Total_Energy = 11.75 J/cycle * 360 cycles = 4230 J
-
-
-    **Conclusion:** The total energy consumed over 1 hour is 4,230 Joules. Notably, the wake-up overhead
-    accounts for a significant portion of the total energy despite occupying only 5% of the physical time,
-    demonstrating the limit of rapid duty-cycling.'
-status: draft
-provenance: llm-draft
-requires_explanation: false
-expected_time_minutes: 10
-tags:
-- duty-cycling
-- power-optimization
-- mobile
-- npu
-- energy-profiling
-- gap-bridge:mobile-0367-mobile-2034
-authors:
-- gemini-3.1-pro-preview
-human_reviewed:
-  status: not-reviewed
-  by: null
-  date: null
-created_at: '2026-05-01T17:27:47+00:00'