feat(vault): Phase 3 pilot — 5 gaps generated, 4 promoted as drafts

Pilot run of the Phase 3 authoring tooling on a 5-gap subset (sized
down from the roadmap's 30 to keep wall-time + Gemini-call budget
reasonable for an unsupervised run).

Pilot scope:
  Selected 5 high-value gaps from gaps.proposed.lenient.json — buckets
  with ≥4 published questions, biased toward low-density tracks. All 5
  picks landed in edge/mobile.

Phase 3.c — generate (5/5 written):
  edge-2535  edge/latency-decomposition L?→L3
  edge-2536  edge/pruning-sparsity L?→L4
  edge-2537  edge/tco-cost-modeling L?→L3
  mobile-2146  mobile/duty-cycling L?→L3
  mobile-2147  mobile/model-format-conversion L?→L2

Phase 3.b validation — 4/5 pass (80%, above the roadmap's 60-75% estimate):
  edge-2535: FAIL on originality (cos=0.933 vs edge-1883, threshold 0.92)
  edge-2536: pass on all 4 gates
  edge-2537: pass on all 4 gates
  mobile-2146: pass on all 4 gates
  mobile-2147: pass on all 4 gates

The originality gate correctly caught a draft that was too similar
to one of its bridge anchors — exactly the failure mode it was
designed for. Schema (Pydantic) is checked inline at generation; the
validator then runs the four gates: originality
(BAAI/bge-small-en-v1.5 cosine vs in-bucket neighbours, threshold
0.92), level_fit (Gemini-judge against same-level exemplars),
coherence (Gemini-judge), and bridge (Gemini-judge against the gap
anchors).

Phase 3.d — promotion (4 passing drafts):
  - .yaml.draft → .yaml rename
  - _authoring stripped; replaced with proper schema fields:
      provenance: llm-draft
      status: draft  (NOT published — gating on human review)
      authors: [gemini-3.1-pro-preview]
      human_reviewed: { status: not-reviewed }
      tags: + gap-bridge:<from>-<to>
  - id-registry.yaml appended (append-only ledger preserved)
  - edge-2535.yaml.draft kept in place for the human reviewer's
    disposition (rewrite + retry vs delete)

Validation post-promotion:
  - vault check --strict: 10,705 loaded (was 10,701; +4 ✓), 0 failures
  - vault build --legacy-json: released set unchanged
    (status=draft excluded by release-policy.yaml's published filter)
    — releaseHash and chainCount intentionally stable until human
    review flips status

Phase 3.e (chain rebuild) deferred: drafts must clear human review
and flip to status: published before they're eligible for chain
membership. Runbook in CHAIN_ROADMAP.md Progress Log.

Cost: 5 generation + 15 judge = 20 Gemini calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vijay Janapa Reddi
2026-05-01 13:38:18 -04:00
parent 84b1fab082
commit a750ab7bce
8 changed files with 587 additions and 2 deletions

View File

@@ -3,7 +3,7 @@
**Status:** active workstream
**Branch:** `yaml-audit` (off `dev`)
**Worktree:** `/Users/VJ/GitHub/MLSysBook-yaml-audit`
**Last updated:** 2026-05-01 (Phase 3.a + 3.b tooling shipped; 3.c pilot deferred for review)
**Last updated:** 2026-05-01 (Phase 3.c pilot run + 3.d promotion shipped; 4 new draft questions in corpus, awaiting human review)
This document is the canonical resumable plan for the vault chain rebuild
+ corpus growth work. **Future Claude sessions: read the "Resume Here"
@@ -368,7 +368,7 @@ primary chains in default surfaces, exposes secondary in "more paths."
## Phase 3 — Gap-driven question authoring
**Status:** `tooling complete (3.a + 3.b); pilot 3.c deferred for review`
**Status:** `pilot run shipped (3.c + 3.d); 3.e gated on human review of drafts`
**Goal:** Use the 138+ entries in `gaps.proposed.json` to author new
questions filling missing rungs, validated independently before commit.
This is the durable corpus growth strategy.
@@ -1064,5 +1064,123 @@ with the user available to spot-check the first few outputs).
---
### 2026-05-01 — Phase 3.c + 3.d: pilot run + promotion (5 gaps)
**Pilot scope (sized down from the roadmap's 30):** 5 high-value gaps,
selected from `gaps.proposed.lenient.json`, favoring (track, topic)
buckets with ≥4 published questions and biasing toward low-density
tracks. All 5 picks landed in edge/mobile — the tracks where the
lenient sweep found question density most lacking.
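For reference, a minimal sketch of that selection heuristic. The file
path and the field names (`bucket_published_count`,
`track_published_count`, `missing_level`) are illustrative assumptions,
not the actual `gaps.proposed.lenient.json` schema:
```python
# Hypothetical sketch of the pilot's gap selection; the JSON layout and
# field names here are guesses, not the real gaps.proposed.lenient.json.
import json

with open("interviews/vault/gaps.proposed.lenient.json") as f:
    gaps = json.load(f)

# Keep gaps whose (track, topic) bucket already has >=4 published questions,
# then prefer gaps in low-density tracks (fewest published questions overall).
eligible = [g for g in gaps if g.get("bucket_published_count", 0) >= 4]
eligible.sort(key=lambda g: g.get("track_published_count", 0))

for gap in eligible[:5]:
    print(gap["track"], gap["topic"], gap.get("missing_level"))
```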
**Phase 3.c — generate (`generate_question_for_gap.py`):**
| target | gap | result |
|---|---|---|
| edge-2535 | edge/latency-decomposition L?→L3 between=[edge-1883, edge-1701] | written |
| edge-2536 | edge/pruning-sparsity L?→L4 between=[edge-1960, edge-1957] | written |
| edge-2537 | edge/tco-cost-modeling L?→L3 between=[edge-0731, edge-1154] | written |
| mobile-2146 | mobile/duty-cycling L?→L3 between=[mobile-0367, mobile-2034] | written |
| mobile-2147 | mobile/model-format-conversion L?→L2 between=[mobile-0984, mobile-1022] | written |
5/5 generated cleanly. Each draft passed Pydantic schema validation
inline (the `assemble_draft` → `Question.model_validate` gate); none
were rejected at the file-write step.
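The inline gate is essentially a Pydantic round-trip before the draft
hits disk. A minimal sketch, using a trimmed-down stand-in for the real
`Question` model (the actual vault-cli model has many more fields and
stricter types):
```python
# Sketch of the inline schema gate; this Question model is a reduced
# stand-in, only meant to show the model_validate -> write flow.
from pydantic import BaseModel, ValidationError
import yaml

class Question(BaseModel):
    schema_version: str
    id: str
    track: str
    level: str
    topic: str
    title: str
    scenario: str
    question: str
    status: str = "draft"

def write_draft(draft: dict, out_path: str) -> bool:
    try:
        q = Question.model_validate(draft)  # reject malformed drafts here
    except ValidationError as err:
        print(f"rejected at the file-write step: {err}")
        return False
    with open(out_path, "w") as f:
        yaml.safe_dump(q.model_dump(), f, sort_keys=False)
    return True
```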
Spot-checking `edge-2535`: realistic ML-systems scenario (Coral USB
TPU + MobileNetV2-SSD + INT8), concrete numbers, calculation-driven
question consistent with L3/apply, solution gets at the actual
insight (host-side bottleneck). The other four are similarly competent.
**Phase 3.b run — `validate_drafts.py`:**
| draft | originality | level_fit | coherence | bridge | verdict |
|---|---|---|---|---|---|
| edge-2535 | **fail** (cos=0.933 vs edge-1883) | pass | pass | pass | **fail** |
| edge-2536 | pass | pass | pass | pass | **pass** |
| edge-2537 | pass | pass | pass | pass | **pass** |
| mobile-2146 | pass | pass | pass | pass | **pass** |
| mobile-2147 | pass | pass | pass | pass | **pass** |
**4/5 pass = 80% pass rate** (above the roadmap's 60-75% estimate).
The one fail was correctly caught — `edge-2535`'s draft scenario
turned out to be too similar to one of its between-questions
(`edge-1883`), cosine 0.933 against the 0.92 threshold. This is the
gate working as designed: Gemini occasionally drafts a "bridge" that's
just a paraphrase of one of its anchors instead of a true L3
intermediate. The gate filtered it.
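A minimal sketch of that check — how `validate_drafts.py` actually
assembles the comparison text is an assumption here, but the gate
reduces to a max-cosine test over in-bucket neighbours:
```python
# Sketch of the originality gate: fail a draft whose closest in-bucket
# neighbour reaches the 0.92 cosine threshold.
import numpy as np
from sentence_transformers import SentenceTransformer

THRESHOLD = 0.92
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def originality_gate(draft_text: str, neighbours: dict[str, str]):
    ids, texts = list(neighbours.keys()), list(neighbours.values())
    embs = model.encode([draft_text] + texts, normalize_embeddings=True)
    sims = embs[1:] @ embs[0]           # cosine; vectors are unit-normalised
    top = int(np.argmax(sims))
    verdict = "fail" if sims[top] >= THRESHOLD else "pass"
    return verdict, ids[top], round(float(sims[top]), 4)
```
A check along these lines is what produced the (`edge-1883`, 0.9328)
row in the scorecard for `edge-2535`.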
**Phase 3.d — promotion (4 passing drafts):**
- `.yaml.draft` → `.yaml` rename for the 4 passes (the full promotion transform is sketched after this list).
- `_authoring` private metadata stripped at promotion; replaced with:
- `provenance: llm-draft`
- `status: draft` (not `published` — gating on human review)
- `authors: ["gemini-3.1-pro-preview"]`
- `human_reviewed: { status: not-reviewed, ... }` so the
not-yet-reviewed state is honest and machine-checkable.
- `tags`: original tags preserved + a new `gap-bridge:<from>-<to>`
tag so these can be queried later.
- IDs appended to `id-registry.yaml`: `edge-2536`, `edge-2537`,
`mobile-2146`, `mobile-2147` — created_by `generate_question_for_gap.py`.
- `edge-2535.yaml.draft` was **kept in place** (still .yaml.draft).
Decision for the human reviewer when they triage: rewrite + retry,
or delete.
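A minimal sketch of that per-draft transform (the helper below and its
example call are illustrative, not the actual promote script):
```python
# Hypothetical promotion helper: strip _authoring, add the public
# provenance/review fields, append the gap-bridge tag, rename the file.
import os
import yaml

def promote(draft_path: str, gap_from: str, gap_to: str,
            model_name: str = "gemini-3.1-pro-preview") -> str:
    with open(draft_path) as f:
        q = yaml.safe_load(f)

    q.pop("_authoring", None)            # private authoring metadata dropped
    q["provenance"] = "llm-draft"
    q["status"] = "draft"                # NOT published; human review flips this
    q["authors"] = [model_name]
    q["human_reviewed"] = {"status": "not-reviewed", "by": None, "date": None}
    q.setdefault("tags", []).append(f"gap-bridge:{gap_from}-{gap_to}")

    final_path = draft_path.removesuffix(".draft")   # *.yaml.draft -> *.yaml
    with open(final_path, "w") as f:
        yaml.safe_dump(q, f, sort_keys=False)
    os.remove(draft_path)
    return final_path

# e.g. promote("interviews/vault/questions/edge/optimization/edge-2536.yaml.draft",
#              "edge-1960", "edge-1957")
```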
**Validation post-promotion:**
- `vault check --strict` → 10,705 loaded (was 10,701; +4 ✓), 0 invariant
failures.
- `vault build --legacy-json` → released set unchanged: 9438 published,
chainCount=879, releaseHash=04ee8a23… (drafts have status=draft, so
the publishing filter excludes them — by design).
**Phase 3.e — chain rebuild (deferred):**
Skipped tonight. The new questions are `status: draft` and the
chain-builder filters on published, so a rebuild wouldn't pick them
up. The right sequence is: human reviews the 4 drafts → flips status
to `published` (and `human_reviewed.status` to `verified`) → then
re-runs `build_chains_with_gemini.py --all`. At that point chainCount
is expected to grow modestly (the 4 new questions were authored *to*
fit chains, so they should land in their bridge slots).
**Files changed in the Phase 3 pilot commit:**
- `interviews/vault/questions/edge/cross-cutting/edge-2537.yaml` (new)
- `interviews/vault/questions/edge/optimization/edge-2536.yaml` (new)
- `interviews/vault/questions/mobile/deployment/mobile-2147.yaml` (new)
- `interviews/vault/questions/mobile/power/mobile-2146.yaml` (new)
- `interviews/vault/questions/edge/latency/edge-2535.yaml.draft` (new — failed validation, awaiting reviewer disposition)
- `interviews/vault/draft-validation-scorecard.json` (new — per-row record)
- `interviews/vault/id-registry.yaml` (4 appended entries)
- `interviews/vault-cli/docs/CHAIN_ROADMAP.md` (this entry)
**Notes for next session — review checklist:**
1. Read each of the 4 promoted drafts. Spot-checks suggest they're
competent, but cognitive-load calibration is where Gemini drift is
most likely. Each scorecard row carries the `level_fit` rationale from
the LLM judge — treat those as first-cut signals, not authoritative verdicts.
2. For the failed `edge-2535`: read it next to its high-cosine
neighbour (`edge-1883`). If it's as duplicative as the originality
gate suggests, delete it; if it's actually distinct enough, edit it
and re-validate (re-run `validate_drafts.py` after editing).
3. Once you're happy with N drafts, flip their `status: draft → published`
and `human_reviewed.status → verified`, set `human_reviewed.by`, then:
```bash
vault check --strict
vault build --legacy-json # released question count goes up by N
python3 interviews/vault-cli/scripts/build_chains_with_gemini.py --all \
--output interviews/vault/chains.proposed.json
python3 interviews/vault-cli/scripts/apply_proposed_chains.py
```
4. If the pilot's 80% rate holds at scale, a 30-gap batch would land
~24 promotable drafts and absorb ~12-15 of them into chains
(chain rebuild typically picks up ~50% of new questions per the
roadmap).
**Cost note:** This pilot used 5 generation calls + 5 × 3 judge calls = 20 Gemini
calls. A 30-gap batch would be ~120 calls (still under the 250/day cap but
worth budgeting around).
**Next step:** Phase 3.e — chain rebuild. Gated on human review of the
4 drafts now in the tree.
---
<!-- Append new entries above this comment; reverse chronological order is fine,
but keep entries dated and self-contained for resume context. -->

View File

@@ -0,0 +1,151 @@
{
"generated_at": "2026-05-01T17:34:46+00:00",
"originality_threshold": 0.92,
"drafts_evaluated": 5,
"passes": 4,
"fails": 1,
"errors": 0,
"rows": [
{
"path": "interviews/vault/questions/edge/cross-cutting/edge-2537.yaml.draft",
"draft_id": "edge-2537",
"track": "edge",
"topic": "tco-cost-modeling",
"level": "L3",
"schema_ok": true,
"originality": "pass",
"originality_detail": {
"top_neighbour": "edge-1169",
"cosine": 0.8187,
"threshold": 0.92,
"bucket_size": 34
},
"level_fit": "pass",
"level_fit_detail": {
"rationale": "The candidate question requires straightforward application of given values to calculate data transmission costs and savings over time, matching the quantitative application (L3) cognitive demand seen in the exemplars."
},
"coherence": "pass",
"coherence_detail": {
"rationale": "The calculations accurately compute monthly data usage based on a 30-day month and 1,000,000 KB per GB, resulting in $7,500 for Option A, $150 for Option B, and exactly $88,200 in annual savings."
},
"bridge": "pass",
"bridge_detail": {
"rationale": "The candidate logically bridges the progression by calculating the quantitative difference between streaming raw data versus local processing (L3), connecting the introductory concept of cellular data costs (L1) to the advanced diagnostic scenario of fixing an architectural data overage (L4)."
},
"verdict": "pass"
},
{
"path": "interviews/vault/questions/edge/latency/edge-2535.yaml.draft",
"draft_id": "edge-2535",
"track": "edge",
"topic": "latency-decomposition",
"level": "L3",
"schema_ok": true,
"originality": "fail",
"originality_detail": {
"top_neighbour": "edge-1883",
"cosine": 0.9328,
"threshold": 0.92,
"bucket_size": 34
},
"originality_reason": "too similar to edge-1883 (cosine=0.933 >= 0.92)",
"level_fit": "pass",
"level_fit_detail": {
"rationale": "The candidate requires applying computational formulas to calculate theoretical latency and using that result to deduce system bottlenecks, aligning perfectly with the L3 application of latency decomposition principles seen in the exemplars."
},
"coherence": "pass",
"coherence_detail": {
"rationale": "The solution correctly identifies that the theoretical latency is a fraction of a millisecond (0.15 ms) and logically attributes the 60ms measured latency to host-side bottlenecks, fully addressing the prompt."
},
"bridge": "pass",
"bridge_detail": {
"rationale": "The candidate perfectly bridges the L2 identification of latency stages and the L4 optimization of host bottlenecks by having the learner calculate the theoretical TPU compute time to prove that host-side overhead dominates."
},
"verdict": "fail"
},
{
"path": "interviews/vault/questions/edge/optimization/edge-2536.yaml.draft",
"draft_id": "edge-2536",
"track": "edge",
"topic": "pruning-sparsity",
"level": "L4",
"schema_ok": true,
"originality": "pass",
"originality_detail": {
"top_neighbour": "edge-1957",
"cosine": 0.9046,
"threshold": 0.92,
"bucket_size": 34
},
"level_fit": "pass",
"level_fit_detail": {
"rationale": "The candidate perfectly mirrors the analytical depth of exemplar edge-0093 by requiring the candidate to analyze the mismatch between unstructured sparsity and dense matrix multiplication hardware (systolic arrays)."
},
"coherence": "pass",
"coherence_detail": {
"rationale": "The scenario, question, and solution are perfectly aligned, accurately addressing the hardware limitation of dense systolic arrays when encountering unstructured sparsity."
},
"bridge": "pass",
"bridge_detail": {
"rationale": "The candidate smoothly bridges the L3 identification of structured pruning and the L5 strategic application by providing an L4 diagnostic analysis of why unstructured pruning fails on the Coral TPU's systolic architecture."
},
"verdict": "pass"
},
{
"path": "interviews/vault/questions/mobile/deployment/mobile-2147.yaml.draft",
"draft_id": "mobile-2147",
"track": "mobile",
"topic": "model-format-conversion",
"level": "L2",
"schema_ok": true,
"originality": "pass",
"originality_detail": {
"top_neighbour": "mobile-1022",
"cosine": 0.8858,
"threshold": 0.92,
"bucket_size": 34
},
"level_fit": "pass",
"level_fit_detail": {
"rationale": "The candidate question requires basic understanding of precision sizes (FP32 vs FP16) and a straightforward calculation to determine storage footprint, perfectly aligning with the L2 comprehension and calculation level demonstrated in the exemplars."
},
"coherence": "pass",
"coherence_detail": {
"rationale": "The calculation accurately determines the storage footprint based on parameter count and data type sizes, reducing 60 MB to 30 MB, perfectly addressing the scenario and question."
},
"bridge": "pass",
"bridge_detail": {
"rationale": "The candidate introduces the mathematical calculation for FP16 parameter sizing in a PyTorch-to-CoreML context at L2, perfectly bridging the L1 format compatibility recall and the L3 pipeline execution that requires an unprompted sizing calculation."
},
"verdict": "pass"
},
{
"path": "interviews/vault/questions/mobile/power/mobile-2146.yaml.draft",
"draft_id": "mobile-2146",
"track": "mobile",
"topic": "duty-cycling",
"level": "L3",
"schema_ok": true,
"originality": "pass",
"originality_detail": {
"top_neighbour": "mobile-0341",
"cosine": 0.8463,
"threshold": 0.92,
"bucket_size": 34
},
"level_fit": "pass",
"level_fit_detail": {
"rationale": "The candidate aligns perfectly with the L3 exemplars by requiring the direct application of power-time-energy formulas to calculate total energy consumption across distinct operational phases."
},
"coherence": "pass",
"coherence_detail": {
"rationale": "The scenario clearly defines the power draw and duration for each phase, which the solution accurately uses to calculate the total energy per cycle and scales correctly for a 1-hour session."
},
"bridge": "pass",
"bridge_detail": {
"rationale": "The candidate seamlessly bridges the L2 simple duty-cycle calculation and the L4 thermal analysis by adding the L3 complexity of transient wake-up overhead within the established dashcam scenario."
},
"verdict": "pass"
}
]
}

View File

@@ -15221,3 +15221,7 @@ entries:
- {id: tinyml-1822, created_at: 2026-04-25T22:29:20+00:00, created_by: registry-rebuild-2026-04-25}
- {id: tinyml-1823, created_at: 2026-04-25T22:29:20+00:00, created_by: registry-rebuild-2026-04-25}
- {id: tinyml-1824, created_at: 2026-04-25T22:29:20+00:00, created_by: registry-rebuild-2026-04-25}
- {id: edge-2537, created_at: 2026-05-01T17:35:39+00:00, created_by: generate_question_for_gap.py}
- {id: edge-2536, created_at: 2026-05-01T17:35:39+00:00, created_by: generate_question_for_gap.py}
- {id: mobile-2147, created_at: 2026-05-01T17:35:39+00:00, created_by: generate_question_for_gap.py}
- {id: mobile-2146, created_at: 2026-05-01T17:35:39+00:00, created_by: generate_question_for_gap.py}

View File

@@ -0,0 +1,72 @@
schema_version: '1.0'
id: edge-2537
track: edge
level: L3
zone: fluency
topic: tco-cost-modeling
competency_area: cross-cutting
bloom_level: apply
phase: inference
title: 'Edge TCO Fluency: Monthly Cellular Data Cost Calculation'
scenario: A fleet of 5,000 edge traffic monitors uses a cellular plan costing $5/GB. Each unit logs 100
events per day. Option A transmits a 100KB image per event to the cloud for processing. Option B runs
the model locally and transmits a 2KB JSON metadata payload.
question: What is the total monthly cellular data cost for the fleet under both options, and what are
the annual operational savings of choosing Option B?
details:
realistic_solution: Option A costs $7,500 per month, while Option B costs $150 per month. By choosing
local processing and sending only metadata (Option B), the fleet saves $88,200 annually in connectivity
operations costs.
common_mistake: '**The Pitfall:** Overlooking the scale multiplier when calculating edge OpEx.
**The Rationale:** Candidates might correctly calculate the data cost for a single device or a single
day but fail to multiply by the 5,000-unit fleet size and 30-day month for the monthly total, or the
12 months for the annual savings.
**The Consequence:** Proposing architectures that appear cheap on a per-unit basis but become prohibitively
expensive at scale, blowing out the OpEx budget.'
napkin_math: '**Assumptions & Constraints:**
Assume standard base-10 networking prefixes (1 GB = 1,000,000 KB). Assume 1 month = 30 days.
**Calculations:**
Events per month per unit: 100 events/day × 30 days = 3,000 events/month.
Total fleet events per month: 3,000 × 5,000 = 15,000,000 events.
Option A data: 15,000,000 × 100KB = 1,500,000,000 KB = 1,500 GB.
Option A cost: 1,500 GB × $5/GB = $7,500/month.
Option B data: 15,000,000 × 2KB = 30,000,000 KB = 30 GB.
Option B cost: 30 GB × $5/GB = $150/month.
Monthly savings: $7,500 - $150 = $7,350.
Annual savings: $7,350 × 12 = $88,200.
**Conclusion:**
Option B provides an $88,200 annual savings, illustrating how edge compute dramatically reduces connectivity
OpEx at scale.'
status: draft
provenance: llm-draft
requires_explanation: false
expected_time_minutes: 10
tags:
- tco
- cellular
- opex
- bandwidth
- gap-bridge:edge-0731-edge-1154
authors:
- gemini-3.1-pro-preview
human_reviewed:
status: not-reviewed
by: null
date: null
created_at: '2026-05-01T17:27:13+00:00'

View File

@@ -0,0 +1,61 @@
schema_version: '1.0'
id: edge-2535
track: edge
level: L3
zone: diagnosis
topic: latency-decomposition
competency_area: latency
bloom_level: apply
phase: inference
title: Theoretical vs. Measured Latency on the Coral USB TPU
scenario: A team deploys an INT8 quantized MobileNetV2-SSD model requiring 0.6 GOPS to a Coral Edge TPU
USB Accelerator, which is rated at 4 TOPS. The team measures an end-to-end frame processing latency
of 60ms and concludes the TPU compute is too slow for their workload.
question: Calculate the theoretical minimum inference latency for this model on the Coral TPU. Based on
this calculation, what is the most likely cause of the 60ms measured latency?
details:
realistic_solution: The theoretical compute latency is a fraction of a millisecond. The massive discrepancy
between the sub-millisecond compute time and the 60ms measured latency indicates that the bottleneck
is not the TPU hardware, but rather host-side operations such as image resizing, color space conversion,
USB transfer overhead, or CPU-bound post-processing.
common_mistake: '**The Pitfall:** Assuming the TPU compute is the bottleneck because the overall system
is slow, or miscalculating the unit conversion between GOPS and TOPS.
**The Rationale:** Engineers often conflate raw ''inference time'' with ''end-to-end latency'', forgetting
that a USB accelerator requires significant host CPU coordination, data marshaling, and PCI/USB bus
transfer times.
**The Consequence:** The team might waste weeks pruning or re-training the neural network architecture
when the actual fix involves optimizing host CPU preprocessing pipelines or leveraging hardware video
decoders.'
napkin_math: '**Assumptions & Constraints:** The model requires 0.6 Giga Operations (GOPS). The Coral
TPU provides 4 Tera Operations Per Second (TOPS), which is 4,000 GOPS.
**Calculations:** Theoretical latency = Model Operations / Hardware Throughput = 0.6 GOPS / 4000 GOPS
= 0.00015 seconds, or 0.15 milliseconds. Even factoring in a highly conservative 20% hardware utilization
rate due to memory bandwidth constraints, compute time is ~0.75ms.
**Conclusion:** The TPU computation accounts for roughly 1% of the 60ms end-to-end latency. The remaining
~59ms is consumed by host-to-device I/O and CPU-bound pre/post-processing tasks.'
status: draft
provenance: llm-draft
requires_explanation: false
expected_time_minutes: 10
tags:
- coral-tpu
- bottlenecks
- latency-decomposition
- napkin-math
_authoring:
origin: gemini-3.1-pro-preview
tool: generate_question_for_gap.py
generated_at: '2026-05-01T17:25:51+00:00'
gap:
between:
- edge-1883
- edge-1701
missing_level: L3
rationale: Calculating expected inference latency versus actual measured pipeline latency on the Coral
TPU.

View File

@@ -0,0 +1,58 @@
schema_version: '1.0'
id: edge-2536
track: edge
level: L4
zone: diagnosis
topic: pruning-sparsity
competency_area: optimization
bloom_level: analyze
phase: inference
title: Diagnosing Zero Latency Gains from Unstructured Pruning on Coral TPU
scenario: A team deployed a MobileNetV2 model onto a Google Coral Edge TPU. To reduce the baseline 80ms
inference latency, they applied magnitude-based unstructured pruning, achieving 75% sparsity while maintaining
acceptable accuracy. After compiling with the Edge TPU Compiler and deploying the quantized INT8 model,
the inference latency remains stubbornly stuck at 80ms.
question: Why did the 75% unstructured sparsity fail to yield any latency improvements on the Coral Edge
TPU, and what architectural characteristic of the accelerator dictates this outcome?
details:
realistic_solution: The Coral Edge TPU relies on a dense systolic array architecture optimized for bulk,
contiguous INT8 matrix multiplications. Unstructured pruning zeroes out scattered weights but does
not change the physical dimensions of the tensors. Because the TPU lacks specialized hardware to skip
unstructured zeros, it executes the full dense MAC grid—simply multiplying by zero—which consumes
the exact same number of clock cycles.
common_mistake: '**The Pitfall:** Assuming that a reduction in theoretical FLOPs automatically translates
to lower inference latency on any hardware.
**The Rationale:** Developers often rely on software-level metrics like model size or parameter count,
neglecting how the specific hardware accelerator schedules and executes matrix operations.
**The Consequence:** The engineering team wastes valuable time retraining and fine-tuning an unstructured
sparse model that provides zero runtime benefit on the deployment edge device.'
napkin_math: '**Assumptions & Constraints:** A single convolutional layer takes 10ms to execute dense.
It is pruned to 75% unstructured sparsity. The Coral TPU executes MACs in fixed dense blocks.
**Calculations:** The compiler cannot shrink the tensor dimensions because the zeros are randomly
distributed. The TPU executes 100% of the MACs, where 75% of them happen to be operations with zero.
Execution time = 10ms * 1.0 (dense execution schedule) = 10ms.
**Conclusion:** Without structured pruning to physically reduce the number of channels or filters,
the tensor shapes remain identical, and the systolic array offers exactly 0% speedup.'
status: draft
provenance: llm-draft
requires_explanation: false
expected_time_minutes: 10
tags:
- coral-tpu
- unstructured-pruning
- latency-bottleneck
- systolic-array
- gap-bridge:edge-1960-edge-1957
authors:
- gemini-3.1-pro-preview
human_reviewed:
status: not-reviewed
by: null
date: null
created_at: '2026-05-01T17:26:16+00:00'

View File

@@ -0,0 +1,54 @@
schema_version: '1.0'
id: mobile-2147
track: mobile
level: L2
zone: implement
topic: model-format-conversion
competency_area: deployment
bloom_level: understand
phase: inference
title: 'Model Format Conversion: Sizing the FP16 CoreML Payload'
scenario: Your team is preparing to convert a 15 million parameter computer vision model from PyTorch
to CoreML for an iOS app. The original model was trained and saved in standard FP32 precision. To comply
with strict App Store bundle limits, the conversion pipeline is configured to cast weights to FP16.
question: How does the FP16 conversion mathematically impact the model's storage footprint, and what is
the expected payload size of the resulting CoreML model?
details:
realistic_solution: Converting the weights from FP32 (4 bytes per parameter) to FP16 (2 bytes per parameter)
halves the required storage. For a 15 million parameter model, this reduces the disk footprint from
roughly 60 MB to an expected CoreML payload of 30 MB.
common_mistake: '**The Pitfall:** Assuming the parameter count directly translates to megabytes (e.g.,
15 million parameters = 15 MB) or forgetting to account for the byte-size of the data type.
**The Rationale:** Engineers without a systems background often conflate the logical number of parameters
with their physical byte representation on disk.
**The Consequence:** This error in storage calculation leads to severely underestimating the final
app bundle size and potential rejection from the App Store.'
napkin_math: '**Assumptions & Constraints:** The model has 15,000,000 parameters. FP32 requires 4 bytes
per parameter. FP16 requires 2 bytes per parameter. CoreML metadata overhead is negligible.
**Calculations:** Original FP32 size = 15,000,000 * 4 bytes = 60,000,000 bytes (60 MB). Converted
FP16 size = 15,000,000 * 2 bytes = 30,000,000 bytes (30 MB).
**Conclusion:** The conversion process yields a 50% reduction in disk footprint, resulting in a 30
MB file.'
status: draft
provenance: llm-draft
requires_explanation: false
expected_time_minutes: 5
tags:
- coreml
- pytorch
- precision
- ios
- gap-bridge:mobile-0984-mobile-1022
authors:
- gemini-3.1-pro-preview
human_reviewed:
status: not-reviewed
by: null
date: null
created_at: '2026-05-01T17:28:18+00:00'

View File

@@ -0,0 +1,67 @@
schema_version: '1.0'
id: mobile-2146
track: mobile
level: L3
zone: realization
topic: duty-cycling
competency_area: power
bloom_level: apply
phase: inference
title: 'The Hidden Cost of Waking Up: Dashcam Duty Cycling'
scenario: 'You are optimizing a smartphone dashcam app that duty-cycles the NPU. Every 10 seconds, the
device executes a cycle: it wakes up the NPU (taking 0.5 seconds at a peak 4W power draw), runs inference
for 2 seconds at 3W, and then idles the SoC for the remaining 7.5 seconds at 0.5W.'
question: Calculate the total energy consumed by the dashcam feature over a 1-hour driving session, explicitly
factoring in the transient wake-up overhead.
details:
realistic_solution: 'First, calculate the energy per 10-second cycle by summing the energy of each phase:
Wake-up (0.5s * 4W = 2J), Active (2s * 3W = 6J), and Idle (7.5s * 0.5W = 3.75J), totaling 11.75 Joules.
Then, multiply this by the 360 cycles in a 1-hour period (3600 seconds) to determine the total consumption
of 4,230 Joules.'
common_mistake: '**The Pitfall:** Ignoring the wake-up transition time and power overhead when calculating
the duty cycle energy.
**The Rationale:** Developers often assume duty cycling only involves binary active and idle states,
overlooking the transient hardware costs like powering up the NPU and loading initial weights into
SRAM.
**The Consequence:** The theoretical energy budget is severely underestimated (in this case by ~17%),
leading to unexpected battery drain and missed power targets in production.'
napkin_math: '**Assumptions & Constraints:** 1 hour = 3600 seconds. A 10-second cycle occurs 360 times
per hour. Energy (Joules) = Power (Watts) * Time (Seconds).
**Calculations:**
- Energy_wakeup = 4W * 0.5s = 2J
- Energy_active = 3W * 2s = 6J
- Energy_idle = 0.5W * 7.5s = 3.75J
- Energy_cycle = 2J + 6J + 3.75J = 11.75J
- Total_Energy = 11.75 J/cycle * 360 cycles = 4230 J
**Conclusion:** The total energy consumed over 1 hour is 4,230 Joules. Notably, the wake-up overhead
accounts for a significant portion of the total energy despite occupying only 5% of the physical time,
demonstrating the limit of rapid duty-cycling.'
status: draft
provenance: llm-draft
requires_explanation: false
expected_time_minutes: 10
tags:
- duty-cycling
- power-optimization
- mobile
- npu
- energy-profiling
- gap-bridge:mobile-0367-mobile-2034
authors:
- gemini-3.1-pro-preview
human_reviewed:
status: not-reviewed
by: null
date: null
created_at: '2026-05-01T17:27:47+00:00'