Files
cs249r_book/interviews/vault/chains.json
Vijay Janapa Reddi 924363e2b7 feat(vault): Phase 3 batch — 6 questions published + chain rebuild
Second Phase 3 batch run, after wiring in the gap pre-filter and tightening the validator.
30 gaps fed in; 21 dropped by the gap pre-filter as hallucinated; 9
generated drafts; 6 cleared all gates and were published; 3 dropped on
level_fit (level inflation pattern).

Published (status=published, human_reviewed=verified by vj, all gates
pass + audit_math pass):
  edge-2540    L4  edge/real-time-deadlines
  mobile-2151  L4  mobile/kv-cache-management
  mobile-2152  L2  mobile/kv-cache-management
  mobile-2154  L4  mobile/model-serving-infrastructure
  mobile-2157  L4  mobile/roofline-analysis
  mobile-2161  L5  mobile/power-budgeting

Rejected (level_fit failures — Gemini stamped L3-L5 on questions whose
cognitive demand is L1/L2; same failure mode the audit caught on the
first pilot):
  edge-2537    edge/real-time-deadlines       (level inflation)
  edge-2543    edge/transformer-systems-cost  (level inflation + mixed
                                               base-2/base-10 conversions)
  mobile-2156  mobile/quantization-fundamentals (level inflation)

Targeted chain rebuilds on the 5 affected buckets (5 parallel
build_chains_with_gemini.py --bucket calls):
  edge/real-time-deadlines                7 chains → 9
  mobile/kv-cache-management              4 chains → 6
  mobile/model-serving-infrastructure     5 chains → 4
  mobile/roofline-analysis                3 chains → 4
  mobile/power-budgeting                  2 chains → 6
                                       21 dropped → 29 added
  net chain count: 835 → 843 (+8)
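The rebuild bookkeeping above can be sanity-checked in a few lines; all counts are copied from this log entry, nothing here reads the vault:

```python
# Sanity-check the chain-count bookkeeping from the 5 bucket rebuilds.
# (before, after) chain counts per bucket, copied from this log.
rebuilds = {
    "edge/real-time-deadlines":            (7, 9),
    "mobile/kv-cache-management":          (4, 6),
    "mobile/model-serving-infrastructure": (5, 4),
    "mobile/roofline-analysis":            (3, 4),
    "mobile/power-budgeting":              (2, 6),
}

dropped = sum(before for before, _ in rebuilds.values())
added = sum(after for _, after in rebuilds.values())

assert dropped == 21 and added == 29
assert 835 - dropped + added == 843  # net +8
```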

5 of 6 published questions land in clean primary chains:
  edge-2540    in [edge-0114(L1) → … → edge-2540(L4) → … → edge-0621(L6+)]
  mobile-2152  in [mobile-2152(L2) → mobile-1097(L3) → mobile-1185(L4)]
  mobile-2154  in [mobile-0244(L1) → mobile-0305(L2) → mobile-2154(L4) → mobile-0654(L6+)]
  mobile-2157  in [mobile-0364(L2) → mobile-0537(L3) → mobile-2157(L4) → mobile-0617(L5)]
  mobile-2161  in [mobile-0151(L2) → mobile-0103(L3) → mobile-0581(L4) → mobile-2161(L5) → mobile-1587(L6+)]
mobile-2151 didn't enter a chain — Gemini chose other L4 candidates for
that bucket; mobile-2152 covers the bridge work.
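The chains listed above all ascend strictly in level (L1 < L2 < … < L5 < L6+). A minimal sketch of that invariant check; the level ordering is taken from the chains in this file, and the helper name is illustrative, not part of the vault tooling:

```python
# Hedged sketch: verify a chain's questions ascend strictly in level,
# the shape the primary chains in this batch exhibit. LEVEL_ORDER is
# inferred from this file; is_strictly_ascending is a hypothetical helper.
LEVEL_ORDER = {lvl: i for i, lvl in enumerate(["L1", "L2", "L3", "L4", "L5", "L6+"])}

def is_strictly_ascending(levels):
    ranks = [LEVEL_ORDER[lvl] for lvl in levels]
    return all(a < b for a, b in zip(ranks, ranks[1:]))

# e.g. the mobile-2157 chain: L2 -> L3 -> L4 -> L5
assert is_strictly_ascending(["L2", "L3", "L4", "L5"])
assert not is_strictly_ascending(["L4", "L2"])
```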

Drive-by: 24 chain_ids renumbered to bucket-tagged form
(<track>-chain-bucket-<topic-slug>-<NN>) to resolve collisions.
build_chains_with_gemini.py derives chain_ids from call_idx, which
restarts at 1 on every --bucket invocation, so IDs collide both with
the original full-corpus run and across parallel bucket runs.
Filed as a follow-up to fix the generator (use a content-stable or
bucket-tagged ID scheme).
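A rough sketch of the two ID schemes named in the follow-up; neither function exists in build_chains_with_gemini.py, both are illustrations of how the call_idx collision could be avoided:

```python
# Hedged sketch of the two proposed chain_id schemes. Function names
# are hypothetical, not part of the current generator.
import hashlib

def bucket_tagged_id(track, topic_slug, seq):
    # Matches the renumbered form used in this batch:
    # <track>-chain-bucket-<topic-slug>-<NN>
    return f"{track}-chain-bucket-{topic_slug}-{seq:02d}"

def content_stable_id(track, question_ids):
    # Stable across re-runs: derived from member question IDs, so
    # parallel --bucket invocations cannot collide.
    digest = hashlib.sha1("|".join(question_ids).encode()).hexdigest()[:8]
    return f"{track}-chain-{digest}"

assert bucket_tagged_id("mobile", "power-budgeting", 3) == \
    "mobile-chain-bucket-power-budgeting-03"
```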

Verification trail (80 Gemini calls total this batch):
  pre-filter:    30 calls, 21 hallucinated, 9 real (70% hallucination
                 — matches audit-2 measurement exactly)
  generation:    9 calls, 9/9 schema-valid
  audit_math:    9 calls, 9/9 pass (independent re-derivation of all
                 napkin_math arithmetic; 4-way parallel via the new
                 ThreadPoolExecutor in audit_math.py)
  validate_drafts: 27 calls (3 LLM judges × 9 drafts), 6/9 pass
  bucket rebuild: 5 calls, 5 strict-mode chain sets
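The 4-way audit_math fan-out can be sketched as follows; audit_one is a stand-in for the real Gemini re-derivation call, not the actual audit_math.py API:

```python
# Hedged sketch of the 4-way parallel audit fan-out added to
# audit_math.py. audit_one is a placeholder for the real Gemini call
# that independently re-derives a draft's napkin_math arithmetic.
from concurrent.futures import ThreadPoolExecutor

def audit_one(draft_id):
    # Placeholder: returns (draft_id, passed).
    return draft_id, True

drafts = [f"draft-{i}" for i in range(9)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(audit_one, drafts))

assert len(results) == 9 and all(results.values())  # 9/9 pass in this batch
```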

Validation:
  apply_proposed_chains.py --dry-run: clean (843 chains)
  vault check --strict: 10,709 loaded, 0 invariant failures
  vault build --local-json: published_count=9446, chainCount=843,
    releaseHash=5a4783e62d2ca8d…
2026-05-02 10:54:17 -04:00

30,940 lines · 872 KiB · JSON

[
{
"chain_id": "cloud-chain-auto-001-01",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2519",
"title": "Cloud Train Serve Split L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2520",
"title": "Cloud Train Serve Split L3 0",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2521",
"title": "Cloud Train Serve Split L5 0",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner through edge vs. cloud inference, starting from the basic architecture concept to calculating edge memory footprints, and culminating in a strategic deployment decision.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-02",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L4"
],
"questions": [
{
"level": "L1",
"id": "cloud-0065",
"title": "The Cold Start Penalty",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0009",
"title": "The Blue/Green Memory Budget",
"bloom": "understand"
},
{
"level": "L4",
"id": "cloud-1020",
"title": "In-Place Rolling Deployment VRAM Saturation",
"bloom": "analyze"
}
],
"rationale": "Explores memory implications during deployments, moving from recognizing basic cold-start disk reads to calculating blue/green memory budgets, and diagnosing VRAM saturation during rolling updates.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-03",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-0066",
"title": "The Canary Memory Footprint",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0942",
"title": "Capacity Planning for Canary Deployments",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0944",
"title": "Canary Traffic Batching Timeouts",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0945",
"title": "Canary Traffic Sizing and Resource Allocation",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating the memory footprint of a canary model to sizing its instance capacity, diagnosing latency spikes during the rollout, and making strategic resource allocation decisions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-04",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1870",
"title": "Sizing Cloud Shadow Deployments",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1871",
"title": "Diagnosing Latency Spikes in Synchronous Shadow Deployments",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1872",
"title": "Architecting Shadow Deployments for Latency-Sensitive APIs",
"bloom": "evaluate"
}
],
"rationale": "Explores the lifecycle of a shadow deployment, from initial capacity sizing to diagnosing synchronous latency spikes, and finally designing a resilient, asynchronous shadowing architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-05",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-0047",
"title": "The Static Batching Latency Tax",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0128",
"title": "The Translation API's Latency Crisis",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0143",
"title": "The SLO vs. Throughput Squeeze",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-0189",
"title": "The Copilot Latency Paradox",
"bloom": "create"
}
],
"rationale": "Builds intuition for static batching latency by decomposing worst-case TTFT, analyzing SLA misses, sizing batches to meet targets, and evaluating structural flaws in static monolithic batches.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-06",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0001",
"title": "The Continuous Batching Target",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0019",
"title": "Continuous Batching and TPOT",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0131",
"title": "The Chatbot Latency Crisis (cloud-0131)",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0212",
"title": "The Continuous Batching Scheduler",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0224",
"title": "The Continuous Batching Starvation",
"bloom": "evaluate"
}
],
"rationale": "Traces the evolution from understanding what continuous batching solves to calculating its latency impacts, contrasting it with static batching, and finally diagnosing request starvation issues.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-07",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2564",
"title": "Cloud Gpu Virtualization L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2565",
"title": "Cloud Gpu Virtualization L3 0",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4369",
"title": "MI300X MIG vs H100 MIG: Multi-Tenant Serving Partitioning",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2566",
"title": "Cloud Gpu Virtualization L5 0",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner from the basic concept of GPU virtualization to practical MIG partitioning, expanding into cross-hardware partitioning challenges (MI300X vs H100), and concluding with a systemic evaluation of sharing mechanisms.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-08",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2570",
"title": "Cloud Red Teaming L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2571",
"title": "Cloud Red Teaming L3 0",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2572",
"title": "Cloud Red Teaming L5 0",
"bloom": "evaluate"
}
],
"rationale": "Explores red teaming for ML models, moving from conceptual differentiation from standard evaluation to calculating execution costs and comparing human versus automated strategic approaches.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-09",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L3"
],
"questions": [
{
"level": "L2",
"id": "cloud-2653",
"title": "Why Glue Code Dominates ML Systems",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2654",
"title": "Quantifying Glue Code Maintenance Cost",
"bloom": "apply"
}
],
"rationale": "Highlights the concept of ML technical debt by identifying the dominance of glue code and transitioning into quantifying the concrete maintenance cost it incurs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-10",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2668",
"title": "Model Caching in Multi-Model Serving",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2669",
"title": "Model Cache Hit Rate and Latency Impact",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2670",
"title": "Model Caching Strategy Under Cost Constraints",
"bloom": "evaluate"
}
],
"rationale": "Addresses multi-model serving challenges, from understanding the need for model caching to calculating cache hit rates, and finally designing a cost-effective tiered storage architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-11",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2662",
"title": "How Request Pipelining Hides Latency",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2663",
"title": "Pipeline Throughput with Unbalanced Stages",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2664",
"title": "Pipelining vs Batching Trade-off",
"bloom": "evaluate"
}
],
"rationale": "Analyzes inference request pipelining, starting with basic throughput comparisons, evaluating unbalanced stages, and making optimization trade-offs against dynamic batching.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-12",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2665",
"title": "Why Tail Latency Matters at Scale",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2666",
"title": "Hedged Request Latency Improvement",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2667",
"title": "Tail Latency Mitigation Strategy Selection",
"bloom": "evaluate"
}
],
"rationale": "Examines tail latency in ML systems, from calculating fan-out impacts to mathematically estimating hedged request improvements, and evaluating broader mitigation strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-13",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2753",
"title": "Alert Fatigue in ML Systems",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2754",
"title": "Alert Consolidation Math",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2752",
"title": "Monitoring Architecture for Multi-Model Fleet",
"bloom": "evaluate"
}
],
"rationale": "Addresses operational ML monitoring, starting with the systemic risk of alert fatigue, calculating the impact of alert consolidation, and designing scalable monitoring architectures for massive fleets.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-14",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1777",
"title": "Absorbing Traffic Spikes with CPU Reactive Scaling",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-1775",
"title": "CPU Reactive Burst Scaling vs GPU Pre-provisioning",
"bloom": "evaluate"
}
],
"rationale": "Moves from calculating the raw cost and capacity of CPU spillover instances to a broader architectural decision between reactive CPU scaling and proactive GPU pre-provisioning during traffic spikes.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-15",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1114",
"title": "Declarative Autoscaling Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1117",
"title": "Silent Saturation in Declarative Autoscaling",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1115",
"title": "Evaluating Declarative API Sidecar Overheads",
"bloom": "evaluate"
}
],
"rationale": "Explores the mechanics of declarative autoscaling, moving from replica calculation to diagnosing metric saturation failures, and concluding with a system-level evaluation of sidecar overheads.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-16",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1963",
"title": "KV-Cache Affinity in Canary Rollouts",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1965",
"title": "Diagnosing Context Amnesia in Canary Rollouts",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1964",
"title": "Evaluating L7 Sticky Routing for LLM Canary Rollouts",
"bloom": "evaluate"
}
],
"rationale": "Examines the impact of cluster topology changes on KV-cache affinity during rollouts, from calculating cache misses to diagnosing state amnesia and designing L7 sticky routing solutions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-17",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1827",
"title": "Confidential VM Cold Start Overheads",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1833",
"title": "Diagnosing High TTFT in Confidential GPU Enclaves",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1831",
"title": "Evaluating Confidential Computing for LLM Deployments",
"bloom": "evaluate"
}
],
"rationale": "Evaluates the latency overhead of confidential computing, progressing from modeling cold start decryption to diagnosing runtime TTFT spikes and making an architectural deployment decision.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-18",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0045",
"title": "The Iceberg of Inference Cost",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-1861",
"title": "Calculating the Training vs. Inference Cost Crossover",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1862",
"title": "Analyzing the Inference OpEx Explosion",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1860",
"title": "Evaluating Custom Serving Engine ROI",
"bloom": "evaluate"
}
],
"rationale": "Traces the economics of ML deployments, from identifying operational expenses as the dominant cost to calculating the training-inference crossover, diagnosing OpEx explosions, and evaluating the ROI of custom infrastructure.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-19",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-0205",
"title": "The Token Budget Economics",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4363",
"title": "Serving a 70B Model: MI300X Single-Card vs H100 Tensor Parallel",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4378",
"title": "Vendor Lock-In Analysis: TPU v5e vs MI300X vs H100 for a 5-Year Infrastructure Plan",
"bloom": "evaluate"
}
],
"rationale": "Analyzes hardware accelerator choices for LLMs, moving from token economics to specific Tensor Parallelism versus single-card performance comparisons, culminating in a 5-year strategic vendor analysis.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-001-20",
"track": "cloud",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-2282",
"title": "The Activation Sparsity Mirage in MoE",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4377",
"title": "MI300X for Mixture-of-Experts: Expert Capacity and Memory Layout",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-4571",
"title": "MoE Autoscaling Cluster",
"bloom": "create"
}
],
"rationale": "Examines the system challenges of Mixture-of-Experts models, moving from diagnosing latency issues with offloaded experts to optimizing hardware layout and designing auto-scaling clusters.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-002-01",
"track": "cloud",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2180",
"title": "The Ring AllReduce Bandwidth Cost",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0864",
"title": "Ring-AllReduce Transfer Time Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0728",
"title": "The AllReduce Tax",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0740",
"title": "The Gradient Synchronization Overlap",
"bloom": "evaluate"
}
],
"rationale": "Progresses from the basic bandwidth math of Ring AllReduce, to calculating its transfer time, analyzing its tax on a training step, and finally evaluating how to hide it via computation-communication overlap.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-002-02",
"track": "cloud",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0088",
"title": "The On-Node vs. Cross-Node Latency Jump",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0827",
"title": "The Cross-Node AllReduce Cost",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0486",
"title": "The Two-Node Scaling Cliff: Collective Communication",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2186",
"title": "The Hierarchical AllReduce Asymmetry",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1797",
"title": "Evaluating Ring AllReduce Bottlenecks at Scale",
"bloom": "evaluate"
}
],
"rationale": "Explores the stark latency and bandwidth gap between intra-node and inter-node communication, progressing from raw latency numbers to calculating cross-node costs, diagnosing scaling cliffs, and evaluating hierarchical routing and parameter server architectures at scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-002-03",
"track": "cloud",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-0828",
"title": "The NVLink vs PCIe Tensor Parallel Gap",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0494",
"title": "The Tensor Parallel Scaling Cliff",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0729",
"title": "The Cross-Rack Stall",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0737",
"title": "The NVLink Domain Boundary",
"bloom": "evaluate"
}
],
"rationale": "Teaches the extreme latency sensitivity of Tensor Parallelism, starting with basic NVLink vs PCIe comparisons, diagnosing the scaling cliff when TP crosses server nodes, understanding the topology requirements, and evaluating inference boundaries.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-002-04",
"track": "cloud",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0867",
"title": "MoE AllToAll Communication Time",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0869",
"title": "Diagnosing MoE AllToAll Network Bottlenecks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1110",
"title": "Trade-offs of DCQCN Parameters in RoCEv2 AI Clusters",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2331",
"title": "MoE Interconnect Bottleneck on TPU Pods",
"bloom": "evaluate"
}
],
"rationale": "Traces the specific networking demands of Mixture of Experts (MoE), from calculating theoretical All-To-All times to diagnosing real-world incast switch drops, tuning RoCEv2 congestion control, and redesigning topologies at massive scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-002-05",
"track": "cloud",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1192",
"title": "Elephant Flow Collisions with ECMP",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1183",
"title": "Diagnosing ECMP Hash Collisions in RoCEv2 Clusters",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1194",
"title": "Evaluating ECMP vs Adaptive Routing for Elephant Flows",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0557",
"title": "The InfiniBand Adaptive Routing Loop",
"bloom": "create"
}
],
"rationale": "Follows the challenges of routing large elephant flows over Ethernet/IB, moving from understanding ECMP hash collisions to diagnosing them via telemetry, evaluating adaptive routing mitigations, and debugging adaptive routing protocol failures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-002-06",
"track": "cloud",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2181",
"title": "The Collective Primitive Confusion",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0865",
"title": "Ring AllGather Time in FSDP",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0862",
"title": "Diagnosing FSDP AllGather Topology Bottlenecks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0866",
"title": "Flat vs Hierarchical AllGather",
"bloom": "evaluate"
}
],
"rationale": "Builds deep intuition for Fully Sharded Data Parallelism (FSDP) communication, starting with primitive definitions, calculating AllGather times, diagnosing topology bottlenecks during weight reconstruction, and making architectural choices between flat and hierarchical AllGather.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-002-07",
"track": "cloud",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1082",
"title": "Local vs Cloud Inference Offloading",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1966",
"title": "Diagnosing Edge-to-Cloud Satellite Storage Saturation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1969",
"title": "Edge-to-Cloud Telemetry Ingestion Architecture",
"bloom": "evaluate"
}
],
"rationale": "Explores the constraints of edge-to-cloud data ingestion and inference, progressing from calculating offload processing times to diagnosing store-and-forward satellite saturation, and evaluating telemetry aggregation architectures for edge fleets.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-002-08",
"track": "cloud",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1762",
"title": "RoCEv2 Traffic Class Allocation with DWRR",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1760",
"title": "Diagnosing RoCEv2 Head-of-Line Blocking During Checkpoints",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1759",
"title": "Evaluating RoCEv2 QoS for Mixed Workloads",
"bloom": "evaluate"
}
],
"rationale": "Teaches the complexities of Quality of Service (QoS) on RoCEv2 fabrics, progressing from basic DWRR traffic class math to diagnosing head-of-line blocking from mixed workloads, and finally evaluating comprehensive QoS and PFC configurations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-002-09",
"track": "cloud",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-3726",
"title": "AllReduce Topology Comparison: Ring vs Tree on H100 IB Cluster",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-4364",
"title": "TPU v5e ICI Topology vs H100 NVLink for Data Parallelism",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4370",
"title": "TPU v5e Pod AllReduce Topology for 1T Model Training",
"bloom": "create"
}
],
"rationale": "Analyzes exotic and large-scale topologies, advancing from comparing AllReduce trees on H100 IB to evaluating TPU v5e 2D torus vs H100 performance, and finally mapping 3D parallelism onto a massive 3D TPU torus.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-01",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0957",
"title": "ZeRO-3 Checkpoint Stalls on Lustre",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0958",
"title": "Diagnosing Distributed Checkpoint IO Storm Bottlenecks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0959",
"title": "Mitigating Checkpoint Storms in LLM Training",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4570",
"title": "MI300X Throughput Stalls and Checkpoint I/O",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating checkpoint stall impacts to diagnosing and mitigating IO storms in distributed LLM training architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-02",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1730",
"title": "Calculating Optimal Prefetch Buffer Depth for I/O Jitter",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1731",
"title": "Diagnosing GPU Starvation from P99 I/O Jitter",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1733",
"title": "Evaluating Prefetch Buffer Depth",
"bloom": "evaluate"
}
],
"rationale": "Explores the mathematical, diagnostic, and architectural aspects of using prefetch buffers to hide storage network jitter.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-03",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-0821",
"title": "The PCIe Transfer Bottleneck",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-4515",
"title": "Dataloader Thread Blocking",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-4498",
"title": "Data Pipeline Throughput Matching",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4504",
"title": "PCIe Bottleneck in High-Res Image Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2859",
"title": "Multimodal Pipeline Bottlenecks",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4509",
"title": "Distributed Vision Dataloader Design",
"bloom": "create"
}
],
"rationale": "An end-to-end physical progression of image data loading: from PCIe physics, to thread blocking, CPU-GPU matching, PCIe starvation, and finally designing distributed multi-modal dataloaders.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-04",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1773",
"title": "NVMe Random IOPS Bottleneck in Data Loading",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1772",
"title": "Diagnosing Block Storage IOPS Bottlenecks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1774",
"title": "Evaluating Storage Upgrades vs Data Serialization",
"bloom": "evaluate"
}
],
"rationale": "Progresses from raw NVMe IOPS calculations to diagnosing IOPS stalls in cloud block storage and evaluating serialization solutions like WebDataset.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-05",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1874",
"title": "Sizing Cloud Shard Shuffle Buffers",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1873",
"title": "Diagnosing Object Storage Stalls in Global Shuffling",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1875",
"title": "Evaluating Global vs. Shard-Level Shuffling for 10TB LLM Training",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2253",
"title": "The Deterministic Global Shuffle at Trillion-Token Scale",
"bloom": "create"
}
],
"rationale": "Explores the trade-offs, bottlenecks, and architectural designs required to shuffle massive datasets securely from object storage to thousands of GPUs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-06",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1461",
"title": "Calculating KL Divergence for Feature Drift",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1462",
"title": "Diagnosing Silent Accuracy Drops using KL Divergence",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1464",
"title": "Evaluating KL Divergence for High-Throughput Drift Detection",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2307",
"title": "The Diluted Regional Distribution Shift",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating KL divergence metrics to leveraging them for diagnosing drift, scaling them for high throughput, and handling regional distribution shifts.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-07",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1143",
"title": "Calculating Disparate Impact in Cloud Loan APIs",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1960",
"title": "Diagnosing Loan Approval Parity",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1145",
"title": "Evaluating Disparate Impact in Cloud-Based Credit Scoring",
"bloom": "evaluate"
}
],
"rationale": "Moves from computing Disparate Impact to diagnosing Equal Opportunity violations and architecting mitigation strategies under latency SLAs in credit scoring.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-08",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-0968",
"title": "Clean-Label Poisoning Ratio Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0967",
"title": "Diagnosing Clean-Label Backdoors in KYC Models",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0969",
"title": "Evaluating Defenses for Clean-Label Poisoning",
"bloom": "evaluate"
}
],
"rationale": "Addresses the math of clean-label poison injection, diagnosing these backdoors in production, and evaluating defenses like DP-SGD.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-09",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1480",
"title": "Real-Time Fraud Label Shift Adaptation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1479",
"title": "Diagnosing Label Shift in E-Commerce Content Moderation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1482",
"title": "Evaluating Mitigation Strategies for Acute Fraud Label Shift",
"bloom": "evaluate"
}
],
"rationale": "Teaches the identification, diagnosis, and real-time mitigation of acute label shift in e-commerce classification models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-10",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1155",
"title": "Calculating the Financial Impact of Feature Drift",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1157",
"title": "Diagnosing Silent Model Degradation in E-Commerce Recommendations",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1158",
"title": "Mitigating Transient Covariate Shift in Recommendations",
"bloom": "evaluate"
}
],
"rationale": "Progresses from estimating the financial impact of drift to identifying silent feature degradation and implementing dynamic retraining logic.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-11",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1515",
"title": "Training Time Estimation with Local NVMe Caching",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1518",
"title": "Diagnosing Low NVMe Cache Hits in Batch Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1517",
"title": "Local NVMe Caching for Multi-Epoch Training",
"bloom": "evaluate"
}
],
"rationale": "Guides learners through estimating training time with caches, diagnosing cache misses in ephemeral jobs, and architecting multi-node caching strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-12",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2240",
"title": "The Shuffle Data Volume Calculation",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2241",
"title": "The Skewed Join Straggler Problem",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2248",
"title": "The Distributed Join Strategy Selection",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-2310",
"title": "Data Skew OOMs in Preprocessing",
"bloom": "analyze"
}
],
"rationale": "Takes learners from fundamental shuffle math to fixing skewed joins, picking join strategies, and mitigating memory explosion at petabyte scales.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-13",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2567",
"title": "Cloud Tokenization L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2568",
"title": "Cloud Tokenization L3 0",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2569",
"title": "Cloud Tokenization L5 0",
"bloom": "evaluate"
}
],
"rationale": "Progresses from basic token counts across languages to calculating the memory trade-offs of vocabulary sizes and architecting multilingual BPE tokenizers.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-14",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2129",
"title": "The Training-Serving Skew Trap",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0702",
"title": "The Phantom Performance Drop",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2142",
"title": "The Silent Feature Pipeline Failure",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2304",
"title": "The Silent Redis Feature Skew",
"bloom": "analyze"
}
],
"rationale": "Explores the diagnosis and architecture required to prevent feature computation logic from diverging between batch training and real-time serving.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-15",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2059",
"title": "The Feedback Loop Latency Problem",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2063",
"title": "The Offline-Online Metric Gap",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2070",
"title": "The Pipeline Debt Diagnosis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2078",
"title": "The Experiment-to-Production Gap",
"bloom": "evaluate"
}
],
"rationale": "Focuses on the structural friction in the ML development lifecycle, moving from slow feedback loops to diagnosing and mitigating experiment-to-production disparities.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-16",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1139",
"title": "Subgroup Accuracy Disparities",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1433",
"title": "Diagnosing Hidden Bias in Cloud KYC Pipelines",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1437",
"title": "Intersectional Bias Mitigation in Identity APIs",
"bloom": "evaluate"
}
],
"rationale": "Explores calculating, diagnosing, and mitigating intersectional demographic bias in biometric identity verification systems.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-17",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-0914",
"title": "Quantifying LLM Benchmark Contamination",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0915",
"title": "Diagnosing Sudden MMLU Score Spikes in LLM Pre-training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0918",
"title": "Evaluating Contamination in Code LLMs",
"bloom": "evaluate"
}
],
"rationale": "Progresses from quantifying exact-match contamination to diagnosing score anomalies and architecting robust contamination filters for LLMs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-18",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1388",
"title": "Impact of I/O Jitter on Distributed Data Loading",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1389",
"title": "Diagnosing Distributed I/O Jitter in Synchronous Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1390",
"title": "Mitigating Storage I/O Jitter in Checkpointing",
"bloom": "evaluate"
}
],
"rationale": "Analyzes the mathematical impact of I/O jitter on synchronous distributed training, diagnoses its symptoms, and mitigates it architecturally.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-19",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1282",
"title": "Calculating GPU Feeding Tax for Distributed Vision Training",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1283",
"title": "Diagnosing GPU Starvation in Vision Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1285",
"title": "Evaluating Storage Architecture to Eliminate CV Training Feeding Tax",
"bloom": "evaluate"
}
],
"rationale": "Teaches how to quantify, diagnose, and eliminate data loading bottlenecks and feeding taxes in distributed computer vision workloads.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-003-20",
"track": "cloud",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "cloud-1920",
"title": "Covariance Pruning for Backdoors",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1923",
"title": "Identifying Poisoned Data via Spectral Signatures",
"bloom": "analyze"
}
],
"rationale": "Applies covariance thresholding math to the practical diagnosis and isolation of poisoned backdoor data using spectral signatures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-01",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2549",
"title": "Cloud Young Daly Formula L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2550",
"title": "Cloud Young Daly Formula L3 0",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2551",
"title": "Young-Daly Formula Optimization for Checkpointing",
"bloom": "evaluate"
}
],
"rationale": "Teaches the mathematical foundation of optimal checkpoint intervals and trade-offs between checkpoint speed and hardware MTBF.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-02",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-2842",
"title": "Asynchronous Checkpointing vs Synchronous Checkpointing",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-2726",
"title": "What Causes a Checkpoint Storm",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2727",
"title": "Checkpoint Storm I/O Bottleneck",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1877",
"title": "Diagnosing Sharded Checkpoint Metadata Stalls",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2728",
"title": "Checkpoint Storm Mitigation Strategy",
"bloom": "evaluate"
}
],
"rationale": "Progresses from the basic concept of asynchronous checkpointing to diagnosing and resolving severe filesystem I/O storms at scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-03",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-0950",
"title": "Centralized Checkpointing Network Bottleneck",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0953",
"title": "Centralized Checkpoint Incast Failure",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0952",
"title": "Evaluating Centralized Checkpointing Bottlenecks",
"bloom": "evaluate"
}
],
"rationale": "Guides learners through the network and storage bottlenecks that emerge when gathering massive LLM state to a single head node.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-04",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0931",
"title": "Byzantine Failures in Distributed GPU Training",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0934",
"title": "Diagnosing Silent Data Corruption in Distributed LLM Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0935",
"title": "Evaluating SDC Mitigations in ZeRO-3 Training",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2801",
"title": "Cloud New 0016",
"bloom": "create"
}
],
"rationale": "Explores the probabilistic reality, diagnosis, mitigation, and architectural prevention of Silent Data Corruption in massive GPU clusters.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-05",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-2844",
"title": "Cloud New 0039",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-1189",
"title": "Spot Instance Preemption and Batch Scaling",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1190",
"title": "Diagnosing TorchElastic Spot Preemption Stalls",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1191",
"title": "Evaluating Preemption Overheads in Elastic LLM Training",
"bloom": "evaluate"
}
],
"rationale": "Covers the mechanics of elastic training, calculating new batch constraints after preemption, and evaluating orchestration frameworks for Spot instances.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-06",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2540",
"title": "Cloud Warm Restart L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0973",
"title": "Cold Restart Recovery Time Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0975",
"title": "Diagnosing Cold Restart Read Storms",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0974",
"title": "Evaluating Cold Restart vs Warm Recovery",
"bloom": "evaluate"
}
],
"rationale": "Teaches the difference between cold and warm restarts, calculating recovery times, and diagnosing checkpoint loading bottlenecks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-07",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1943",
"title": "Stateful KV Cache Recovery Tradeoffs",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1942",
"title": "Cascading KV Cache OOMs in Stateful LLM Serving",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1941",
"title": "Stateful LLM Serving Fault Tolerance",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2810",
"title": "Cloud New 0028",
"bloom": "create"
}
],
"rationale": "Addresses the specific challenges of recovering massive KV caches for long-context LLM sessions after node failures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-08",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2735",
"title": "Failure Domain Concept",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1039",
"title": "Rack-Level Correlated Failures",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1031",
"title": "Diagnosing Correlated GPU Node Failures in a Cluster",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1032",
"title": "Mitigating Correlated Rack Failures in 2048-GPU Training",
"bloom": "evaluate"
}
],
"rationale": "Examines failure domains, probability of rack-level faults, and strategies to prevent highly correlated node drops during distributed training.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-09",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1970",
"title": "Federated Learning Over-selection for Stragglers",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1971",
"title": "Diagnosing Cross-Silo Straggler Bottlenecks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1972",
"title": "Evaluating Straggler Mitigation Strategies in Federated Learning",
"bloom": "evaluate"
}
],
"rationale": "Evaluates straggler mitigation in federated networks, from basic probability math to architectural changes like asynchronous updates.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-10",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1243",
"title": "Calculating Expected Calibration Error",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1244",
"title": "Diagnosing Overconfident Predictions in Cloud Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1005",
"title": "Evaluating OOD Rejection Under Strict Latency",
"bloom": "evaluate"
}
],
"rationale": "Teaches how to calculate expected calibration error, diagnose overconfidence, and implement latency-constrained OOD rejection.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-11",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L1",
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0272",
"title": "The GPU Failure Cadence",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0273",
"title": "The Inescapable Cost of Failures",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1255",
"title": "Estimating Cluster Failure Frequency",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-1256",
"title": "Evaluating Optimal Checkpoint Frequency at Scale",
"bloom": "evaluate"
}
],
"rationale": "Builds intuition for massive-scale cluster reliability by transitioning from raw MTTF math to expected failure counts and scale-aware checkpoint intervals.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-12",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1111",
"title": "DLQ Storage and Reprocessing Provisioning",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1112",
"title": "Diagnosing Poison Pill Bottlenecks in Streaming Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1113",
"title": "Evaluating DLQ Architectures for High-Throughput Streams",
"bloom": "evaluate"
}
],
"rationale": "Focuses on designing, diagnosing, and optimizing dead-letter queues to handle poison pill data in high-throughput real-time pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-13",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2777",
"title": "System Prompt Extraction Risk",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2778",
"title": "Injection Detection Accuracy",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2779",
"title": "LLM Injection Defense Architecture",
"bloom": "evaluate"
}
],
"rationale": "Explores LLM prompt security, starting from extraction risks to evaluating the statistical limits and architectural trade-offs of classifier-based defenses.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-14",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1030",
"title": "Hidden Correction Cascades in Bidding Pipelines",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1029",
"title": "Diagnosing Pipeline Calibration Failures",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1034",
"title": "Evaluating Pipeline Correction Cascades",
"bloom": "evaluate"
}
],
"rationale": "Investigates the downstream financial and performance cascades caused by ad-hoc calibration fixes in complex prediction pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-15",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0824",
"title": "The ECC Bit Error Reality",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-1223",
"title": "Cluster MTBF from ECC Uncorrectable Errors",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1224",
"title": "Diagnosing High Correctable ECC Error Rates",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1228",
"title": "Evaluating HBM3 ECC Overhead at 24k GPU Scale",
"bloom": "evaluate"
}
],
"rationale": "Traces the impact of hardware bit errors from raw frequency rates to cluster MTBF math and architectural decisions about HBM ECC overhead.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-004-16",
"track": "cloud",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1564",
"title": "Prometheus Cardinality Explosion",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1566",
"title": "Diagnosing Observability OOM Cascades",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1567",
"title": "Managing High Metric Cardinality in Global Model Serving",
"bloom": "evaluate"
}
],
"rationale": "Teaches the dangers of high metric cardinality in global ML serving deployments, from basic limitations to diagnosing OOMs and redesigning the TSDB schema.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-01",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0435",
"title": "The Illusion of Sparsity",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3374",
"title": "H100 Performance Bottlenecks with Structured vs. Unstructured Pruning",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3719",
"title": "Structured Pruning 50% of MLP Layers in 70B LLM \u2014 MI300X Impact",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3376",
"title": "Optimizing Large Language Model Deployment on AMD MI300X with Structured Sparsity",
"bloom": "analyze"
}
],
"rationale": "Explores why unstructured pruning fails to yield hardware speedups, quantifies the theoretical performance gains of structured pruning, and scales up to architecting a full LLM deployment on sparse compute hardware.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-02",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0928",
"title": "Multi-Dimensional Bin Packing and Stranded GPUs",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0919",
"title": "Multi-Dimensional Resource Fragmentation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0920",
"title": "Evaluating Multi-Dimensional Bin Packing for Stranded Capacity",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2350",
"title": "The Auto-Scaler Fragmentation Deadlock",
"bloom": "analyze"
}
],
"rationale": "Guides the learner from calculating manual bin-packing allocations to diagnosing resource fragmentation and designing heuristic schedulers to resolve cluster-wide deadlocks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-03",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1011",
"title": "Spot Preemption Batch Adjustment",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1012",
"title": "Diagnosing Loss Spikes During Elastic Training Scaling",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1013",
"title": "Elastic Scale-Down with Constant Global Batch Size",
"bloom": "evaluate"
}
],
"rationale": "Teaches how to dynamically adjust micro-batch sizes and gradient accumulation to maintain a mathematically constant global batch size during elastic scaling and spot preemptions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-04",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-2840",
"title": "Cloud New 0034",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-1016",
"title": "Calculating Wasted FLOPs from Unfolded Constants",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1014",
"title": "Diagnosing Static Preprocessing Latency",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1017",
"title": "Constant Folding Dense Normalization",
"bloom": "evaluate"
}
],
"rationale": "Introduces the concept of constant folding, calculates the penalty of missing it, diagnoses missed folding opportunities in production pipelines, and quantifies memory bandwidth savings for large-scale normalization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-05",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1051",
"title": "CUDA Graphs for Low-Latency Inference",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1054",
"title": "Analyzing High CPU Utilization in LLM Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1053",
"title": "CUDA Graphs vs Kernel Fusion for LLM Serving",
"bloom": "evaluate"
}
],
"rationale": "Explores the impact of kernel launch overhead on GPU idle time, progressing from basic latency calculations to root-causing CPU bottlenecks and comparing CUDA Graphs against kernel fusion for LLM serving.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-06",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2254",
"title": "The SSA Form Purpose",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1895",
"title": "JIT Compiler SSA Graph Memory Footprint Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1904",
"title": "Debugging SSA Compiler OOM",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1899",
"title": "Evaluating SSA Form for JIT Compiler Optimization",
"bloom": "evaluate"
}
],
"rationale": "Covers the foundational purpose of Static Single Assignment (SSA) form in ML compilers, progressing to calculating and mitigating the severe host memory bloat it causes during ahead-of-time compilation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-07",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2516",
"title": "Cloud Torchdynamo L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2517",
"title": "Cloud Torchdynamo L3 0",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1948",
"title": "Diagnosing Static Graph Compilation OOM",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2518",
"title": "Cloud Torchdynamo L5 0",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner through PyTorch's bytecode tracing mechanism, calculating the penalty of graph breaks, diagnosing memory regressions from dynamic shapes, and making business decisions on compilation versus raw compute.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-08",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2687",
"title": "Physical Limits on Training Cluster Scale",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2685",
"title": "Estimating Total Training FLOPs",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2154",
"title": "The Batch Size Scaling Wall",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2689",
"title": "Scaling Strategy Under Physical Constraints",
"bloom": "evaluate"
}
],
"rationale": "Progresses from physical hardware scaling limits and FLOP math to diagnosing the batch size degradation wall, ultimately requiring strategic infrastructure choices under strict power ceilings.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-09",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2195",
"title": "The Roofline Diagnostic",
"bloom": "apply"
},
{
"level": "L3",
"id": "cloud-1988",
"title": "Roofline Benchmarking of Custom Layers",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1810",
"title": "Diagnosing LLM Decode Inefficiency",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2206",
"title": "The Energy Roofline",
"bloom": "evaluate"
}
],
"rationale": "Introduces the theoretical Roofline model, applies it to custom layers, diagnoses decoding inefficiencies limited by memory bandwidth, and extends the framework to analyze energy efficiency.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-10",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2255",
"title": "The Dialect Hierarchy Lowering",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2258",
"title": "The Lowering Pass Fusion Loss",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2262",
"title": "The Tensor Core Codegen Gap",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2265",
"title": "The MLIR Retargetability Boundary",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2267",
"title": "The New Accelerator Compiler Stack",
"bloom": "create"
}
],
"rationale": "Traces the MLIR compilation pipeline from dialect hierarchies and high-level fusion failures down to low-level hardware codegen gaps, culminating in designing a full compiler stack for a novel accelerator.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-11",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L1",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-2841",
"title": "Cloud New 0035",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-2256",
"title": "The Tiling Factor Search Space",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2260",
"title": "The Cost Model vs Profiling Dilemma",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-2804",
"title": "Cloud New 0020",
"bloom": "create"
}
],
"rationale": "Explains the necessity of kernel auto-tuning, measures the combinatorial explosion of the tiling search space, diagnoses cost-model failures on new hardware, and architects custom auto-tuning passes for sparse MoE kernels.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-12",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-2021",
"title": "TensorRT Fusion and Quantization Latency",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2023",
"title": "Debugging TensorRT Precision Fallbacks and Fusion Breaks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2022",
"title": "TensorRT Dynamic Shapes Trade-offs",
"bloom": "evaluate"
}
],
"rationale": "Teaches the fundamentals of TensorRT kernel fusion and quantization, debugging precision fallbacks, and navigating the trade-offs of dynamic shapes for latency-sensitive APIs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-13",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1161",
"title": "Scaling Past the Interpolation Threshold",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1160",
"title": "Diagnosing the Interpolation Threshold Error Spike",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1159",
"title": "Evaluating Overparameterization and Double Descent in Vision Models",
"bloom": "evaluate"
}
],
"rationale": "Examines the double descent phenomenon, moving from identifying the interpolation threshold to diagnosing validation error spikes and justifying extreme overparameterization against latency constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-14",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-2688",
"title": "Calculating Communication-Compute Overlap Ceiling",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0978",
"title": "Diagnosing Stalled Computation During NCCL AllGather",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0979",
"title": "Evaluating Gradient Bucketing for Overlap",
"bloom": "evaluate"
}
],
"rationale": "Explores the mechanics of hiding network latency, starting with theoretical AllReduce timings, diagnosing overlapping failures in CUDA streams, and optimizing gradient bucketing for distributed training.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-15",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2148",
"title": "The Gradient Accumulation Trick",
"bloom": "understand"
},
{
"level": "L4",
"id": "cloud-1312",
"title": "Diagnosing DDP Overhead in Gradient Accumulation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1313",
"title": "Evaluating Extreme Gradient Accumulation Overhead",
"bloom": "evaluate"
}
],
"rationale": "Introduces the mechanics of gradient accumulation to bypass VRAM limits, then diagnoses communication bottlenecks when combined with DDP, and evaluates extreme accumulation trade-offs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-16",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2632",
"title": "Why nn.Module Uses Parameter Registration",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2633",
"title": "Counting Parameters in a Module Hierarchy",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2634",
"title": "Module Design for Serialization Robustness",
"bloom": "evaluate"
}
],
"rationale": "Explains how parameter registration works under the hood, applies it to calculate total model footprint, and evaluates the robustness of module serialization patterns.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-17",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3"
],
"questions": [
{
"level": "L2",
"id": "cloud-2635",
"title": "Why model.eval() Matters for Inference",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2636",
"title": "BatchNorm Statistics Divergence",
"bloom": "apply"
}
],
"rationale": "Explains the purpose of evaluation mode and traces how identical inputs yield different predictions through BatchNorm statistics divergence.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-18",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2638",
"title": "Pure Functions Enable Composable Transforms",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2639",
"title": "Vectorization Speedup via vmap",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2640",
"title": "Choosing Between Stateful and Functional Paradigms",
"bloom": "evaluate"
}
],
"rationale": "Introduces JAX's functional paradigm, demonstrates vectorization via vmap, and evaluates the strategic trade-offs of switching an entire research pipeline to functional programming.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-005-19",
"track": "cloud",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-2272",
"title": "The Framework Dispatch Overhead",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2276",
"title": "The Autograd Overhead in Serving",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2274",
"title": "The Caching Allocator Fragmentation Trap",
"bloom": "evaluate"
}
],
"rationale": "Identifies CPU-side dispatch overhead in eager PyTorch, diagnoses hidden memory bloat from autograd during serving, and analyzes caching allocator fragmentation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-006-01",
"track": "cloud",
"topic": "transformer-systems-cost",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-3129",
"title": "Recall Transformer Inference FLOPs Formula",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-3112",
"title": "Transformer Cost Fluency: FLOPs Estimation from Memory",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3113",
"title": "Transformer Cost Implement: Chinchilla Optimal Model Size Calculation",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3109",
"title": "Transformer Cost Design Optimal Architecture for Inference Budget",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-3114",
"title": "Transformer Cost Mastery: Full Training and Inference Cost Analysis for LLM Product",
"bloom": "create"
}
],
"rationale": "Progresses from recalling basic FLOPs formulas to estimating memory, applying Chinchilla scaling laws for optimal sizing, designing inference architecture, and finally building a complete end-to-end financial model for an LLM.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-006-02",
"track": "cloud",
"topic": "transformer-systems-cost",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-2091",
"title": "Prefill vs Decode Compute Profile",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-1148",
"title": "Diagnosing Bottlenecks in Disaggregated LLM Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2452",
"title": "Prefill and Decode Cluster Disaggregation",
"bloom": "create"
}
],
"rationale": "Explores the compute versus memory-bound phases of LLM generation, progressing to diagnosing bottlenecks in disaggregated deployments and evaluating the latency trade-offs of splitting prefill and decode over the network.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-006-03",
"track": "cloud",
"topic": "transformer-systems-cost",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-2086",
"title": "MoE AllToAll Communication Cost",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2442",
"title": "The MoE Memory Bandwidth Tax",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-2447",
"title": "MoE Routing at High Batch Size",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0672",
"title": "The Router Bottleneck in MoE Serving",
"bloom": "create"
}
],
"rationale": "Follows the systems challenges of MoE models, moving from AllToAll communication math to memory bandwidth limits, batch size routing implications, and resolving router compute bottlenecks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-006-04",
"track": "cloud",
"topic": "transformer-systems-cost",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1373",
"title": "Edge-Cloud Hierarchical Bandwidth Filtering",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1377",
"title": "Cascading Filter Failure in Hierarchical Pipelines",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2481",
"title": "Edge-Cloud Hybrid Video Break-Even",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2380",
"title": "Tiered Inference for Video Doorbells",
"bloom": "analyze"
}
],
"rationale": "Teaches the design of hierarchical edge-to-cloud pipelines, starting from bandwidth calculations to cascading filter failures, TCO analysis for video streams, and strict power constraints in battery-operated devices.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-006-05",
"track": "cloud",
"topic": "transformer-systems-cost",
"competency_area": "architecture",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2084",
"title": "Embedding Table Bandwidth Bottleneck",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0897",
"title": "Bandwidth Taper in DLRM Embeddings",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1422",
"title": "Diagnosing Input Stationary Bottlenecks in MLP Workloads",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2095",
"title": "DLRM Embedding Table Sharding Strategy",
"bloom": "analyze"
}
],
"rationale": "Progresses through the systems constraints of DLRM embedding tables, from foundational memory bandwidth bottlenecks to PCIe transfer limits, diagnosing input stationary dataflow issues, and designing multi-GPU sharding strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-006-06",
"track": "cloud",
"topic": "transformer-systems-cost",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1887",
"title": "Silicon Interposer Edge Density Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1889",
"title": "Root-Causing 2.5D Packaging HBM Failures",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1888",
"title": "Evaluating 2.5D Silicon Interposer Trade-offs",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2338",
"title": "Die-to-Die Interconnect Bottleneck in Chiplet ASICs",
"bloom": "analyze"
}
],
"rationale": "Covers the systems engineering of AI chiplets and 2.5D packaging, from basic interposer edge density math to diagnosing physical memory training failures, evaluating TCO trade-offs, and optimizing die-to-die workloads.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-006-07",
"track": "cloud",
"topic": "transformer-systems-cost",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-2087",
"title": "KV Cache Memory Per Token",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2343",
"title": "PCIe Switch Oversubscription in KV Paging",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3132",
"title": "Specify KV Cache Memory Requirements for Long-Context LLM Serving",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-2107",
"title": "KV Cache Quantization Quality-Throughput Frontier",
"bloom": "create"
}
],
"rationale": "Explores the memory systems impact of the KV cache, from calculating per-token size to diagnosing PCIe paging bottlenecks, sizing multi-GPU long-context requirements, and evaluating quantization systems.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-006-08",
"track": "cloud",
"topic": "transformer-systems-cost",
"competency_area": "architecture",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2537",
"title": "Cloud Wafer Scale Engine L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2538",
"title": "Cloud Wafer Scale Engine L3 0",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1787",
"title": "Diagnosing Monolithic Accelerator Fab Rejection",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2539",
"title": "Cloud Wafer Scale Engine L5 0",
"bloom": "evaluate"
}
],
"rationale": "Teaches the systems tradeoffs of monolithic and wafer-scale AI chips vs multi-GPU clusters, from conceptual differences to SRAM capacity, fab yield rejections, and business TCO evaluations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-01",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-0235",
"title": "Handling KV Cache Fragmentation",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1246",
"title": "KV Cache External Fragmentation Constraints",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1247",
"title": "Diagnosing KV Cache External Fragmentation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1248",
"title": "Evaluating KV Cache Allocators for Variable Sequences",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0634",
"title": "The Fragmentation Crisis",
"bloom": "create"
}
],
"rationale": "Explores the problem of KV cache memory fragmentation, starting from the concept and calculating its impact, diagnosing OOMs caused by it, and evaluating allocators like PagedAttention to solve it.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-02",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1681",
"title": "Calculating PagedAttention Memory Savings",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1678",
"title": "Diagnosing PagedAttention Block Size Bottlenecks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2453",
"title": "Paged Attention Block Size Fragmentation",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0781",
"title": "Multi-Turn Gemini LLM Serving with PagedAttention",
"bloom": "create"
}
],
"rationale": "Focuses on the specific mechanics of PagedAttention, calculating its savings, diagnosing bottlenecks related to block sizing, and applying it at massive scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-03",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2555",
"title": "Cloud Zero Optimizations L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2556",
"title": "Cloud Zero Optimizations L3 0",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2557",
"title": "Cloud Zero Optimizations L5 0",
"bloom": "evaluate"
}
],
"rationale": "Progresses through the stages of ZeRO optimization, calculating memory footprints and evaluating the communication vs memory trade-offs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-04",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-2226",
"title": "The Checkpoint Bandwidth Bottleneck",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2230",
"title": "The Metadata Storm",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2235",
"title": "The Checkpoint Cascade Failure",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2231",
"title": "The Checkpoint Tiering Strategy",
"bloom": "analyze"
}
],
"rationale": "Explores the I/O and storage bottlenecks of saving massive model checkpoints at scale, culminating in designing a tiered storage architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-05",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2224",
"title": "The Storage Tier Latency Gap",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2228",
"title": "The Object Store Training Anti-Pattern",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2232",
"title": "The I/O Wall at Scale",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2236",
"title": "The I/O Jitter Amplification",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2238",
"title": "The Storage Disaggregation Dilemma",
"bloom": "create"
}
],
"rationale": "Traces data loading and storage tier bottlenecks from basic latency gaps to diagnosing I/O walls and jitter at scale, culminating in architectural storage design.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-06",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1734",
"title": "Estimating Prefix Caching Savings",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1736",
"title": "Diagnosing Zero-Hit Prefix Caching",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1738",
"title": "Prefix Caching Trade-offs in Agentic LLM Serving",
"bloom": "evaluate"
}
],
"rationale": "Calculates the benefits of prefix caching, diagnoses cache miss issues in production, and evaluates architectural trade-offs for highly shared prompts.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-07",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1796",
"title": "Validation Loop Memory Leak",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1793",
"title": "Diagnosing Autograd Memory Leaks in Training Loops",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2393",
"title": "Production Debugging of Tensor Memory Leaks",
"bloom": "evaluate"
}
],
"rationale": "Follows the lifecycle of training memory leaks, from understanding validation loop retention to diagnosing autograd graph leaks and debugging them in production.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-08",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-0883",
"title": "Autograd Activation Memory vs Recomputation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1792",
"title": "Diagnosing OOM in Reverse Mode Differentiation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1791",
"title": "Custom Autograd vs Reverse-Linked Graph",
"bloom": "evaluate"
}
],
"rationale": "Examines the memory overhead of intermediate activations in autograd, diagnosing OOMs during backward passes, and evaluating custom autograd recomputation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-09",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2582",
"title": "Cloud Qlora L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2583",
"title": "Cloud Qlora L3 0",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2584",
"title": "Cloud Qlora L5 0",
"bloom": "evaluate"
}
],
"rationale": "Covers the mechanics of QLoRA, calculating its total memory footprint, and evaluating its trade-offs against full fine-tuning.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-10",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1910",
"title": "DLRM Embedding Sparse Scatter Bottleneck",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1912",
"title": "Diagnosing Low Utilization in DLRM Lookups",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1913",
"title": "Evaluating DLRM Sparse Embedding Placement",
"bloom": "evaluate"
}
],
"rationale": "Analyzes the bandwidth bottlenecks of sparse embedding lookups, diagnosing low SM utilization, and evaluating hybrid cache placements for massive tables.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-11",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-0999",
"title": "CXL vs InfiniBand for DLRM Embeddings",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1001",
"title": "DLRM Bottleneck with CXL Pooling",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2362",
"title": "Diagnosing CXL Cache Thrashing",
"bloom": "evaluate"
}
],
"rationale": "Investigates the use of CXL memory expansion for large models, from basic latency comparisons to diagnosing pooling bottlenecks and cache thrashing.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-12",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1288",
"title": "HBM Bandwidth Savings with FlashAttention",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1289",
"title": "Diagnosing FlashAttention Recomputation Optimization",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1290",
"title": "Evaluating FlashAttention vs Standard Attention for 32K Context",
"bloom": "evaluate"
}
],
"rationale": "Calculates the HBM bandwidth savings of FlashAttention, diagnoses throughput regressions from recomputation, and evaluates its use for scaling to 32K context windows.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-13",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1663",
"title": "Online Softmax Memory Footprint Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1659",
"title": "Debugging Block-wise Softmax for Long-Context Attention",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1662",
"title": "Evaluating Online Softmax for Long-Context Kernels",
"bloom": "evaluate"
}
],
"rationale": "Explores the memory footprint and implementation of online softmax, diagnosing tiling errors, and evaluating its necessity for massive context windows.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-14",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0002",
"title": "The KV-Cache Memory Hog",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0004",
"title": "The VRAM Cost of Context",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1469",
"title": "Calculate Maximum Batch Size for LLM KV Cache",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1470",
"title": "Diagnosing LLM Serving OOMs",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1468",
"title": "Evaluating KV Cache Memory Constraints and PagedAttention",
"bloom": "evaluate"
}
],
"rationale": "Starts with basic KV cache volume calculations, scales up to determining maximum batch sizes, diagnoses production OOMs under concurrency, and evaluates architectural mitigations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-15",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "cloud-1474",
"title": "Calculating INT8 KV Cache Memory Savings",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1477",
"title": "Diagnosing Output Degradation in INT8 KV Cache",
"bloom": "analyze"
}
],
"rationale": "Calculates the memory savings of quantizing the KV cache to INT8 and diagnoses the resulting output degradation from naive per-tensor approaches.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-16",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L1",
"L3",
"L4"
],
"questions": [
{
"level": "L1",
"id": "cloud-0256",
"title": "The 16x VRAM Multiplier for Training",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-3851",
"title": "OOM During Optimizer Initialization on A100",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-2319",
"title": "Mixed Precision Optimizer State Underflow",
"bloom": "analyze"
}
],
"rationale": "Explores the VRAM requirements of optimizer states, diagnosing OOMs during initialization, and debugging underflow issues when casting states to lower precision.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-17",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "cloud-1753",
"title": "Memory Footprint for Progressive VLM Deployment",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1755",
"title": "Debugging Progressive Deployment Failures Across Tiers",
"bloom": "analyze"
}
],
"rationale": "Calculates the footprint of distilling models for edge devices and diagnoses OOM and accuracy failures across progressive deployment tiers.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-18",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "cloud-0101",
"title": "The Continuous Batching OOM",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0151",
"title": "The KV Cache Thrashing Cascade",
"bloom": "analyze"
}
],
"rationale": "Examines the memory constraints of continuous batching with paged KV caches and diagnoses cascading failures caused by thrashing.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-008-19",
"track": "cloud",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3264",
"title": "VRAM Budgeting for 70B LLM on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4373",
"title": "MI300X vs A100: VRAM Capacity Advantage for Long-Context Serving",
"bloom": "create"
}
],
"rationale": "Analyzes VRAM budgeting for large models on AMD MI300X, comparing its long-context serving capacity advantages against smaller-memory GPUs like the A100.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-01",
"track": "cloud",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2552",
"title": "Cloud Zero Copy Serialization L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1783",
"title": "REST API Serialization Overhead",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1852",
"title": "Bottleneck Analysis in Python REST/JSON Serving",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1851",
"title": "Evaluating RPC Frameworks for High-Throughput Embeddings",
"bloom": "evaluate"
}
],
"rationale": "Explores the serialization overhead of APIs, progressing from understanding JSON costs to diagnosing latency in REST serving and finally evaluating RPC frameworks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-02",
"track": "cloud",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0238",
"title": "Applying Little's Law to Inference Servers",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0027",
"title": "The Little's Law Bottleneck",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0100",
"title": "The Unstable Translation Queue",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0152",
"title": "The Deadline-Missing Detector",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0163",
"title": "The Black Friday Collapse",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner through applying Little's Law, calculating queue wait times, analyzing single-worker systems, diagnosing SLA failures from high utilization, and addressing non-linear queueing collapse under high load.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-03",
"track": "cloud",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1429",
"title": "Mitigating P99 Jitter with Interrupt Shielding",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1430",
"title": "Diagnosing P99 Inference Jitter from NIC Interrupts",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1431",
"title": "Interrupt Shielding vs NIC Coalescing for P99 Latency",
"bloom": "evaluate"
}
],
"rationale": "Investigates tail latency jitter caused by network interrupts, teaching how to calculate the latency impact, diagnose the root cause, and evaluate mitigation strategies like CPU core isolation versus interrupt coalescing.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-04",
"track": "cloud",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1816",
"title": "Latency SLAs for Real-Time Saliency Maps",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1817",
"title": "Explainability Latency Bottleneck on T4 GPUs",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1818",
"title": "Explainability Latency Trade-offs",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating the latency overhead of explainability methods to diagnosing unexpected P99 spikes when enabling them, and finally evaluating architectural trade-offs to meet strict SLAs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-05",
"track": "cloud",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1041",
"title": "Calculating Latency Impact of CPU Affinity",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1046",
"title": "Diagnosing P99 Latency Jitter on CPU Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1042",
"title": "Evaluating CPU Pinning for P99 Latency SLA",
"bloom": "evaluate"
}
],
"rationale": "Teaches the impact of CPU thread scheduling on inference latency, guiding the learner through calculating the affinity impact, diagnosing random P99 jitter, and deciding on strict CPU pinning versus throughput.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-06",
"track": "cloud",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2573",
"title": "Cloud Token Budget L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2574",
"title": "Cloud Token Budget L3 0",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2575",
"title": "Cloud Token Budget L5 0",
"bloom": "evaluate"
}
],
"rationale": "Explores the system impact of increasing LLM token budgets, from basic memory implications to calculating concurrent request limits and evaluating pool architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-07",
"track": "cloud",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2623",
"title": "Why Micro-Benchmarks Mislead at System Level",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2624",
"title": "Isolating Memory Bandwidth via Micro-Benchmark",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2625",
"title": "Diagnosing Performance with Micro vs Macro Benchmarks",
"bloom": "evaluate"
}
],
"rationale": "Connects hardware micro-benchmarks to system-level performance, highlighting why peak TFLOPS differ from MFU, calculating effective bandwidth, and diagnosing whether hardware or software is the bottleneck.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-08",
"track": "cloud",
"topic": "mlops-lifecycle",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2057",
"title": "The Experiment Tracking Storage Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2065",
"title": "The Experiment Metadata Tax",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2068",
"title": "The Silent Training Regression",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2069",
"title": "The Multi-Objective Experiment Frontier",
"bloom": "analyze"
}
],
"rationale": "Builds an understanding of experiment tracking observability, advancing from storage calculations to identifying missing metadata, diagnosing silent regressions, and selecting optimal multi-objective models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-09",
"track": "cloud",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-3023",
"title": "MLOps Lifecycle: Implement Model Registry Versioning Strategy for Multi-Region LLM",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3026",
"title": "MLOps Lifecycle: Specify a Model Evaluation Gate for Production Promotion on H100",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-3024",
"title": "MLOps Lifecycle: End-to-End MLOps System Design for Production LLM on H100",
"bloom": "create"
}
],
"rationale": "Explores model registry and artifact deployment at scale, moving from registry schema and multi-region replication design, to specifying automated promotion gates, and culminating in end-to-end MLOps architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-10",
"track": "cloud",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2644",
"title": "The ML Test Score Framework",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2645",
"title": "Scoring a Production ML System",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2649",
"title": "When Level 2 MLOps Is Premature",
"bloom": "evaluate"
}
],
"rationale": "Teaches the evaluation of MLOps maturity using the ML Test Score framework, progressing from basic score comprehension to calculating a team's score, and evaluating the business trade-offs of investing in Level 2 infrastructure.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-11",
"track": "cloud",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2744",
"title": "Continuous Training Triggers",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1023",
"title": "Continuous Training Frequency Optimization",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1024",
"title": "Seasonality-Induced Drift Triggers",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1026",
"title": "Continuous Fine-Tuning vs From-Scratch Retraining",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0799",
"title": "The Retraining Math",
"bloom": "create"
}
],
"rationale": "Investigates continuous training under concept drift, going from basic triggers to frequency optimization, diagnosing hyper-active retraining loops, assessing architectural risks, and modeling long-term retraining economics.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-12",
"track": "cloud",
"topic": "mlops-lifecycle",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1070",
"title": "Compounding Costs of Data Cascades",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1081",
"title": "Diagnosing Mutable Lineage Failures",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1080",
"title": "Row-Level vs Partition-Level Lineage",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0802",
"title": "The Silent Schema Shift",
"bloom": "evaluate"
}
],
"rationale": "Examines the critical role of data lineage, starting with the compounded costs of data cascades, diagnosing failures from mutable lineage, evaluating tracking granularity, and designing petabyte-scale safeguards.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-13",
"track": "cloud",
"topic": "mlops-lifecycle",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1561",
"title": "Small File Metadata Overhead in Object Storage",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1091",
"title": "S3 Small File GPU Starvation Diagnosis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1849",
"title": "Evaluating Sequential Storage Patterns for Distributed Training",
"bloom": "evaluate"
}
],
"rationale": "Addresses the challenge of small-file bottlenecks in training, calculating latency impacts of object storage, diagnosing GPU starvation from S3 dataloaders, and evaluating sequential storage solutions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-011-14",
"track": "cloud",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0201",
"title": "The Tokenizer Mismatch",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1066",
"title": "Diagnosing Production Accuracy Collapse in Cloud CV",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2139",
"title": "The Feature Version Mismatch",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-2146",
"title": "The End-to-End Feature Pipeline Redesign",
"bloom": "create"
}
],
"rationale": "Tackles training-serving skew, identifying specific tokenization mismatches, diagnosing offline/online accuracy drops, designing robust feature versioning, and architecting an end-to-end skew-free feature pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-012-05",
"track": "cloud",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-0241",
"title": "The Role of HBM Bandwidth",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-1543",
"title": "LLM Inference Memory Bandwidth Calculation: Memory Hierarchy Design",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1562",
"title": "Diagnosing Sublinear Inference Scaling on H100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1545",
"title": "Evaluating GPU Upgrades for LLM Inference",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3731",
"title": "Multi-Level Memory Hierarchy Specification for 405B LLM Inference",
"bloom": "create"
}
],
"rationale": "Progresses from fundamental HBM bandwidth concepts in the roofline model to calculating autoregressive decoding limits, diagnosing sublinear scaling, and specifying multi-tiered infrastructure for massive LLMs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-012-06",
"track": "cloud",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0603",
"title": "The PCIe Bottleneck",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1349",
"title": "Diagnosing LLM Latency Spikes with KV Cache Offloading",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0621",
"title": "The CXL Memory Tier",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0631",
"title": "The Disaggregated Memory Architecture",
"bloom": "create"
}
],
"rationale": "Investigates off-GPU memory access, starting with PCIe transfer latency for model weights, diagnosing KV cache offload spikes, and evaluating CXL disaggregated memory for inference.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-012-07",
"track": "cloud",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2620",
"title": "Training Memory Footprint Breakdown",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2621",
"title": "Will This Model Fit in GPU Memory?",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0612",
"title": "The Gradient Checkpoint Trade-off",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3744",
"title": "Activation Checkpointing Granularity Trade-off for 405B Model Training on H100",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0635",
"title": "The Gradient Checkpointing Boundary",
"bloom": "create"
}
],
"rationale": "Guides the learner through training memory footprints, sizing models, evaluating gradient checkpointing trade-offs, and pushing the boundaries of ZeRO-3 on multi-GPU clusters.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-012-08",
"track": "cloud",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-0264",
"title": "The HBM Latency Penalty",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-0938",
"title": "GPU L1 Cache Tiling for Attention",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0937",
"title": "A100 KV Cache L2 Thrashing",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0939",
"title": "Evaluating IO-Aware Attention Tiling on A100 Caches",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2334",
"title": "FlashAttention SRAM Bank Conflicts",
"bloom": "evaluate"
}
],
"rationale": "Examines the micro-architecture of GPU memory hierarchies, transitioning from the HBM latency penalty to L1/SRAM tiling math, L2 thrashing, and mitigating SRAM bank conflicts in FlashAttention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-012-09",
"track": "cloud",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1550",
"title": "Sizing GPU Clusters for DLRM Embeddings",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1549",
"title": "Diagnosing Low Utilization in 100B DLRM",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1554",
"title": "DLRM Tiered Memory Architecture",
"bloom": "evaluate"
}
],
"rationale": "Focuses on the unique memory capacity and bandwidth challenges of Deep Learning Recommendation Models (DLRMs), from sizing embedding clusters to architecting host-DDR5 offload systems.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-013-01",
"track": "cloud",
"topic": "mixture-of-experts",
"competency_area": "architecture",
"levels": [
"L1",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-0259",
"title": "The MoE Compute Fallacy",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-3806",
"title": "MoE FLOP Efficiency: Why Sparse Models Train Faster",
"bloom": "understand"
},
{
"level": "L4",
"id": "cloud-2096",
"title": "MoE Sparse vs Dense FLOP Equivalence",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3205",
"title": "MoE vs Dense Scaling Law Crossover",
"bloom": "evaluate"
}
],
"rationale": "Calculates and analyzes the fundamental compute and FLOPs scaling tradeoffs between dense and MoE architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-013-02",
"track": "cloud",
"topic": "mixture-of-experts",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3798",
"title": "MoE Routing: Top-K Gating and Load Imbalance",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3802",
"title": "Auxiliary Load Balancing Loss: Mechanism and Coefficient",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3194",
"title": "MoE Auxiliary Loss Coefficient Tuning",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3816",
"title": "MoE Gradient Computation: The Router Gradient Challenge",
"bloom": "evaluate"
}
],
"rationale": "Progresses from identifying load imbalance to tuning auxiliary losses and ultimately optimizing the non-differentiable router gradient.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-013-03",
"track": "cloud",
"topic": "mixture-of-experts",
"competency_area": "architecture",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-3358",
"title": "MoE Inference Optimization on NVIDIA A100",
"bloom": "analyze"
},
{
"level": "L3",
"id": "cloud-1577",
"title": "MoE Active Parameter Bandwidth",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3803",
"title": "MoE Inference Latency: The Expert Loading Problem",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2302",
"title": "MoE Expert Parallelism Decode Bottleneck",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2108",
"title": "MoE Inference Memory Efficiency Problem",
"bloom": "create"
}
],
"rationale": "Teaches the memory bandwidth bottlenecks and deployment architectures for serving large MoE models on GPUs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-013-04",
"track": "cloud",
"topic": "mixture-of-experts",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3848",
"title": "MoE Capacity Factor Communication Overhead",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3186",
"title": "MoE All-to-All Communication Volume",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3200",
"title": "MoE Fine-Grained Expert Parallelism Communication",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3210",
"title": "MoE Multi-Node Placement Strategy",
"bloom": "create"
}
],
"rationale": "Explores the all-to-all communication overhead of expert routing and strategies for multi-node expert placement.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-013-07",
"track": "cloud",
"topic": "data-parallelism",
"competency_area": "parallelism",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-0699",
"title": "The AllReduce Bottleneck",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-2690",
"title": "Recursive Halving-Doubling vs Ring AllReduce",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1085",
"title": "Ring-AllReduce Communication Overhead Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1088",
"title": "Diagnosing DDP Network Bottlenecks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2692",
"title": "AllReduce Algorithm Selection for Mixed Workloads",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2807",
"title": "Heterogeneous-Bandwidth Data-Parallel AllReduce Architecture",
"bloom": "create"
}
],
"rationale": "Covers the mechanics, algorithmic variants, and architectural optimization of the AllReduce collective across diverse network topologies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-013-08",
"track": "cloud",
"topic": "data-parallelism",
"competency_area": "parallelism",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2699",
"title": "SSP Staleness Bound Intuition",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0936",
"title": "Bulk Synchronous Parallel Straggler Impact",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0933",
"title": "Diagnosing Stragglers in BSP Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0932",
"title": "Evaluating BSP Mitigation Strategies for Stragglers",
"bloom": "evaluate"
}
],
"rationale": "Evaluates the impact of stragglers in distributed training and explores asynchronous and stale-synchronous mitigation strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-013-09",
"track": "cloud",
"topic": "data-parallelism",
"competency_area": "parallelism",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2558",
"title": "Cloud Fully Sharded Data Parallel L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2559",
"title": "Cloud Fully Sharded Data Parallel L3 0",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3416",
"title": "Analyzing Data Parallelism Bottlenecks on Google TPU v5e",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2560",
"title": "Cloud Fully Sharded Data Parallel L5 0",
"bloom": "evaluate"
}
],
"rationale": "Progresses from the conceptual foundation of FSDP to calculating communication overhead and optimizing shard placement at scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-013-10",
"track": "cloud",
"topic": "data-parallelism",
"competency_area": "parallelism",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2702",
"title": "Linear Scaling Rule Intuition",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2703",
"title": "LR Warmup Duration Calculation",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2704",
"title": "When Linear Scaling Breaks Down",
"bloom": "evaluate"
}
],
"rationale": "Explains global batch size scaling limits and learning rate adjustments before addressing convergence failures beyond the critical batch size.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-01",
"track": "cloud",
"topic": "flash-attention",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3883",
"title": "SRAM Calculation for FlashAttention Tiling",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1286",
"title": "Diagnosing SRAM Spills in Tiled Attention",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1287",
"title": "Evaluating Flash Attention Arithmetic Intensity",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating theoretical SRAM tile sizes on A100 to diagnosing actual HBM spills when tiles are too large, to analyzing the overall arithmetic intensity on the same architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-02",
"track": "cloud",
"topic": "flash-attention",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3781",
"title": "IO-Awareness: Roofline Model for Attention Kernels",
"bloom": "understand"
},
{
"level": "L4",
"id": "cloud-3139",
"title": "FlashAttention Arithmetic Intensity Calculation",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3145",
"title": "FlashAttention-2 vs FlashAttention-1 Parallelism",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3794",
"title": "FlashAttention-2 Warp Partitioning Strategy",
"bloom": "create"
}
],
"rationale": "Explores the core performance modeling of FlashAttention on Hopper GPUs, starting from basic roofline analysis, moving to specific arithmetic intensity calculations, analyzing FA-2 parallelism, and diving into low-level warp partitioning.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-03",
"track": "cloud",
"topic": "flash-attention",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3784",
"title": "FlashAttention Memory Savings Enable Longer Training Contexts",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3779",
"title": "FlashAttention Backward Pass: The Recomputation Trade",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3648",
"title": "Flash Attention Backward Pass Memory Recomputation",
"bloom": "analyze"
}
],
"rationale": "Progresses from the basic activation memory savings enabled by FlashAttention to the mechanics of the backward pass recomputation that allows it, and finally to evaluating the FLOPs tradeoff of recomputation at extreme sequence lengths.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-04",
"track": "cloud",
"topic": "flash-attention",
"competency_area": "optimization",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3776",
"title": "PagedAttention: Virtual Memory for KV Cache",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-3135",
"title": "PagedAttention Memory Fragmentation",
"bloom": "apply"
}
],
"rationale": "Introduces the concept of PagedAttention acting as virtual memory for the KV cache to solve fragmentation, then asks for concrete calculations of wasted memory avoided by this approach in a serving scenario.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-05",
"track": "cloud",
"topic": "flash-attention",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3775",
"title": "FlashAttention-2 vs Standard Attention: Wall-Clock Speedup",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3405",
"title": "Optimizing IO-Aware Attention on AMD MI300X for Large Sequence Models",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3408",
"title": "Optimizing Large Language Model Attention on AMD MI300X with IO-Aware Tiling",
"bloom": "analyze"
}
],
"rationale": "Focuses on optimizing FlashAttention specifically for the AMD MI300X architecture, progressing from baseline speedups to adapting tiling strategies for its unique HBM and SRAM characteristics.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-06",
"track": "cloud",
"topic": "flash-attention",
"competency_area": "optimization",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3790",
"title": "Online Softmax: The Numerical Foundation of FlashAttention",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3136",
"title": "Implementing FlashAttention's Online Softmax",
"bloom": "apply"
}
],
"rationale": "Explores the numerical trick that makes FlashAttention possible, starting from the basic running max/sum logic to the implementation details of rescaling blocks when new K tiles arrive.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-07",
"track": "cloud",
"topic": "flash-attention",
"competency_area": "optimization",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3782",
"title": "FlashDecoding: Parallelizing Attention During Inference",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-3147",
"title": "FlashDecoding for Long-Context Decode",
"bloom": "analyze"
}
],
"rationale": "Examines how FlashDecoding parallelizes the attention computation across KV blocks during the autoregressive phase to prevent under-utilization at long context lengths.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-08",
"track": "cloud",
"topic": "flash-attention",
"competency_area": "optimization",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3407",
"title": "Diagnosing IO-Bound FlashAttention on TPU v5e",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3780",
"title": "FlashAttention on TPU: XLA Attention vs Pallas Kernels",
"bloom": "evaluate"
}
],
"rationale": "Explores the specific challenges of implementing and diagnosing IO-aware attention on TPU architectures, comparing compiler-generated kernels with custom Pallas implementations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-09",
"track": "cloud",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L1",
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L1",
"id": "cloud-0288",
"title": "The Quantization Memory Dividend",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0266",
"title": "The Inference Memory Diet",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1723",
"title": "LLM Decode Speedup via Weight Quantization",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4545",
"title": "Memory Bandwidth Limits of Large Model Generation",
"bloom": "analyze"
}
],
"rationale": "A foundational progression starting from the theoretical memory footprint reduction of INT8 quantization, applying it to a specific LLM size, and finally projecting the actual token decode speedup gained from that reduced memory bandwidth.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-10",
"track": "cloud",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1905",
"title": "Balancing W8A8 Outliers with SmoothQuant",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1909",
"title": "Diagnosing Accuracy Collapse in INT8 LLM Deployments",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1907",
"title": "Evaluating SmoothQuant for 175B LLM Serving",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0582",
"title": "The Quantization Catastrophe",
"bloom": "create"
}
],
"rationale": "Traces the problem of severe activation outliers in large LLMs quantized to INT8, progressing from calculating the SmoothQuant balancing scale, to diagnosing the collapse, evaluating the strategy at scale, and formulating a robust production architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-11",
"track": "cloud",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-2316",
"title": "Optimizing FP8 Formats for LLM Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2427",
"title": "FP8 Distributed Training Divergence",
"bloom": "evaluate"
}
],
"rationale": "Explores the stability challenges of FP8 training formats, starting with how to map E4M3 and E5M2 to forward/backward passes, and culminating in diagnosing a divergence when E4M3 is misapplied to routing layers.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-12",
"track": "cloud",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-1726",
"title": "Diagnosing W8A16 Quantization Regression During Prefill",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1725",
"title": "W8A8 vs W4A16 for LLM Decoding",
"bloom": "evaluate"
}
],
"rationale": "Examines the nuanced performance tradeoffs of hybrid quantization schemes, moving from diagnosing why W8A16 slows down prefill due to dequantization overhead to deciding between W8A8 and W4A16 for memory-bound decoding.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-13",
"track": "cloud",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-0638",
"title": "The FP16 vs BF16 Question",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2324",
"title": "BF16 Accumulation Precision Loss in Massive GEMMs",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-0650",
"title": "The FP16 Divergence",
"bloom": "evaluate"
}
],
"rationale": "Investigates the specific numerical tradeoffs between FP16 and BF16, starting with their basic definitions, exposing BF16's accumulation weakness in massive GEMMs, and comparing their training stability over long runs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-014-14",
"track": "cloud",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-2462",
"title": "The Cloud-to-Edge Calibration Shift",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-2485",
"title": "Depthwise Quantization Catastrophe",
"bloom": "analyze"
}
],
"rationale": "Focuses on post-training quantization failures for edge CV models, moving from calibration distribution shifts to the severe accuracy degradation often seen when quantizing depthwise separable convolutions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-015-01",
"track": "cloud",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0323",
"title": "The Datacenter Cooling Tax (PUE)",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0336",
"title": "The PUE Tax",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1715",
"title": "Calculating Total Facility Energy for Training Cluster",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1717",
"title": "Diagnosing PUE Degradation in Liquid Cooling",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1721",
"title": "Evaluating H100 Cluster Power and Cooling Topologies",
"bloom": "evaluate"
}
],
"rationale": "Teaches the physical meaning of PUE, how to calculate its impact on power and cost, and how to evaluate large-scale cooling retrofit decisions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-015-02",
"track": "cloud",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-0269",
"title": "The 700W Question",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0350",
"title": "The Rack Density Limit",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1718",
"title": "Rack-Level Power Wall Calculation for AI Clusters",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3712",
"title": "H100 TDP Management Under Sustained Training Load",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2505",
"title": "Overprovisioning Under Power Caps",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-2395",
"title": "Power-Capped Rack Density Tradeoff",
"bloom": "evaluate"
}
],
"rationale": "Progresses from single-GPU TDP definitions to calculating rack density limits, managing sustained thermal loads, and making cluster-level overprovisioning trade-offs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-015-03",
"track": "cloud",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2528",
"title": "Cloud Transient Loads L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2529",
"title": "Cloud Transient Loads L3 0",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2530",
"title": "Cloud Transient Loads L5 0",
"bloom": "evaluate"
}
],
"rationale": "Guides learners through the identification, quantification, and architectural mitigation of transient power loads during synchronized distributed training.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-015-04",
"track": "cloud",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1351",
"title": "Profiling Microsecond Kernel Power Draw",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1352",
"title": "Diagnosing Power Discrepancies in Sub-second Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1353",
"title": "Evaluating Discrepancies in GPU Power Measurement Techniques",
"bloom": "evaluate"
}
],
"rationale": "Focuses on the nuances and pitfalls of measuring sub-second GPU power draw for high-frequency micro-batched workloads.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-015-05",
"track": "cloud",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1709",
"title": "Cloud RAPL Power Side-Channel Analysis",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1710",
"title": "Diagnosing Multi-Tenant GPU Power Side-Channel Leaks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1714",
"title": "Mitigating Power Analysis in Multi-Tenant GPU Inference",
"bloom": "evaluate"
}
],
"rationale": "Teaches how multi-tenant power telemetry can leak sensitive model architecture data and how to architect mitigations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-015-06",
"track": "cloud",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1234",
"title": "Energy Savings with Event-Driven Activation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1236",
"title": "Diagnosing GPU Power in Event-Driven SNNs",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1235",
"title": "Evaluating Event-Driven Activation for Cloud Video Analytics",
"bloom": "evaluate"
}
],
"rationale": "Evaluates the energy savings, hardware utilization bottlenecks, and systemic trade-offs of event-driven architectures for continuous video analytics.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-016-01",
"track": "cloud",
"topic": "compound-ai-systems",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-0089",
"title": "The RAG Latency Trap: Compound AI Systems",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-2032",
"title": "The RAG Latency Composition",
"bloom": "apply"
},
{
"level": "L3",
"id": "cloud-2040",
"title": "The Reranker Bottleneck Inversion",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-2043",
"title": "The DAG Critical Path Optimization",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2052",
"title": "The Compound System Tail Latency Amplification",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3502",
"title": "Optimizing Multi-Model RAG Latency on NVIDIA A100",
"bloom": "analyze"
}
],
"rationale": "Progresses from identifying the dominant latency source in simple RAG pipelines to diagnosing reranker bottlenecks, optimizing DAG critical paths, and mitigating tail latency amplification in complex compound systems.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-016-02",
"track": "cloud",
"topic": "compound-ai-systems",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L3"
],
"questions": [
{
"level": "L1",
"id": "cloud-0077",
"title": "The FP16 Inference Memory Footprint",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0306",
"title": "The RAG Pod Memory Footprint",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2041",
"title": "The Multi-Model GPU Packing Problem",
"bloom": "analyze"
}
],
"rationale": "Explores the GPU memory footprint of serving models, starting from single LLM weights to multi-model compound systems packed onto a single GPU.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-016-03",
"track": "cloud",
"topic": "compound-ai-systems",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2030",
"title": "The Embedding Index Memory Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2042",
"title": "The Product Quantization Memory Tradeoff",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-2045",
"title": "The Vector DB Sharding Strategy",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-2051",
"title": "The Embedding Model Drift Crisis",
"bloom": "create"
}
],
"rationale": "Examines the system challenges of scaling vector databases, from calculating raw RAM requirements to applying product quantization, sharding large indices, and managing embedding model upgrades at scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-016-04",
"track": "cloud",
"topic": "compound-ai-systems",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-2039",
"title": "The Embedding Cache Hit Rate Cliff",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-2049",
"title": "The Semantic Cache Collision Problem",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-2044",
"title": "The RAG Cache Invalidation Dilemma",
"bloom": "analyze"
}
],
"rationale": "Guides the learner through the complexities of RAG caching, starting with hit-rate degradation, addressing semantic collisions, and designing robust invalidation strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-016-05",
"track": "cloud",
"topic": "compound-ai-systems",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-2038",
"title": "The Agent Tool-Call Latency Budget",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-2047",
"title": "The Agent Loop Cost Explosion",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-2048",
"title": "The Tool-Use Timeout Cascade",
"bloom": "evaluate"
}
],
"rationale": "Focuses on the operational realities of autonomous agents, covering latency budgeting for tool calls, managing the token cost explosion in agent loops, and handling tool timeout cascades.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-016-09",
"track": "cloud",
"topic": "federated-learning",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0293",
"title": "The Core Motivation for Federated Learning",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0329",
"title": "The TCO of Privacy: Centralized vs. Federated",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0451",
"title": "The Privacy-TCO Trade-off",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2616",
"title": "Federated Learning ROI Under Regulatory Constraints",
"bloom": "evaluate"
}
],
"rationale": "Progresses from the conceptual privacy motivation of federated learning to calculating basic TCO, performing long-term break-even analyses, and evaluating complex ROI under strict regulatory constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-016-10",
"track": "cloud",
"topic": "federated-learning",
"competency_area": "cross-cutting",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-0360",
"title": "The Federated Learning Data Bill",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2612",
"title": "Federated Learning Communication Cost",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3567",
"title": "Federated Averaging Optimization on NVIDIA A100 for Non-IID Data",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3568",
"title": "Optimizing Federated Learning on NVIDIA H100 for Non-IID Edge Devices",
"bloom": "analyze"
}
],
"rationale": "Guides the learner through the communication bottlenecks of federated learning, starting with basic ingress sizing and scaling up to optimizing FedAvg on cloud GPUs for massively distributed, non-IID edge networks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-018-01",
"track": "cloud",
"topic": "tco-cost-modeling",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-0359",
"title": "The CapEx Baseline",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0408",
"title": "The TCO of an H100",
"bloom": "understand"
},
{
"level": "L4",
"id": "cloud-3096",
"title": "TCO Analyze: Why Cloud GPUs May Be Cheaper Than On-Prem for Startups",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0833",
"title": "The TCO Per Token Analysis",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3101",
"title": "TCO Mastery: Build vs Buy Decision for LLM Training Infrastructure",
"bloom": "create"
}
],
"rationale": "Guides the learner from identifying basic hardware unit costs to evaluating complex build-versus-buy infrastructure decisions over a multi-year lifecycle.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-018-02",
"track": "cloud",
"topic": "tco-cost-modeling",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3128",
"title": "TCO Fluency: Compute Cost per Token at Scale",
"bloom": "understand"
},
{
"level": "L4",
"id": "cloud-3106",
"title": "TCO Realization: Annual GPU Cost for GPT-4-Scale Service",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3098",
"title": "TCO Design: Cost Per Inference for Production API Service",
"bloom": "create"
}
],
"rationale": "Teaches the economic mechanics of serving models at scale, moving from per-token cost estimation to designing a profitable, margin-aware inference API.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-018-03",
"track": "cloud",
"topic": "tco-cost-modeling",
"competency_area": "cross-cutting",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-0809",
"title": "The Spot Instance Gamble",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3097",
"title": "TCO Design: Spot vs On-Demand vs Reserved Instance Strategy",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-3594",
"title": "Optimizing LLM Deployment TCO on NVIDIA A100: Spot vs. Reserved Instance Strategy",
"bloom": "analyze"
}
],
"rationale": "Explores the economics and operational trade-offs of using spot versus reserved instances, culminating in a comprehensive multi-year deployment strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-019-01",
"track": "cloud",
"topic": "kv-cache-management",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4526",
"title": "Llama-3 70B KV Cache Sizing",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3215",
"title": "KV-Cache Size Calculation for GQA Models",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3236",
"title": "KV-Cache Pool Sizing for Throughput Optimization",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-3222",
"title": "KV-Cache Disaggregation for Prefill-Decode Split",
"bloom": "create"
}
],
"rationale": "Progresses from basic KV cache sizing to GQA calculations, multi-GPU pool sizing, and finally distributed prefill-decode disaggregation for Llama 3 70B.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-019-02",
"track": "cloud",
"topic": "kv-cache-management",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1670",
"title": "Calculating Paged KV Cache Capacity",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1672",
"title": "Diagnosing Low Batch Size in Paged KV Cache",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1673",
"title": "Evaluating Paged KV Cache Block Sizes",
"bloom": "evaluate"
}
],
"rationale": "Explores Paged KV Cache management, moving from capacity calculation to diagnosing fragmentation OOMs, and culminating in block size optimization for variable workloads.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-01",
"track": "cloud",
"topic": "rdma-transport",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4059",
"title": "InfiniBand Architecture Fundamentals",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-4060",
"title": "RoCE v2 vs InfiniBand for ML Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3636",
"title": "InfiniBand vs Ethernet Cost-Performance Analysis for H100 Scale-Out",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3637",
"title": "RDMA Transport Mastery: End-to-End Network Design for 4096-H100 Training",
"bloom": "create"
}
],
"rationale": "Progresses from fundamental InfiniBand concepts to evaluating interconnect alternatives at scale, culminating in the end-to-end network design of a massive 4096-GPU cluster.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-02",
"track": "cloud",
"topic": "rdma-transport",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4062",
"title": "Kernel Bypass and Verbs API Overhead",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3873",
"title": "Evaluating RDMA Kernel Bypass for Distributed Clusters",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-3462",
"title": "Optimizing Distributed LLM Training with RDMA on NVIDIA A100",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3466",
"title": "Optimizing Distributed Training Communication with RDMA on NVIDIA H100",
"bloom": "analyze"
}
],
"rationale": "Explores the performance benefits of kernel bypass and GPUDirect RDMA, evolving from basic overhead analysis to maximizing communication throughput for All-Reduce operations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-03",
"track": "cloud",
"topic": "rdma-transport",
"competency_area": "networking",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "cloud-4407",
"title": "RDMA Queue Pair Limits and Scalability in Large H100 Pods",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-4081",
"title": "RDMA Connection Scaling for 10K GPU Clusters",
"bloom": "evaluate"
}
],
"rationale": "Analyzes the scaling limits of RDMA Queue Pairs, transitioning from node-level exhaustion issues to sub-quadratic connection architectures for 10K+ GPU clusters.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-04",
"track": "cloud",
"topic": "disaggregated-serving",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4291",
"title": "Prefill-Decode Split Rationale",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-4293",
"title": "TTFT vs TPOT SLO Tension",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-4315",
"title": "Disaggregated Serving Cost Model",
"bloom": "create"
}
],
"rationale": "Evaluates the architecture and economics of disaggregated serving, starting with the rationale for splitting pools and advancing to the SLO tensions and cost models that determine optimal sizing.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-05",
"track": "cloud",
"topic": "disaggregated-serving",
"competency_area": "deployment",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-4292",
"title": "KV-Cache Transfer Bandwidth Budget",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-4303",
"title": "Network Topology for KV Transfer at Scale",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-4461",
"title": "Disaggregated Serving Tail Latency Root Cause Framework",
"bloom": "analyze"
}
],
"rationale": "Traces the challenges of KV-cache transfer from initial bandwidth calculations to network topology design and end-to-end tail latency debugging.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-06",
"track": "cloud",
"topic": "disaggregated-serving",
"competency_area": "deployment",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-4301",
"title": "Decode Preemption and KV Swap",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4442",
"title": "Fault Tolerance for In-Flight Requests During Decode Node Failure",
"bloom": "apply"
},
{
"level": "L6+",
"id": "cloud-4463",
"title": "Disaggregated Serving Graceful Degradation Under Partial Failure",
"bloom": "evaluate"
}
],
"rationale": "Explores the reliability of the decode pool, moving from individual sequence preemption to node-level fault tolerance and finally cluster-wide graceful degradation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-07",
"track": "cloud",
"topic": "sustainability-carbon-accounting",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2783",
"title": "Embodied Carbon Concept",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2784",
"title": "Fleet Lifecycle Carbon Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1502",
"title": "Hardware Refresh Carbon ROI",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2785",
"title": "Upgrade vs Extend Hardware Lifecycle",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4354",
"title": "Lifecycle Carbon Analysis of a 3B Model Serving System",
"bloom": "evaluate"
}
],
"rationale": "Progresses from the fundamental definition of embodied carbon to complex lifecycle analyses of hardware refresh cycles and end-to-end model serving emissions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-08",
"track": "cloud",
"topic": "sustainability-carbon-accounting",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0951",
"title": "Carbon-Aware Workload Shifting",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3747",
"title": "Carbon-Aware Scheduling: Shifting Training to Low-Carbon Hours",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-0835",
"title": "The Carbon-Aware Scheduling Tradeoff",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0812",
"title": "The Carbon-Aware Scheduler",
"bloom": "create"
}
],
"rationale": "Examines the mechanics and trade-offs of carbon-aware job scheduling, evolving from simple temporal shifting to complex multi-region spatial arbitrage.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-09",
"track": "cloud",
"topic": "sustainability-carbon-accounting",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2780",
"title": "WUE Metric Explained",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-3751",
"title": "Water Usage Effectiveness: The Hidden Cost of Evaporative Cooling",
"bloom": "understand"
},
{
"level": "L4",
"id": "cloud-3609",
"title": "WUE and Water Efficiency Trade-off for Evaporative Cooling",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2782",
"title": "Cooling Strategy for Water-Scarce Region",
"bloom": "evaluate"
}
],
"rationale": "Explores the critical role of Water Usage Effectiveness (WUE) in datacenter sustainability, progressing from basic definitions to complex cooling strategy decisions in constrained environments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-10",
"track": "cloud",
"topic": "congestion-control",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3833",
"title": "PFC Congestion Spreading and Cluster Throughput Collapse",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-1006",
"title": "Debugging RoCEv2 Congestion Spreading in a Clos Network",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1010",
"title": "RoCEv2 Fabric-Wide PFC Congestion Spreading",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3480",
"title": "H100 Cluster Congestion Control and Network Optimization for Distributed ML",
"bloom": "analyze"
}
],
"rationale": "Explores the systemic risks of Priority Flow Control (PFC) in RoCEv2 fabrics, moving from basic congestion spreading mechanics to fabric-wide debugging and comprehensive mitigation strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-11",
"track": "cloud",
"topic": "congestion-control",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4009",
"title": "ECMP Hash Polarization in Fat-Tree Topologies",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-4019",
"title": "Weighted ECMP for Heterogeneous Link Speeds",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4033",
"title": "ECMP vs Adaptive Routing Tradeoff Space",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4023",
"title": "Congestion-Aware Adaptive Routing on InfiniBand",
"bloom": "evaluate"
}
],
"rationale": "Addresses the limitations of static ECMP load balancing in fat-tree networks, progressing through hashing imbalances to dynamic and adaptive routing solutions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-12",
"track": "cloud",
"topic": "congestion-control",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3478",
"title": "H100 Cluster Congestion Control with DCQCN",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3476",
"title": "Congestion Control in AMD MI300X GPU Clusters",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3682",
"title": "DCQCN Tuning for Large-Scale All-Reduce",
"bloom": "apply"
},
{
"level": "L6+",
"id": "cloud-4028",
"title": "Congestion Control at 800G and Beyond",
"bloom": "evaluate"
}
],
"rationale": "Focuses on tuning Data Center Quantized Congestion Notification (DCQCN), covering its foundational mechanisms, cluster-wide tuning, and the future challenges of scaling congestion control to 800Gbps.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-13",
"track": "cloud",
"topic": "feature-store-management",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-2130",
"title": "The Batch vs Real-Time Feature Tradeoff",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4201",
"title": "Point-in-Time Correctness for Training",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3697",
"title": "Feature Skew Between Training and Serving",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4213",
"title": "Feature Store Consistency Guarantees",
"bloom": "evaluate"
}
],
"rationale": "Explores the fundamental challenge of training-serving skew, moving from batch vs. real-time architectural tradeoffs to point-in-time correct joins and guaranteeing cross-store consistency.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-14",
"track": "cloud",
"topic": "feature-store-management",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2127",
"title": "The Feature Store Latency Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-4202",
"title": "Feature Store Serving Throughput Under Load",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2138",
"title": "The Feature Store Hot Key Problem",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2143",
"title": "The Feature Serving Latency Decomposition",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2983",
"title": "Feature Store: Mastery \u2014 Design Feature Store for Multi-Modal Real-Time LLM",
"bloom": "create"
}
],
"rationale": "Addresses the performance limits of online feature stores, progressing from basic latency budgets and throughput sizing to mitigating hot keys and decomposing end-to-end serving delays.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-020-15",
"track": "cloud",
"topic": "feature-store-management",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3698",
"title": "Feature Store Freshness SLAs for Time-Sensitive Models",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-4200",
"title": "Feature Freshness vs Staleness Budget",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2135",
"title": "The Feature Freshness vs Cost Tradeoff",
"bloom": "analyze"
}
],
"rationale": "Focuses on the operational and economic realities of feature freshness, balancing staleness budgets against the infrastructure costs required to maintain real-time updates.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-01",
"track": "cloud",
"topic": "datacenter-efficiency",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2940",
"title": "Recall: What PUE Measures and Why 1.0 Is Physically Impossible",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-3745",
"title": "PUE Reality Check: Cooling Overhead in a Hyperscale Cluster",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2950",
"title": "Diagnosing Unexpected PUE Spike in H100 Cluster",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3749",
"title": "Stranded Power: When GPU Utilization Tanks PUE",
"bloom": "evaluate"
}
],
"rationale": "Progresses from defining the baseline PUE metric to calculating operational overhead, diagnosing an unexpected real-world spike, and finally adjusting PUE calculations for workload utilization constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-02",
"track": "cloud",
"topic": "datacenter-efficiency",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-3876",
"title": "Rack Power Budgeting for H100 vs A100 Servers",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-3347",
"title": "NVIDIA H100 Rack Power and PUE Analysis",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3606",
"title": "Cooling Bottleneck Analysis for Dense H100 GPU Rack",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2945",
"title": "Implementing Power-Aware Bin Packing for Heterogeneous GPU Racks",
"bloom": "apply"
}
],
"rationale": "Builds from basic rack power budgeting to evaluating grid impact at high densities, identifying CRAC/PDU bottlenecks, and implementing a power-aware bin-packing algorithm for heterogeneous clusters.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-03",
"track": "cloud",
"topic": "datacenter-efficiency",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3755",
"title": "Thermal Throttling: When GPUs Self-Protect",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3611",
"title": "Power Capping Impact on Training Throughput for MI300X",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-4397",
"title": "MI300X OAM Module Thermal Interface and Cooling Validation",
"bloom": "apply"
}
],
"rationale": "Investigates individual GPU thermal throttling behaviors, moves to understanding non-linear throughput losses from cluster-wide power caps, and concludes with designing thermal safety monitoring policies for dense modules.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-04",
"track": "cloud",
"topic": "scheduling-resource-management",
"competency_area": "parallelism",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-3442",
"title": "NVIDIA H100 Multi-tenancy with MIG for Resource Sharing",
"bloom": "analyze"
},
{
"level": "L3",
"id": "cloud-4115",
"title": "GPU Time-Slicing vs MIG vs MPS",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-4111",
"title": "MIG Partitioning for Inference Multiplexing",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3446",
"title": "Multi-Tenant H100 GPU Scheduling for Cloud ML",
"bloom": "analyze"
}
],
"rationale": "Introduces hardware-level multi-tenancy with MIG, compares it theoretically against software time-slicing, applies it practically to bin-pack 12 concurrent models, and scales to a cluster-wide mixed workload scheduler.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-05",
"track": "cloud",
"topic": "scheduling-resource-management",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1296",
"title": "Distributed LLM Training Deadlock and Gang Scheduling",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3687",
"title": "Gang Scheduling for Distributed Training Efficiency",
"bloom": "understand"
},
{
"level": "L5",
"id": "cloud-1299",
"title": "Evaluating Gang Scheduling for LLM Training",
"bloom": "evaluate"
}
],
"rationale": "Identifies the cause of distributed training deadlocks resulting in NCCL timeouts, demonstrates the massive utilization loss from partial cluster allocations, and designs an all-or-nothing scheduling strategy to resolve it.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-06",
"track": "cloud",
"topic": "scheduling-resource-management",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4109",
"title": "GPU Scheduling: FIFO vs Shortest-Job-First",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-2218",
"title": "The Opportunistic Training Checkpoint Race",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4114",
"title": "GPU Preemption for Priority Inference",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2222",
"title": "The Multi-Tenant Starvation Cascade",
"bloom": "create"
}
],
"rationale": "Starts with queue wait times under basic policies, explores safe preemption protocols for opportunistic jobs, designs dynamic preemption for urgent inference, and ultimately resolves complex starvation cascades across multiple tenants.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-07",
"track": "cloud",
"topic": "scheduling-resource-management",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-2211",
"title": "The Backfill Scheduling Gap",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3685",
"title": "Bin-Packing GPU Jobs to Minimize Fragmentation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4125",
"title": "Fragmentation-Aware Scheduling Policy",
"bloom": "evaluate"
}
],
"rationale": "Teaches the fundamentals of backfill scheduling to reclaim idle gaps, calculates the overhead cost of sub-optimal bin-packing, and architects a fragmentation-aware scheduler to guarantee contiguous GPU blocks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-08",
"track": "cloud",
"topic": "batching-strategies",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L1",
"id": "cloud-0036",
"title": "The Head-of-Line Blocking Problem",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0236",
"title": "Advantages of Continuous Batching",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-0109",
"title": "The TPOT Trade-off",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3710",
"title": "Continuous Batching vs Static Batching Throughput on H100 for LLM Serving",
"bloom": "evaluate"
}
],
"rationale": "Progresses from the definition of head-of-line blocking to the mechanics of continuous batching, calculates the theoretical compute waste of static padding, and quantifies real-world throughput gains on enterprise hardware.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-09",
"track": "cloud",
"topic": "batching-strategies",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-0053",
"title": "The Static Batching Waiting Game",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0916",
"title": "Calculating Maximum Batching Window",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1166",
"title": "Debugging Dynamic Batching Latency Spikes",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1170",
"title": "Dynamic Batching for Strict Latency SLOs",
"bloom": "evaluate"
}
],
"rationale": "Explores the wait-time penalty of static queues, introduces dynamic queue delay limits for strict SLAs, diagnoses burst-induced latency spikes, and strictly optimizes maximum batch and queue configurations for tight budgets.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-10",
"track": "cloud",
"topic": "batching-strategies",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0112",
"title": "The Prefill-Decode Collision",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0146",
"title": "The Continuous Batching Tail Latency Paradox",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2454",
"title": "Chunked Prefill for Latency Jitter",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-0188",
"title": "The 'Laggy' Code Assistant: A Batching Design Challenge",
"bloom": "create"
}
],
"rationale": "Identifies the core prefill-decode collision problem, investigates the resulting tail latency paradox from long prompts, implements chunked prefill to isolate computation, and architects a serving system around strict TTFT/TPOT SLAs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-11",
"track": "cloud",
"topic": "container-orchestration",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2927",
"title": "K8s GPU Device Plugin: Why Pod Requests Whole GPUs",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-4138",
"title": "K8s Resource Limits vs Requests for GPU Workloads",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3658",
"title": "Resource Quota and GPU Memory Oversubscription in Kubernetes",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3660",
"title": "Custom Kubernetes Scheduler for GPU Memory-Aware Placement",
"bloom": "create"
}
],
"rationale": "Progresses from K8s device plugin mechanics to understanding limit/request asymmetry, diagnoses OOM issues caused by integer-only GPU tracking, and specifies a custom scheduler that allocates workloads via free VRAM.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-12",
"track": "cloud",
"topic": "container-orchestration",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3834",
"title": "Kubernetes Pod Affinity and GPU Bandwidth",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3868",
"title": "K8s Network Architecture for PyTorch DDP",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-4142",
"title": "RDMA and Host Networking for ML Training Pods",
"bloom": "analyze"
}
],
"rationale": "Analyzes the impact of pod affinity on NVLink topologies, evaluates Kubernetes CNI options for RDMA support, and debugs host-networking bottlenecks limiting NCCL throughput across the cluster.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-13",
"track": "cloud",
"topic": "container-orchestration",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-2935",
"title": "Realizing Cluster Autoscaler Behavior with GPU Node Groups",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4140",
"title": "GPU Node Autoscaling in Kubernetes",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-2933",
"title": "Evaluating Horizontal Pod Autoscaler for GPU Inference Scaling",
"bloom": "evaluate"
}
],
"rationale": "Identifies the physical provisioning delay of cluster autoscalers, designs capacity buffers to mask these cold starts during traffic spikes, and evaluates the limitations of standard HPA signals for bursty inference.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-14",
"track": "cloud",
"topic": "recommendation-systems-engineering",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4316",
"title": "DLRM Architecture Overview",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-4317",
"title": "TB-Scale Embedding Table Sharding",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-4475",
"title": "All-to-All Communication Optimization for Distributed DLRM",
"bloom": "apply"
},
{
"level": "L6+",
"id": "cloud-4330",
"title": "Model Parallelism Strategy for 100TB Embedding Cluster",
"bloom": "create"
}
],
"rationale": "Builds from basic DLRM architecture pathways to sharding multi-terabyte embedding tables, mitigating the resulting all-to-all network bottlenecks, and scaling to a massive 100TB cluster-wide training strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-15",
"track": "cloud",
"topic": "recommendation-systems-engineering",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-4478",
"title": "Request Deduplication in High-Frequency Recommendation Serving",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4335",
"title": "Request Deduplication and Result Caching in Rec Serving",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-4323",
"title": "Social-Scale Serving QPS and Caching Strategy",
"bloom": "evaluate"
}
],
"rationale": "Starts with handling immediate request-level deduplication, extends to maintaining a serving result cache for recurring user contexts, and scales to multi-tier predictive caching for a 500M user platform.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-16",
"track": "cloud",
"topic": "recommendation-systems-engineering",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-4484",
"title": "Precision vs Recall Tradeoff in Multi-Stage Retrieval Pipeline",
"bloom": "understand"
},
{
"level": "L4",
"id": "cloud-4466",
"title": "Multi-Stage Ranking Latency Budget Decomposition",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-4332",
"title": "Serving Latency vs. Model Freshness Tradeoff",
"bloom": "evaluate"
}
],
"rationale": "Connects basic retrieval recall metrics to strict multi-stage latency budgeting, and navigates the operational tradeoff between ensuring model freshness via synchronous updates and preserving latency SLAs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-021-17",
"track": "cloud",
"topic": "recommendation-systems-engineering",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-4334",
"title": "Embedding Dimension Selection and Capacity",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4486",
"title": "Embedding Table Memory Bandwidth Optimization with Mixed Precision",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-4338",
"title": "Embedding Table Hot Row Replication for Serving",
"bloom": "evaluate"
}
],
"rationale": "Scales from sizing embedding dimensions to fit within VRAM constraints, optimizes bandwidth with mixed-precision quantization, and resolves hot-shard serving bottlenecks via strategic row replication.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-022-03",
"track": "cloud",
"topic": "communication-computation-overlap",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3924",
"title": "Overlapping Backward Pass with AllReduce in Data-Parallel Training",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-3925",
"title": "Gradient Bucket Size Tuning for Optimal Communication Overlap",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3930",
"title": "Overlap Efficiency Degradation at Scale: 8 GPUs vs 256 GPUs",
"bloom": "analyze"
}
],
"rationale": "Explores the mechanics of Data Parallel AllReduce overlap, from basic PyTorch DDP behavior to diagnosing efficiency degradation at massive cluster scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-022-04",
"track": "cloud",
"topic": "communication-computation-overlap",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3940",
"title": "Diagnosing Failed Overlap: NCCL Blocking on Compute Stream",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3933",
"title": "CUDA Stream Priority for Communication vs Computation Scheduling",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3926",
"title": "NCCL Async Operations and CUDA Stream Orchestration for Overlap",
"bloom": "apply"
}
],
"rationale": "Focuses on CUDA stream orchestration, moving from diagnosing kernel blocking to manually scheduling async NCCL operations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-022-05",
"track": "cloud",
"topic": "communication-computation-overlap",
"competency_area": "parallelism",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-3938",
"title": "Overlapping KV Cache Transfer with Decode Computation in Serving",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3963",
"title": "Prefill-Decode Disaggregation Overlap in Production LLM Serving",
"bloom": "apply"
},
{
"level": "L6+",
"id": "cloud-3952",
"title": "Compute-Communication Overlap in Distributed Inference with Speculative Decoding",
"bloom": "create"
}
],
"rationale": "Examines disaggregated serving overlap, scaling from single-node KV cache transfers to cross-network speculative decoding synchronization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-022-06",
"track": "cloud",
"topic": "model-tensor-parallelism",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4086",
"title": "Why TP Stays Within a Node",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-2015",
"title": "Diagnosing MFU Collapse in Cross-Node Tensor Parallelism",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3664",
"title": "Tensor Parallelism Communication Bottleneck on Multi-Node Cluster",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-4108",
"title": "TP + Pipeline Parallelism Interaction",
"bloom": "create"
}
],
"rationale": "Addresses the challenges of cross-node Tensor Parallelism, moving from identifying network bottlenecks to designing 3D parallelism topologies for massive models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-022-07",
"track": "cloud",
"topic": "model-tensor-parallelism",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3888",
"title": "Calculate Tensor Parallel Communication Overhead",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4085",
"title": "TP Communication Volume for Transformer Block",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-4097",
"title": "Backward Pass Communication in TP",
"bloom": "analyze"
}
],
"rationale": "Teaches the mathematical foundation of Tensor Parallelism overhead, progressing from simple communication time to calculating per-block volume during backward passes.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-01",
"track": "cloud",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0822",
"title": "The Kernel Fusion Memory Savings",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-1329",
"title": "Estimating Graph Fusion Bandwidth Savings",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0676",
"title": "The Inference Compiler Optimization",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0988",
"title": "Evaluating Operator Fusion in Computational Graphs",
"bloom": "evaluate"
}
],
"rationale": "Explores the memory and latency benefits of operator fusion, from basic calculation to evaluating full layers on production hardware.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-02",
"track": "cloud",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1339",
"title": "Group DRO Weight Update Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1340",
"title": "Debugging Group DRO Training Instability",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1341",
"title": "Evaluating Group DRO for Rare Medical Subgroups",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating Group DRO weights to debugging its instability and evaluating its system-level feasibility at scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-03",
"track": "cloud",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1798",
"title": "Mitigating Outliers with Robust Loss",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1799",
"title": "Diagnosing Gradient Explosions from Corrupted Cloud Data",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1801",
"title": "Evaluating Robust Loss for Noisy Recommender Embeddings",
"bloom": "evaluate"
}
],
"rationale": "Explores handling data outliers, moving from loss modification to diagnosing gradient explosions and evaluating architectural trade-offs for 50B recommenders.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-04",
"track": "cloud",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L1",
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-2838",
"title": "Cloud New 0032",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-3392",
"title": "H100 Graph Optimization Strategies for LLMs",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-1453",
"title": "JIT Recompilation Spikes in Dynamic Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1458",
"title": "Evaluating JIT Compilation for Dynamic Shapes",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2792",
"title": "Cloud New 0005",
"bloom": "create"
}
],
"rationale": "Covers the transition from PyTorch JIT/AOT basics to handling dynamic shape recompilation spikes and architecting a hybrid compiler.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-05",
"track": "cloud",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1331",
"title": "Calculating Latency Overhead of PyTorch Graph Breaks",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1325",
"title": "Diagnosing torch.compile Graph Breaks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1326",
"title": "Evaluating MoE Routing Graph Breaks in PyTorch 2.x",
"bloom": "evaluate"
}
],
"rationale": "Focuses on PyTorch 2.x torch.compile, beginning with calculating graph break latency overhead, diagnosing their root causes, and evaluating fixes in complex MoE models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-06",
"track": "cloud",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1336",
"title": "Graph Tracing Control Flow OOM",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1338",
"title": "Silent Failures in Traced Dynamic Control Flow",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1337",
"title": "Evaluating Graph Tracing Failures in Dynamic Routing Models",
"bloom": "evaluate"
}
],
"rationale": "Examines the pitfalls of graph tracing with dynamic control flow, moving from OOMs to silent failures and evaluating compilation strategies for dynamic routing models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-11",
"track": "cloud",
"topic": "gradient-synchronization",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4034",
"title": "Ring AllReduce Bandwidth Formula",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-4035",
"title": "Ring vs Tree AllReduce Latency Tradeoff",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3619",
"title": "AllReduce Algorithm Selection for Heterogeneous Bandwidth Topology",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-4054",
"title": "AllReduce Algorithm Selection in NCCL",
"bloom": "evaluate"
}
],
"rationale": "Progresses from the fundamental Ring AllReduce formula to evaluating its tradeoff with Tree AllReduce, scaling up to heterogeneous topology selection and NCCL algorithm configuration.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-12",
"track": "cloud",
"topic": "gradient-synchronization",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-4041",
"title": "Gradient Quantization to INT8",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4036",
"title": "Gradient Compression with Top-K Sparsification",
"bloom": "create"
},
{
"level": "L5",
"id": "cloud-4046",
"title": "PowerSGD Low-Rank Gradient Compression",
"bloom": "create"
}
],
"rationale": "Explores gradient compression techniques to alleviate network bottlenecks, starting with simple INT8 quantization, moving to Top-K sparsification, and evaluating low-rank PowerSGD projection.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-13",
"track": "cloud",
"topic": "gradient-synchronization",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-4037",
"title": "Bucket Fusion in NCCL AllReduce",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4042",
"title": "AllReduce Communication Hiding with Computation Overlap",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2870",
"title": "Gradient Sync Bucket Spec",
"bloom": "evaluate"
}
],
"rationale": "Teaches the mechanics of hiding AllReduce latency, from fusing gradient buckets to calculating overlap bounds and specifying optimal bucket sizes.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-14",
"track": "cloud",
"topic": "gradient-synchronization",
"competency_area": "parallelism",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-4038",
"title": "Diagnosing Gradient Staleness in Async SGD",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1323",
"title": "Evaluating Sync vs Async Gradient Strategies",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4057",
"title": "DiLoCo: Distributed Low-Communication Training",
"bloom": "evaluate"
}
],
"rationale": "Examines the challenges of scaling beyond synchronous AllReduce, diagnosing staleness in Async SGD, evaluating sync vs async trade-offs, and adopting extreme low-communication methods like DiLoCo.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-15",
"track": "cloud",
"topic": "pipeline-parallelism",
"competency_area": "parallelism",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-3062",
"title": "Pipeline Parallelism Recall: GPipe Bubble Overhead Formula",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-3756",
"title": "GPipe Bubble Overhead: The Pipeline Efficiency Tax",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3054",
"title": "Pipeline Parallelism Bubble Overhead Comparison: GPipe vs PipeDream",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-3119",
"title": "Design 1F1B Pipeline Schedule for LLM Training on H100 Cluster",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-3763",
"title": "Pipeline Bubble in Zero-Bubble Scheduling",
"bloom": "evaluate"
}
],
"rationale": "Progresses from the theoretical GPipe bubble formula to comparing it with 1F1B scheduling, designing custom schedules, and exploring advanced zero-bubble techniques.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-16",
"track": "cloud",
"topic": "pipeline-parallelism",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1692",
"title": "Minimizing Pipeline Bubble Overhead",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1696",
"title": "Diagnosing High Pipeline Bubble Overhead",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3120",
"title": "Design Interleaved Pipeline Schedule for Reduced Bubble",
"bloom": "create"
}
],
"rationale": "Focuses on optimizing pipeline utilization by tuning micro-batches, diagnosing bubble-induced stalls, and implementing interleaved virtual stages to halve the bubble.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-023-17",
"track": "cloud",
"topic": "pipeline-parallelism",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3693",
"title": "Micro-batch Size Selection Under Memory Constraints",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-3061",
"title": "Pipeline Parallelism Realization: Activation Memory with Gradient Checkpointing",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3762",
"title": "Pipeline Parallelism Activation Checkpointing Interaction",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3694",
"title": "Combining Pipeline and Tensor Parallelism for 1T Parameter Model",
"bloom": "create"
}
],
"rationale": "Explores the memory constraints of pipeline parallelism, moving from micro-batch sizing to incorporating gradient checkpointing and architecting a 3D parallel system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-024-13",
"track": "cloud",
"topic": "queueing-theory",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-3066",
"title": "Queueing Theory Recall: Little's Law in Inference Systems",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-3597",
"title": "M/M/1 Queue Utilization and Mean Wait on A100 Serving Node",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3067",
"title": "Queueing Theory Analyze: Why Tail Latency Explodes at High Utilization",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2847",
"title": "Operating Point on the Queueing Hockey-Stick",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4516",
"title": "LLM Queueing Wait Time",
"bloom": "create"
}
],
"rationale": "Teaches fundamental M/M/1 dynamics, starting from Little's Law, progressing through the exponential explosion of tail latency at high utilization, and concluding with variance reduction techniques for LLM serving.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-024-14",
"track": "cloud",
"topic": "queueing-theory",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3866",
"title": "Diagnosing High Inference Latency on A100",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3600",
"title": "Head-of-Line Blocking in LLM Decode Queue on MI300X",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3603",
"title": "Work-Conserving Scheduler Analysis for Mixed Priority on MI300X",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3604",
"title": "Queueing Theory Mastery: SLA-Driven Capacity Planning for H100 Fleet",
"bloom": "create"
}
],
"rationale": "Focuses on the impact of service time variance in LLM queues, moving from basic latency diagnosis to identifying head-of-line blocking, implementing priority schedulers, and performing capacity planning under heavy-tailed G/G/c distributions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-024-15",
"track": "cloud",
"topic": "queueing-theory",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3316",
"title": "LLM Inference Queue Management on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3599",
"title": "M/M/c Queue Design for TPU v5e Serving Pool",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-3074",
"title": "Queueing Theory Implement: Erlang-C for Inference Cluster Sizing",
"bloom": "apply"
},
{
"level": "L6+",
"id": "cloud-3075",
"title": "Queueing Theory Mastery: Inference System Capacity Planning End-to-End",
"bloom": "create"
}
],
"rationale": "Explores multi-server queuing dynamics (M/M/c), progressing from single-node capacity planning to basic TPU pool sizing, formal Erlang-C mathematical modeling, and finally end-to-end data center capacity planning for MoE models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-025-07",
"track": "cloud",
"topic": "3d-parallelism",
"competency_area": "parallelism",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0687",
"title": "The Optimizer Explosion",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0511",
"title": "The FSDP vs DDP Memory Trade-off",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0683",
"title": "The ZeRO-1 Memory Squeeze",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-0722",
"title": "The ZeRO-3 Cross-Node Thrashing",
"bloom": "create"
}
],
"rationale": "Explores parameter sharding and its memory tradeoffs, starting from optimizer state explosion, moving to FSDP vs. DDP, diagnosing ZeRO-1 OOMs, and resolving cross-node network thrashing with ZeRO-3.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-025-08",
"track": "cloud",
"topic": "3d-parallelism",
"competency_area": "parallelism",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-3431",
"title": "3D Parallelism Sizing \u2014 Adam State + Activations on A100s",
"bloom": "analyze"
},
{
"level": "L3",
"id": "cloud-3880",
"title": "Calculate Model State Memory per GPU",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3432",
"title": "Optimizing Frontier Model Training with 3D Parallelism on NVIDIA H100",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3435",
"title": "Designing 3D Parallelism for Frontier Models on NVIDIA H100",
"bloom": "analyze"
}
],
"rationale": "Teaches the dimensional sizing of 3D parallelism, beginning with simple model state memory per GPU and scaling up to architecting hybrid parallelism strategies for trillion-parameter models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-025-09",
"track": "cloud",
"topic": "speculative-decoding",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-0237",
"title": "Mechanism of Speculative Decoding",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-3414",
"title": "Optimizing Speculative Decoding on NVIDIA A100 for LLM Inference",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3160",
"title": "Speculative Decoding Acceptance Rate Fundamentals",
"bloom": "remember"
},
{
"level": "L5",
"id": "cloud-3182",
"title": "Speculative Decoding Break-Even Analysis",
"bloom": "analyze"
}
],
"rationale": "Introduces the core mechanics of speculative decoding, moving through diagnosing low acceptance rates on A100, calculating token yield fundamentals, and mathematically defining the break-even latency point.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-025-10",
"track": "cloud",
"topic": "speculative-decoding",
"competency_area": "latency",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-3166",
"title": "Speculative Decoding Memory Bandwidth Analysis",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3167",
"title": "Tree-Structured Speculation vs Linear Speculation",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3184",
"title": "Staged Speculation for Ultra-Long Generation",
"bloom": "create"
}
],
"rationale": "Investigates advanced speculation topologies, comparing basic memory bandwidth amplification to tree-structured drafting, and deploying staged speculation for ultra-long context generations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-025-11",
"track": "cloud",
"topic": "speculative-decoding",
"competency_area": "latency",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-3180",
"title": "Speculative Decoding Impact on Time-to-First-Token",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3161",
"title": "Speculative Decoding Throughput Degradation Under Load",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3172",
"title": "Speculative Decoding with Continuous Batching",
"bloom": "create"
}
],
"rationale": "Analyzes latency and throughput degradation when speculative decoding scales under load, solving time-to-first-token regressions, batch size conflicts, and continuous batching integration.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-025-15",
"track": "cloud",
"topic": "tail-latency",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1999",
"title": "Fan-out Tail Latency Probability",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2002",
"title": "Diagnosing Parallel Fan-out Tail Latency",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2000",
"title": "Evaluating Hedged Requests for Fan-Out Inference",
"bloom": "evaluate"
}
],
"rationale": "Covers tail latency amplification in fan-out microservices, teaching the probability of missing SLAs, diagnosing scatter-gather degradation, and implementing hedged requests as a systemic mitigation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-025-16",
"track": "cloud",
"topic": "tail-latency",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-4148",
"title": "Little's Law for GPU Inference Throughput",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4155",
"title": "Queueing Theory Applied to GPU Batch Scheduling",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-4160",
"title": "Goodput vs Throughput Under Tail Constraints",
"bloom": "evaluate"
}
],
"rationale": "Connects queueing theory to GPU inference throughput, using Little's Law to predict concurrency limits, analyzing M/D/1 delays for batch scheduling, and balancing maximum throughput against P99 goodput constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-025-17",
"track": "cloud",
"topic": "tail-latency",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-4152",
"title": "Prefill-Decode Latency Decomposition",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3874",
"title": "Disaggregated Prefill vs Hedged Requests for TTFT",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-4405",
"title": "Disaggregated Prefill-Decode Architecture for Tail Latency Control",
"bloom": "create"
}
],
"rationale": "Explores the unique tail latency challenges of LLMs by decomposing prefill versus decode times, evaluating disaggregated architectures against hedged requests, and designing a full KV-cache transfer system to isolate long-prompt disruptions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-026-07",
"track": "cloud",
"topic": "interconnect-topology",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1267",
"title": "Sizing a Non-Blocking Fat-Tree Network",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1268",
"title": "Diagnosing Bisection Bandwidth Drops in a GPU Cluster",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1269",
"title": "Non-blocking vs Oversubscribed Fat-Tree",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4181",
"title": "Multi-Tenant Network Isolation on Shared Clusters",
"bloom": "create"
}
],
"rationale": "Guides the learner from basic Fat-Tree switch counting and sizing, to diagnosing routing collisions causing bisection drops, evaluating oversubscription trade-offs at large scale, and isolating tenants on shared topologies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-026-08",
"track": "cloud",
"topic": "interconnect-topology",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1767",
"title": "Sizing Leaf Switches for Rail-Optimized Clusters",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1770",
"title": "Diagnosing Stragglers in Multi-Node TP Fabrics",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1771",
"title": "Evaluating Rail-Optimized Topologies for Cross-Node TP",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4523",
"title": "MI300X Rail-Optimized MoE",
"bloom": "create"
}
],
"rationale": "Focuses specifically on the 'rail-optimized' topology design pattern, from leaf switch sizing to diagnosing performance regressions when rail-aligned routing is replaced with ECMP, evaluating cross-node Tensor Parallelism, and extending the design to Mixture of Experts routing.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-026-09",
"track": "cloud",
"topic": "interconnect-topology",
"competency_area": "networking",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-4170",
"title": "Torus Topology and Nearest-Neighbor Communication",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4169",
"title": "Dragonfly Topology for Large-Scale Training",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4183",
"title": "Adaptive Routing in Dragonfly Under Adversarial Traffic",
"bloom": "create"
}
],
"rationale": "Explores advanced, hierarchical topologies beyond standard Fat-Trees, moving from calculating 3D Torus bisection bandwidth to selecting Dragonfly vs Fat-Tree for 3D parallelism, and mitigating adversarial minimal-routing collisions in production.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-027-05",
"track": "cloud",
"topic": "network-bandwidth-bottlenecks",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-0892",
"title": "Multi-AZ Link Buffer Exhaustion",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0893",
"title": "Geo-Distributed Training Throughput Collapse",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0900",
"title": "Evaluating WAN Links for Multi-Datacenter LLM Training",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating buffer requirements for WAN links, to diagnosing throughput collapse over those links, to evaluating and mapping parallelism strategies across multi-datacenter setups.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-027-06",
"track": "cloud",
"topic": "network-bandwidth-bottlenecks",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0923",
"title": "Calculate Cluster Bisection Bandwidth",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0924",
"title": "Diagnosing 4:1 Network Oversubscription in LLM Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0926",
"title": "Evaluating Topologies for MoE All-to-All",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2365",
"title": "Fat-Tree Oversubscription Tradeoff",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner from calculating basic bisection bandwidth, to diagnosing oversubscription stalls, evaluating topologies, and making multi-million-dollar tradeoff decisions for cluster fabrics.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-027-07",
"track": "cloud",
"topic": "network-bandwidth-bottlenecks",
"competency_area": "networking",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-3044",
"title": "Network Bandwidth Bottlenecks: Recall PCIe and NVLink Bandwidth Specs",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-3049",
"title": "Network Bandwidth Bottlenecks: Realize PCIe Bandwidth Impact on Model Checkpointing",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3042",
"title": "Network Bandwidth Bottlenecks: Compare PCIe 4.0 vs. 5.0 Impact on H100 Data Loading",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-3041",
"title": "Network Bandwidth Bottlenecks: Design NVLink vs. PCIe Topology for 8xH100 Training",
"bloom": "create"
}
],
"rationale": "Explores the intra-node interconnect hierarchy, moving from basic specs to checkpointing bandwidth impacts, evaluating data loading across PCIe generations, and mapping topologies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-027-08",
"track": "cloud",
"topic": "network-bandwidth-bottlenecks",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3045",
"title": "Network Bandwidth Bottlenecks: Implement Ring-AllReduce Time Formula for H100 Cluster",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3043",
"title": "Network Bandwidth Bottlenecks: Compare AllReduce over NVLink vs. InfiniBand",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-3052",
"title": "Network Bandwidth Bottlenecks: Master PCIe+NVLink Communication Overlap Strategy",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3046",
"title": "Network Bandwidth Bottlenecks: Master Full Communication Analysis for 3D-Parallel LLM Training",
"bloom": "evaluate"
}
],
"rationale": "Focuses on analyzing AllReduce communication, progressing from time formulas to comparing interconnects, overlapping communication, and performing full 3D-parallelism analysis.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-027-23",
"track": "cloud",
"topic": "activation-memory",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0845",
"title": "Calculate Activation Checkpointing Memory Savings",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2879",
"title": "Designing a Gradient Checkpointing Segment Strategy for a 70B Model on A100",
"bloom": "create"
},
{
"level": "L5",
"id": "cloud-3277",
"title": "Optimizing Large Model Training on NVIDIA A100 with Gradient Checkpointing",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3281",
"title": "Optimizing Large Model Training with Gradient Checkpointing on A100",
"bloom": "analyze"
}
],
"rationale": "Progresses from calculating basic memory savings of checkpointing, to designing segment strategies on A100, optimizing large model setups, and tuning the complex compute-memory tradeoff at billions of parameters.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-027-24",
"track": "cloud",
"topic": "activation-memory",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3831",
"title": "Analyzing Training Time Overhead of Gradient Checkpointing",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3274",
"title": "Analyzing Compute-Memory Tradeoffs with Gradient Checkpointing on NVIDIA H100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2881",
"title": "Evaluating Gradient Checkpointing Overhead on Training Throughput for GPT-Scale Models",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2884",
"title": "Deriving Optimal Checkpoint Intervals",
"bloom": "evaluate"
}
],
"rationale": "Focuses on the computational cost of checkpointing, moving from analyzing flat time overheads to modeling compute-memory tradeoffs on H100, evaluating throughput drops at scale, and mathematically deriving optimal intervals.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-027-25",
"track": "cloud",
"topic": "activation-memory",
"competency_area": "memory",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "cloud-3275",
"title": "TPU v5e Activation Checkpointing Strategy for Large Language Models",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3279",
"title": "TPU v5e Activation Memory & Gradient Checkpointing for LLMs",
"bloom": "analyze"
}
],
"rationale": "Explores the unique challenges of activation checkpointing for massive models on TPUs, from evaluating basic strategies to integrating them within strict 16 GB HBM limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-027-26",
"track": "cloud",
"topic": "activation-memory",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0418",
"title": "The 70B Inference Footprint",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0242",
"title": "The 70B Parameter Litmus Test",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2878",
"title": "Estimating Activation Memory for a Transformer Layer on H100",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-0159",
"title": "The Coding Assistant's Latency Crisis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2888",
"title": "Realizing Selective Layer Activation Storage for Inference-Time KV Cache vs. Training Activations",
"bloom": "apply"
}
],
"rationale": "Tracks the transition from static memory allocation to dynamic inference challenges, moving from footprint basics to layer activations, diagnosing latency caused by the KV cache, and comparing inference versus training storage.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-001-01",
"track": "edge",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1784",
"title": "Sparsity Support on Google Coral Edge TPU",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1960",
"title": "Pruning MobileNetV2 for Google Coral Edge TPU Deployment",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-2536",
"title": "Diagnosing Zero Latency Gains from Unstructured Pruning on Coral TPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1957",
"title": "Optimizing Edge Vision Models: Structured Pruning for Coral TPU",
"bloom": "analyze"
}
],
"rationale": "Guides the learner from the basic misconception that unstructured pruning improves latency on Edge TPUs, through diagnosing the hardware reasons for failure, to implementing and optimizing structured pruning for the Coral architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-001-02",
"track": "edge",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "edge-1955",
"title": "Pruning Techniques and Hardware Alignment on NVIDIA Jetson Orin",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-1580",
"title": "Ampere 2:4 Structured Sparsity on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1961",
"title": "Optimizing Vision Model Latency with Structured Pruning",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1487",
"title": "Architecting Structured Sparsity on Jetson Orin",
"bloom": "evaluate"
}
],
"rationale": "Explores the progression of pruning on Ampere GPUs, starting from basic hardware alignment and Ampere 2:4 sparsity, to troubleshooting latency bottlenecks from unstructured pruning, and finally designing a comprehensive structured sparsity strategy for a complex sensor fusion model.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-001-03",
"track": "edge",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "edge-1956",
"title": "Hailo-8 Pruning Strategy for Efficient Edge Deployment",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1959",
"title": "Optimizing LLM Deployment on Hailo-8 via Structured Pruning",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1962",
"title": "Optimizing LLM Deployment on Hailo-8 via Pruning and Sparsity",
"bloom": "analyze"
}
],
"rationale": "Advances from understanding basic pruning trade-offs on dataflow architectures to optimizing structured sparsity patterns and holistically evaluating LLM deployment on the Hailo-8 accelerator.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-001-04",
"track": "edge",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "edge-0823",
"title": "Diagnosing On-Device Catastrophic Forgetting",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0824",
"title": "Mitigating Catastrophic Forgetting on Edge",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0862",
"title": "Mitigating Catastrophic Forgetting from Extreme Staleness",
"bloom": "analyze"
}
],
"rationale": "Examines the challenge of catastrophic forgetting on edge devices, progressing from diagnosing the root cause on single smart cameras to evaluating memory-constrained mitigations and finally designing reconciliation protocols for extremely stale distributed models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-001-05",
"track": "edge",
"topic": "real-time-deadlines",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0130",
"title": "The CPU-Free Camera Ingest",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0132",
"title": "The High-Speed Camera's DMA Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0439",
"title": "The Sensor Fusion Stall",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0660",
"title": "The DMA Contention Blind Spot",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0587",
"title": "The Sensor Fusion PCIe Trap",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0618",
"title": "The 4D Radar Fusion Bottleneck",
"bloom": "create"
}
],
"rationale": "Explores data movement bottlenecks, scaling from single-camera DMA ingest to architecting complex heterogeneous sensor fusion over PCIe.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-001-06",
"track": "edge",
"topic": "real-time-deadlines",
"competency_area": "architecture",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0114",
"title": "The ViT Memory Wall on the Edge",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0162",
"title": "The Edge Ridge Point",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0476",
"title": "The Perception Deadline Miss",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2540",
"title": "The Attention Bandwidth Bottleneck",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0582",
"title": "The Efficient Transformer Paradox",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0621",
"title": "The Autonomous Valet Shrink Ray",
"bloom": "create"
}
],
"rationale": "Evaluates the compute and memory trade-offs of migrating from CNNs to Vision Transformers on resource-constrained edge hardware.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-001-07",
"track": "edge",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0186",
"title": "The Perception Deadline",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0283",
"title": "The Real-Time Perception Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0009",
"title": "The Frame Budget",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0017",
"title": "The Pipeline Overlap",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0572",
"title": "The TDA4VM Vision Pipeline",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0628",
"title": "The ADAS Look-Ahead Dilemma",
"bloom": "create"
}
],
"rationale": "Teaches edge software pipelining, moving from sequential latency budgets to orchestrating complex, overlapping multi-model pipelines on heterogeneous cores.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-001-08",
"track": "edge",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "edge-0266",
"title": "The Unstable Perception Queue",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0659",
"title": "Memory for Multi-Camera Tracking",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0665",
"title": "The Object Tracking Memory Budget",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0682",
"title": "The LiDAR Point Cloud Memory Explosion",
"bloom": "evaluate"
}
],
"rationale": "Demonstrates how system memory demands scale from simple frame queues to maintaining state across multiple cameras and massive LiDAR point clouds.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-001-09",
"track": "edge",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-0097",
"title": "The RTOS Interconnect Crisis",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0530",
"title": "The Preempt-RT Kernel Tick Overhead",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0025",
"title": "The RTOS vs RT-Linux Tradeoff",
"bloom": "evaluate"
}
],
"rationale": "Examines the system-level overheads of operating systems in edge environments, contrasting RTOS IPC, Linux preempt-RT patches, and overall architectural tradeoffs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-017-01",
"track": "edge",
"topic": "federated-learning",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "edge-1224",
"title": "Federated Averaging Bottleneck on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1720",
"title": "Mitigating Federated Communication Bottlenecks on Jetson Orin",
"bloom": "analyze"
}
],
"rationale": "Teaches how to diagnose and mitigate communication and synchronization bottlenecks in federated learning on edge nodes.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-017-02",
"track": "edge",
"topic": "federated-learning",
"competency_area": "cross-cutting",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "edge-2105",
"title": "Federated Learning on Hailo-8 Edge Devices: Power-Efficient Convergence with Non-IID Data",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1472",
"title": "Federated Learning Architecture on Hailo-8 Dataflow Accelerators",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1654",
"title": "Hybrid On-Device FL using Hailo-8 Dataflow Accelerator",
"bloom": "create"
}
],
"rationale": "Explores the power, architectural, and dataflow tradeoffs of deploying federated learning on inference-only edge accelerators.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-017-03",
"track": "edge",
"topic": "federated-learning",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L2",
"L3"
],
"questions": [
{
"level": "L1",
"id": "edge-0267",
"title": "The Cellular Data Tax",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0343",
"title": "The Federated Fleet TCO: Federated Learning",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0455",
"title": "The Over-Budget Fleet Update",
"bloom": "apply"
}
],
"rationale": "Guides learners through calculating the total cost of ownership (TCO) of data transmission for autonomous fleets and evaluating the financial impact of federated learning.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-017-10",
"track": "edge",
"topic": "transformer-systems-cost",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1199",
"title": "Edge Transformer Fluency: Memory-Bandwidth-Bound Decode Calculation",
"bloom": "understand"
},
{
"level": "L4",
"id": "edge-1717",
"title": "Optimizing KV Cache for Long-Context on Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1201",
"title": "Optimize KV Cache Quantization for Jetson Orin Memory Budget",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0884",
"title": "KV Cache Quantization Cast Overhead",
"bloom": "analyze"
}
],
"rationale": "Progresses through the sizing, optimization, and architectural limits of managing long-context KV caches on edge devices.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-017-11",
"track": "edge",
"topic": "transformer-systems-cost",
"competency_area": "compute",
"levels": [
"L2",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-1193",
"title": "Recall Transformer Inference Latency on Hailo-8",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-1198",
"title": "Diagnose Decode Latency Regression from KV Cache Format on Hailo-8",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1678",
"title": "Hailo-8 Streaming KV-Cache Architecture for Long-Context Transformers",
"bloom": "create"
}
],
"rationale": "Analyzes the dataflow challenges and architectural workarounds required when deploying long-context transformers on memory-constrained edge accelerators.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-017-12",
"track": "edge",
"topic": "transformer-systems-cost",
"competency_area": "architecture",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "edge-1594",
"title": "Transformer Attention Cost on Coral Edge TPU",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0944",
"title": "Edge New 0005",
"bloom": "analyze"
}
],
"rationale": "Examines how scaling sequence lengths affects transformer execution bottlenecks on edge TPUs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-018-05",
"track": "edge",
"topic": "queueing-theory",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-1098",
"title": "Edge Queueing Theory Recall: Little's Law at the Network Edge",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1114",
"title": "Edge Queueing Theory Implement: Little's Law for Edge Buffer Sizing",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1120",
"title": "Edge Queueing Theory Realization: Concrete Queue Depth for Coral Edge TPU",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1124",
"title": "Edge Queueing Theory Specification: Design Smart Meter Edge Queue",
"bloom": "create"
},
{
"level": "L6+",
"id": "edge-1115",
"title": "Edge Queueing Theory Mastery: End-to-End Edge Inference System Design",
"bloom": "create"
}
],
"rationale": "Applies Little's Law from simple latency estimation to concrete buffer sizing and ultimately the end-to-end design of an edge inference gateway.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-018-06",
"track": "edge",
"topic": "queueing-theory",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-2487",
"title": "Hailo-8 M/D/1 Queuing",
"bloom": "apply"
},
{
"level": "L2",
"id": "edge-2420",
"title": "Transient Queue Buildup from Stochastic Arrivals",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-2363",
"title": "Orin YOLOv8 Queue Limit",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1109",
"title": "Edge Queueing Theory Evaluation: Static vs Dynamic Batching on Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-2416",
"title": "Coral TPU Batching Effect on Queue Stability",
"bloom": "evaluate"
}
],
"rationale": "Examines deterministic (M/D/1) queueing dynamics in edge accelerators, focusing on transient queue buildups and the impact of batching on tail latency.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-018-07",
"track": "edge",
"topic": "queueing-theory",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2434",
"title": "Drop-Oldest Queue Policy",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-2425",
"title": "Heuristic Priority Queuing for Edge Vision",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1102",
"title": "Edge Queueing Theory Design: Burst Traffic Handling with Jetson Buffer",
"bloom": "create"
},
{
"level": "L5",
"id": "edge-1125",
"title": "Edge Queueing Theory Specification: Multi-Tenant Edge Inference SLO",
"bloom": "create"
},
{
"level": "L6+",
"id": "edge-2373",
"title": "Priority Queue Multi-Camera Scheduling",
"bloom": "create"
}
],
"rationale": "Teaches the design of edge queueing policies, transitioning from simple drop-oldest heuristics to complex, multi-tenant priority scheduling for safety-critical pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-018-08",
"track": "edge",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0161",
"title": "The Roofline Ridge Point",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0346",
"title": "The Edge Roofline: Calculating the Ridge Point",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0407",
"title": "Roofline Inference Latency on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0916",
"title": "MobileNet vs VGG NPU Bottleneck",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0589",
"title": "The Edge Efficiency Paradox",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0632",
"title": "The Autonomous Vehicle Power-Performance Paradox",
"bloom": "create"
}
],
"rationale": "Builds foundational roofline concepts (ridge point) into latency estimation, model comparison, and finally, system-level power-performance architectural decisions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-018-09",
"track": "edge",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-0295",
"title": "The Deceptive Pointwise Convolution",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0386",
"title": "The Residual Bottleneck",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-1449",
"title": "Jetson Orin Roofline Memory Bound Analysis",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1834",
"title": "Roofline Analysis for MobileNetV2 on Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1767",
"title": "Roofline Optimization on Qualcomm Cloud AI 100",
"bloom": "evaluate"
}
],
"rationale": "Focuses on diagnosing and overcoming memory-bandwidth bottlenecks, starting with inefficient convolutions and ending with optimization strategies for memory-bound Vision Transformers.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-019-09",
"track": "edge",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-2152",
"title": "INT8 Activation Quantization Range Calibration for Object Detection on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0689",
"title": "The Night Scene Calibration Failure",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2424",
"title": "TensorRT PTQ Calibration on Orin with 200 Scenes",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0645",
"title": "The Foggy Road Catastrophe",
"bloom": "create"
}
],
"rationale": "Focuses on calibration data and distribution shifts in Post-Training Quantization (PTQ), from diagnosing day/night domain shifts to selecting calibrators and fixing catastrophic failures on edge devices.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-019-10",
"track": "edge",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-2428",
"title": "Symmetric INT8 Quantization Formula",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1246",
"title": "Asymmetric vs Symmetric Quantization Overhead",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-2463",
"title": "Symmetric Zero-Point Quantization on Edge",
"bloom": "understand"
},
{
"level": "L5",
"id": "edge-1766",
"title": "Hailo-8 INT8 Quantization Strategy for Dataflow Streaming",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0880",
"title": "IoT Vibration Dynamic Range",
"bloom": "analyze"
}
],
"rationale": "Explores the mechanics of symmetric versus asymmetric quantization, moving from the core formula to diagnosing overheads, analyzing zero-point handling, and designing architectures for massive dynamic ranges.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-019-11",
"track": "edge",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-2448",
"title": "Orin Parameter Bandwidth",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2137",
"title": "INT4 Weight-Only Quantization for LLM on Hailo-8",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1488",
"title": "Designing a Hybrid Quantization Strategy for Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1671",
"title": "Mixed-Precision LLM Architecture for Cloud AI 100",
"bloom": "create"
}
],
"rationale": "Investigates memory bandwidth and footprint optimizations for LLMs on the edge, progressing from basic FP16/INT8 byte calculations to INT4 weight-only analysis and mixed-precision architectural deployments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-022-10",
"track": "edge",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-0750",
"title": "The Zero-Touch Provisioning Pipeline",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0756",
"title": "The Edge Model A/B Testing",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0779",
"title": "The Canary in the Coal Mine",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0788",
"title": "Self-Healing Edge AI Fleet",
"bloom": "create"
}
],
"rationale": "Explores fleet management and model rollout strategies, moving from generic provisioning to sophisticated self-healing and canary deployments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-022-11",
"track": "edge",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-2394",
"title": "Multi-Camera Batch Serving",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-2427",
"title": "Dynamic Batching Inefficiencies in TensorRT",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-2391",
"title": "Jetson Multi-Model Triton Serving",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0102",
"title": "The Model Cloning Waste",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-2374",
"title": "Triton Dynamic Batching on Edge",
"bloom": "create"
}
],
"rationale": "Covers multi-model serving on edge hardware, progressing from basic batching concepts to advanced Triton dynamic memory allocation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-023-09",
"track": "edge",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0005",
"title": "The Automotive I/O Bottleneck",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-2368",
"title": "Orin 4K Memory Bandwidth",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-2413",
"title": "Specifying Hardware Accelerated CV Pipelines on Edge",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2431",
"title": "Jetson Zero-Copy Pipelines",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2478",
"title": "Drone Pre-Processing Bottleneck",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1680",
"title": "Zero-Copy Multi-Camera Pipeline",
"bloom": "create"
}
],
"rationale": "Explores the memory bandwidth constraints and optimization of high-resolution multi-camera pipelines on Jetson Orin, moving from basic limits to zero-copy DMA architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-023-10",
"track": "edge",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2384",
"title": "Hardware Video Decoder Offloading in Edge ML Pipelines",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-1415",
"title": "Host Preprocessing Bottleneck on Hailo-8",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0048",
"title": "The YUV Conversion Bottleneck",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1733",
"title": "Optimizing 4K Image Pipelines for Cloud AI 100",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2380",
"title": "Zero-Copy DMA Pipeline for Accelerated Edge Video Streams",
"bloom": "create"
}
],
"rationale": "Focuses on host CPU bottlenecks during video decoding and preprocessing before accelerator inference, and how to offload them to hardware VPUs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-024-01",
"track": "edge",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-2355",
"title": "Orin Memory Hierarchy Bandwidth Comparison",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-2287",
"title": "Jetson Shared Memory",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-2138",
"title": "LPDDR5 vs On-Chip SRAM Trade-offs on Jetson Orin for Real-Time Inference",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1721",
"title": "Optimizing ViT Activation Memory Bandwidth on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2483",
"title": "Vision Transformer Memory Spillage",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2409",
"title": "Industrial Camera LPDDR5 DMA Tax",
"bloom": "create"
}
],
"rationale": "Progresses from basic Orin memory bandwidth facts through shared memory architecture, profiling a specific model's memory pattern, diagnosing a ViT bandwidth stall, evaluating memory spillage, and finally calculating aggregate concurrent bus contention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-024-02",
"track": "edge",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3"
],
"questions": [
{
"level": "L1",
"id": "edge-2292",
"title": "Hailo-8 PCIe Gen 3 Limit",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-2462",
"title": "Layer Fusion for Hailo-8 SRAM Savings",
"bloom": "apply"
},
{
"level": "L3",
"id": "edge-1232",
"title": "Hailo-8 Host Streaming Bandwidth Bottleneck",
"bloom": "apply"
}
],
"rationale": "Explores the constraints of streaming data over PCIe to a local-DRAM-less accelerator like Hailo-8, moving from raw bandwidth limits to footprint reduction and diagnosing a specific streaming bottleneck.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-024-03",
"track": "edge",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1383",
"title": "Calculate Maximum Batch Size for LLM on Qualcomm Cloud AI 100",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1857",
"title": "Memory Hierarchy Tradeoffs for LLM Deployment on Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0675",
"title": "The Edge LLM Memory Wall",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0911",
"title": "On-Device LLM KV Cache Spilling",
"bloom": "analyze"
}
],
"rationale": "Progresses from static memory capacity calculations for LLMs to diagnosing dynamic out-of-memory errors caused by KV cache growth, and finally architecting a spilling mechanism for long-context generation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-024-04",
"track": "edge",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-1053",
"title": "Fault Tolerance: Recall \u2014 What is MTBF and Why Does It Matter for Edge ML?",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1045",
"title": "Fault Tolerance: Fluency \u2014 Checkpoint Size for Edge Online Learning",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1050",
"title": "Fault Tolerance: Optimize Checkpoint Frequency for Battery-Powered Edge Device",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-2444",
"title": "Orin SSD Wear versus Recovery Time",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-2489",
"title": "Edge Drone WAL Checkpointing",
"bloom": "evaluate"
}
],
"rationale": "Examines the lifecycle of edge checkpointing for continuous learning, progressing from fundamental interval calculations and storage sizing to optimizing frequency for battery and flash endurance, and finally designing crash-safe write-ahead logging mechanisms.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-024-05",
"track": "edge",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-0174",
"title": "The Watchdog Timer's Deadline",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0789",
"title": "The Watchdog Blind Spot",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1046",
"title": "Fault Tolerance: Implement Heartbeat and Health Monitor for Edge ML Device",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1054",
"title": "Fault Tolerance: Specification \\u2014 Define Reliability SLA for Safety-Critical Edge Vision",
"bloom": "create"
},
{
"level": "L6+",
"id": "edge-1048",
"title": "Fault Tolerance: Mastery \u2014 Fault Tolerant Edge Fleet for Autonomous Industrial Vision",
"bloom": "create"
}
],
"rationale": "Builds from single-device hardware watchdog timing and blind spots to comprehensive fleet-wide health monitoring, SLA specification, and complete fault-tolerance architecture for safety-critical environments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-024-06",
"track": "edge",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-0821",
"title": "Mitigating Byzantine Poisoning in Federated Learning",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0820",
"title": "Diagnosing Model Poisoning in FedAvg",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0819",
"title": "Evaluating Robust Aggregation in Corrupted Federated Networks",
"bloom": "evaluate"
}
],
"rationale": "Progresses from identifying Byzantine poisoning and its effect on gradient norms to diagnosing the failure of FedAvg under attack, and finally evaluating robust aggregation methods like Krum to mitigate it.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-024-07",
"track": "edge",
"topic": "systolic-dataflow",
"competency_area": "compute",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-1788",
"title": "NVIDIA DLA Primary Compute Architecture on Jetson Orin",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1135",
"title": "Edge Systolic Array Fluency: Roofline Model Mental Math for Jetson",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1291",
"title": "Diagnosing Low Utilization on Jetson Orin DLA",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1128",
"title": "Edge Systolic Array Design: Optimal Tiling for Jetson Orin DLA",
"bloom": "create"
},
{
"level": "L6+",
"id": "edge-1138",
"title": "Edge Systolic Array Mastery: Full EfficientNet Roofline on Jetson Orin DLA",
"bloom": "create"
}
],
"rationale": "Guides the learner from the basic architecture of the DLA and systolic arrays to mental roofline models, diagnosing low utilization, calculating exact tile sizes, and finally performing a full network roofline analysis.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-024-08",
"track": "edge",
"topic": "systolic-dataflow",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1584",
"title": "Edge TPU Systolic Array Throughput Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1191",
"title": "Diagnose Systolic Array Pipeline Stall on Coral Edge TPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1281",
"title": "Systolic Array Tiling Strategy for Google Coral Edge TPU",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1676",
"title": "Edge TPU Systolic Dataflow for Depthwise Convolutions",
"bloom": "create"
}
],
"rationale": "Explores the constraints of the Coral Edge TPU, progressing from theoretical throughput limits to diagnosing stalls caused by non-contiguous memory access, defining stationary tiling strategies, and redesigning dataflows for challenging depthwise convolutions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-024-09",
"track": "edge",
"topic": "systolic-dataflow",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1634",
"title": "Dataflow Latency and Energy on AI 100",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1844",
"title": "Systolic Array Dataflow Optimization for Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1548",
"title": "Dataflow Tradeoffs on Qualcomm AI 100",
"bloom": "evaluate"
}
],
"rationale": "Focuses on mapping large language model operations onto systolic arrays, from basic latency calculations for linear layers to diagnosing self-attention bottlenecks and selecting optimal dataflows for projection matrices.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-025-01",
"track": "edge",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-0825",
"title": "Memory Overhead of EWC on Edge",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0826",
"title": "Debugging EWC Memory Overheads on MCU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0827",
"title": "On-Device Continual Learning: EWC vs Replay Buffers",
"bloom": "evaluate"
}
],
"rationale": "Teaches the progressive impact of Elastic Weight Consolidation on SRAM: calculating base memory overhead, diagnosing subsequent OOMs, and evaluating architectural alternatives like replay buffers.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-025-02",
"track": "edge",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-0828",
"title": "Edge Experience Replay Buffer Sizing",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0829",
"title": "Diagnosing Forgetting in Edge Anomaly Detection",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0830",
"title": "Latent vs. Raw Experience Replay Trade-offs",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner through sizing an experience replay buffer, diagnosing catastrophic forgetting caused by storage limits, and evaluating advanced latent vs. raw storage tradeoffs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-025-03",
"track": "edge",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "edge-0853",
"title": "Debugging Edge Shadow Mode OOM Crashes",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0927",
"title": "Safe A/B Testing on Storage-Constrained Edge",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0890",
"title": "Shadow Mode on Tight Edge Memory",
"bloom": "analyze"
}
],
"rationale": "Explores the memory constraints of running models in shadow mode, scaling from debugging simple OOM crashes to evaluating safe A/B testing on storage-constrained edge, up to full shadow mode deployment tradeoffs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-025-12",
"track": "edge",
"topic": "kernel-fusion",
"competency_area": "optimization",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-0153",
"title": "The Kernel Launch Tax",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0154",
"title": "The Fusion Overhead Fallacy",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0413",
"title": "The Kernel Launch Storm",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1058",
"title": "Kernel Fusion: Evaluate Depthwise Convolution Fusion Strategies on Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-0598",
"title": "The Fusion Priority Inversion",
"bloom": "evaluate"
}
],
"rationale": "Builds a comprehensive understanding of kernel fusion, starting from basic launch tax concepts, quantifying the overhead, diagnosing kernel storms, evaluating specific DWConv fusion strategies, and discovering priority inversion flaws.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-025-13",
"track": "edge",
"topic": "kernel-fusion",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1596",
"title": "Dataflow Graph Splitting Overhead",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1970",
"title": "Optimizing Memory-Bound Operations on Hailo-8 via Kernel Fusion",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1496",
"title": "Hailo-8 Dataflow Optimization and Memory-Bound Operator Fusion",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1974",
"title": "Hailo-8: Optimizing Memory-Bound Operations via Kernel Fusion for Edge Deployment",
"bloom": "analyze"
}
],
"rationale": "Focuses on dataflow accelerators like Hailo-8, progressing from the cost of graph splitting to optimizing memory-bound sequential operators and designing compile-time fusion architectures for energy efficiency.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-025-14",
"track": "edge",
"topic": "kernel-fusion",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1228",
"title": "Analyzing Memory Bottlenecks in Unfused Operations on Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1830",
"title": "Designing Kernel Fusion for ViTs on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1973",
"title": "Optimizing Edge Inference on Jetson Orin: Kernel Fusion for Memory-Bound Operations",
"bloom": "analyze"
}
],
"rationale": "Teaches kernel fusion for complex models on Jetson Orin, analyzing memory bottlenecks of unfused operations, designing fusion for Vision Transformers, and optimizing post-processing pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-026-10",
"track": "edge",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2006",
"title": "MLOps for INT8 Deployment on Coral Edge TPU",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-2008",
"title": "MLOps for Edge Deployment with Google Coral Edge TPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1497",
"title": "Edge TPU Fleet CI/CD Architecture",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1681",
"title": "CI/CD Pipeline for Fleet-Wide Coral Edge TPU Deployment",
"bloom": "create"
}
],
"rationale": "Follows the specialized CI/CD lifecycle for Google Coral Edge TPUs, starting with INT8 data type constraints, building the FP32-to-INT8 conversion pipeline, architecting fleet updates, and managing massive-scale operator validation rollouts.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-026-11",
"track": "edge",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1075",
"title": "MLOps Lifecycle: Size Storage and Bandwidth for Edge Model Registry",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1071",
"title": "MLOps Lifecycle: Compare OTA Update Strategies for Fleet of Jetson Orin Edge Devices",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-1750",
"title": "Hardware-Aware Shadow Deployment on Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1073",
"title": "MLOps Lifecycle: End-to-End MLOps for Autonomous Vehicle Edge Fleet",
"bloom": "create"
}
],
"rationale": "Focuses on deploying models to Jetson Orin fleets, stepping through edge registry storage sizing, comparing full vs delta OTA bandwidth costs, implementing safe hardware-aware shadow deployments, and designing an end-to-end autonomous vehicle MLOps lifecycle.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-026-12",
"track": "edge",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-0752",
"title": "The Gradual Rollout Guru",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0771",
"title": "The Model Versioning Fleet Problem",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0778",
"title": "The Polyglot Fleet",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0893",
"title": "Heterogeneous Edge Model Registry",
"bloom": "analyze"
}
],
"rationale": "Addresses the compounding operational complexity of managing ML deployments across heterogeneous edge hardware, moving from basic gradual rollout design to combinatorial versioning logic, polyglot CI/CD pipelines, and unified registry architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-026-13",
"track": "edge",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2417",
"title": "Calculating Maximum Framerate from Accelerator TOPS Capacity",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-1340",
"title": "Hailo-8 Inference Throughput and Power Efficiency Estimation",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2303",
"title": "Hailo-8 Multi-Model Packing",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1463",
"title": "Multi-Model Drone Inspection Design on Hailo-8",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1645",
"title": "Multi-Model Dataflow Compute and PCIe Bandwidth Estimation",
"bloom": "create"
}
],
"rationale": "Builds sequentially from basic TOPS-to-FPS throughput calculations on the Hailo-8, to calculating frames per Joule, packing multiple models onto a single chip, determining system-level cost constraints of concurrent networks, and designing a fused dataflow graph to bypass host bandwidth bottlenecks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-026-14",
"track": "edge",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2402",
"title": "Orin Thermal Throttle Floor for SLA",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-1413",
"title": "Compute Efficiency and Power Limits on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-2481",
"title": "Realistic Orin Throughput",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1850",
"title": "Edge Model Evaluation: Performance and Cost on NVIDIA Jetson Orin",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-2370",
"title": "Edge Multi-Camera Sizing",
"bloom": "create"
}
],
"rationale": "Explores Jetson Orin deployment realities, progressing from evaluating SLAs under thermal throttling, diagnosing power limits before compute limits, applying realistic hardware efficiency factors, conducting cost-performance model evaluations, and scaling to bandwidth-constrained multi-camera setups.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-026-15",
"track": "edge",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1369",
"title": "Throughput and Efficiency on Google Coral Edge TPU",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1852",
"title": "Edge AI Compute Cost and Performance Estimation for Google Coral TPU Deployment",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1848",
"title": "Edge TPU Inference Cost and Scalability",
"bloom": "analyze"
}
],
"rationale": "Teaches how to scale a Google Coral edge deployment from theoretical single-chip throughput and efficiency, to fleet-wide aggregate performance and electrical cost estimates, up to system-level architectural sizing for multi-camera use cases.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-027-16",
"track": "edge",
"topic": "ota-firmware-updates",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-0741",
"title": "The OTA Bandwidth Trap: OTA & Firmware Updates",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0409",
"title": "OTA Update Time for Edge Fleet",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0760",
"title": "The OTA Rollback That Bricked the Fleet",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0924",
"title": "Fleet-wide Rollback under Bandwidth Constraints",
"bloom": "create"
},
{
"level": "L6+",
"id": "edge-0631",
"title": "The Impossible OTA Update: Architecting a Generative VLM for an Automotive SoC",
"bloom": "create"
}
],
"rationale": "Progresses from calculating OTA download times over cellular to analyzing fleet-wide updates, handling failed rollbacks, deploying bandwidth-constrained observability, and architecting complex VLM updates.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-027-17",
"track": "edge",
"topic": "ota-firmware-updates",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0710",
"title": "The A/B Partitioning Storage Tax",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0739",
"title": "The OTA Flash Budget Crunch",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0747",
"title": "The Bricked OTA Update",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0753",
"title": "The OTA Brick Risk",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1485",
"title": "Architecting a Resilient OTA Update System for Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2012",
"title": "Optimizing FOTA for NVIDIA Jetson Orin Fleets with A/B Partitions",
"bloom": "analyze"
}
],
"rationale": "Develops understanding from basic A/B partition size calculation to flash budgeting, mitigating mid-update power loss, managing runtime coupling risks, and architecting full zero-downtime A/B FOTA systems.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-auto-027-18",
"track": "edge",
"topic": "ota-firmware-updates",
"competency_area": "deployment",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1442",
"title": "A/B Partition OTA Power Throttling on AI 100",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1556",
"title": "Evaluating OTA Strategies for Qualcomm Cloud AI 100",
"bloom": "evaluate"
}
],
"rationale": "Examines edge accelerator OTA behavior, moving from diagnosing inference throttling during background flash programming to evaluating complete OTA update strategies under bandwidth constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-bucket-realtime-01",
"track": "edge",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L1",
"id": "edge-0169",
"title": "The Real-Time Batching Tax",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0214",
"title": "The Real-Time Batching Fallacy",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0405",
"title": "The Edge Batch Size Paradox",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0950",
"title": "Edge New 0011",
"bloom": "analyze"
}
],
"rationale": "Progresses from the fundamental concept of batching versus latency to applying and troubleshooting batching in multi-camera edge deployments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-bucket-realtime-02",
"track": "edge",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0134",
"title": "Worst-Case vs. Average-Case",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0274",
"title": "The Perception Pipeline's Deadline",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0096",
"title": "The Safety Watchdog Timer",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0072",
"title": "The DVFS Latency Jitter",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0024",
"title": "The WCET Analysis Wall",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0036",
"title": "The WCET Analysis",
"bloom": "create"
}
],
"rationale": "Builds from understanding average versus worst-case latency to constructing safety-critical Worst-Case Execution Time (WCET) arguments for certification.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-bucket-realtime-03",
"track": "edge",
"topic": "real-time-deadlines",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-0709",
"title": "Thermal Throttling on Edge",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0061",
"title": "The Field Thermal Surprise",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0018",
"title": "The Thermal Throttling Deadline Miss",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0078",
"title": "The Thermal Derating Curve",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0891",
"title": "Thermal Throttling Runtime Adaptation",
"bloom": "evaluate"
}
],
"rationale": "Traces the impact of thermal throttling from basic observation to designing runtime adaptation systems for extreme ambient temperatures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "edge-chain-bucket-realtime-04",
"track": "edge",
"topic": "real-time-deadlines",
"competency_area": "precision",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-0299",
"title": "The Activation Bandwidth Bottleneck",
"bloom": "understand"
},
{
"level": "L4",
"id": "edge-0692",
"title": "Quantization Impact on Detection mAP",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0590",
"title": "The Catastrophic Night-Drive Quantization Failure",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0641",
"title": "The Headlight Blindness Catastrophe",
"bloom": "create"
}
],
"rationale": "Progresses from the hardware benefits of INT8 quantization to diagnosing and structurally resolving severe data-dependent failures like headlight blindness.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-01",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-0984",
"title": "Model Format Conversion: Recall ONNX Opset Compatibility for Mobile Deployment",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-2147",
"title": "Model Format Conversion: Sizing the FP16 CoreML Payload",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-1022",
"title": "Model Format Conversion: Mobile Fluency \u2014 CoreML Model Conversion Pipeline in 60 Seconds",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1511",
"title": "CoreML Conversion and ANE Delegation Strategy",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0980",
"title": "Model Format Conversion: Full Stack LLM Conversion for On-Device iOS Deployment",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1662",
"title": "Optimizing LLM Deployment on Apple A17 Pro: CoreML Conversion & Operator Gaps",
"bloom": "analyze"
}
],
"rationale": "Guides engineers through the end-to-end iOS CoreML deployment lifecycle, starting from opset matching and FP16 sizing, moving to pipeline configuration, and culminating in advanced fallback mitigations for LLMs on the Apple Neural Engine.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-02",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1104",
"title": "CoreML Fallback Penalty on Apple A17 Pro",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1418",
"title": "CoreML Fallback Memory Transfer Bottleneck",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1657",
"title": "A17 Pro LLM Deployment with CoreML Operator Gaps",
"bloom": "analyze"
}
],
"rationale": "Focuses exclusively on diagnosing and resolving CoreML CPU fallback bottlenecks, progressing from calculating the latency penalty to formulating a comprehensive architectural strategy for handling operator gaps on the A17 Pro.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-03",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1324",
"title": "Hexagon NPU Delegation Fallback Latency",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0982",
"title": "Model Format Conversion: Optimize TFLite Model Conversion for Snapdragon DSP",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-1139",
"title": "Designing Graph Delegation for Snapdragon 8 Gen 3",
"bloom": "evaluate"
}
],
"rationale": "Explores TFLite execution on Android NPUs, teaching how to calculate the cost of CPU fallbacks, profile unoptimized models on the Snapdragon DSP, and design a custom graph delegation strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-04",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "mobile-1166",
"title": "Diagnosing Delegation Fallback on Exynos 2400",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1451",
"title": "Exynos 2400 NPU Delegation and Operator Fallback",
"bloom": "evaluate"
}
],
"rationale": "A focused progression on optimizing models for the Samsung Exynos NPU, transitioning from diagnosing LPDDR5X bandwidth contention to rewriting the graph to eliminate fallbacks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-05",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L3"
],
"questions": [
{
"level": "L1",
"id": "mobile-0006",
"title": "The OTA Bandwidth Bottleneck",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0245",
"title": "The OTA Storage Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-1024",
"title": "Model Format Conversion: Realize Multi-Format Model Storage for Cross-Platform iOS+Android",
"bloom": "apply"
}
],
"rationale": "Investigates the storage and network constraints of shipping model updates, advancing from basic OTA limitations to cross-platform CDN optimization via delta formats.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-06",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L1",
"L2"
],
"questions": [
{
"level": "L1",
"id": "mobile-0006",
"title": "The OTA Bandwidth Bottleneck",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0362",
"title": "The OTA Double-Storage Tax",
"bloom": "understand"
}
],
"rationale": "Explores the storage constraints of OTA updates, diving into the specific double-storage tax required to enable instant rollback when delivering a quantized model.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-07",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-0977",
"title": "Model Format Conversion: Compare TFLite vs. CoreML for Cross-Platform Mobile Deployment",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-0985",
"title": "Model Format Conversion: Specify Multi-Platform Model Conversion CI/CD Requirements",
"bloom": "create"
},
{
"level": "L6+",
"id": "mobile-0981",
"title": "Model Format Conversion: End-to-End LLM CoreML Stateful Deployment Mastery",
"bloom": "create"
}
],
"rationale": "Explores cross-platform deployment strategies, advancing from comparing model format targets to orchestrating automated CI/CD pipelines and designing holistic multi-platform architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-08",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "mobile-1658",
"title": "Diagnosing TFLite Performance Regressions on Google Tensor G3",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1291",
"title": "LLM TPU Delegation vs CPU Fallback on Tensor G3",
"bloom": "evaluate"
}
],
"rationale": "Focuses on deployment for Google Tensor G3, moving from diagnosing TFLite latency regressions to making strategic architectural decisions between TPU delegation and CPU fallback for LLMs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-09",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "mobile-1270",
"title": "NPU Memory Bandwidth Contention",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1659",
"title": "Exynos 2400 NPU Deployment for a Transformer-based Object Detector",
"bloom": "analyze"
}
],
"rationale": "Addresses performance tuning on Exynos NPUs, progressing from diagnosing bandwidth contention with the ISP to optimizing a transformer-based object detector for real-time inference.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-10",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L1",
"L2"
],
"questions": [
{
"level": "L1",
"id": "mobile-0001",
"title": "The OTA Cellular Limit",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0375",
"title": "The Mobile OTA Update Budget",
"bloom": "understand"
}
],
"rationale": "Explores strict mobile download limits and calculates the specific peak storage budget required for fail-safe binary patching of models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-11",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L2",
"L3"
],
"questions": [
{
"level": "L2",
"id": "mobile-0983",
"title": "Model Format Conversion: Size ONNX vs. TFLite Model Storage for Mobile App",
"bloom": "apply"
},
{
"level": "L3",
"id": "mobile-0979",
"title": "Model Format Conversion: Implement ONNX\u2192CoreML Conversion with Numerical Validation",
"bloom": "apply"
}
],
"rationale": "Covers the transition from ONNX models, starting with sizing the storage footprint for the app bundle and moving to the practical implementation of ONNX-to-CoreML conversion.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-001-12",
"track": "mobile",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "mobile-1660",
"title": "Optimizing Model Conversion for Apple A17 Pro Neural Engine",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-0978",
"title": "Model Format Conversion: Compare INT8 vs. INT4 CoreML Quantization on A17 Pro",
"bloom": "evaluate"
}
],
"rationale": "Evaluates Apple A17 Pro deployment constraints, advancing from optimizing the general PyTorch-to-CoreML conversion to specifically comparing the accuracy and performance tradeoffs of INT8 versus INT4 quantization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-012-01",
"track": "mobile",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-0310",
"title": "The Mobile Jank Budget",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0330",
"title": "The Mobile TOPS Illusion",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0461",
"title": "The AR Filter Frame Drop",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0675",
"title": "The Memory Bandwidth Throttling",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0616",
"title": "The 60 FPS Camera ML Pipeline",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-0098",
"title": "The Async Camera Pipeline",
"bloom": "create"
}
],
"rationale": "Progresses from basic frame latency budgets to diagnosing AR compute bottlenecks, and finally architecting an asynchronous camera pipeline to decouple slow segmentation from 60 FPS preview.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-012-02",
"track": "mobile",
"topic": "real-time-deadlines",
"competency_area": "reliability",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-0755",
"title": "The Battery Inference Budget",
"bloom": "understand"
},
{
"level": "L2",
"id": "mobile-0161",
"title": "The Jank Instigator",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0101",
"title": "The Sustained vs Burst Reality",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0744",
"title": "The Burst Benchmarking Illusion",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0120",
"title": "The Thermal Throttling Prediction",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-0096",
"title": "The Adaptive Power Maestro",
"bloom": "create"
}
],
"rationale": "Explores the impact of thermal throttling on mobile performance, starting with battery capacity limits and advancing to burst benchmarking illusions and dynamic SoC power adaptation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-012-03",
"track": "mobile",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-0284",
"title": "The Mobile UI Jank Budget",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0220",
"title": "The 16ms UI Jank Deadline",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0504",
"title": "The Sluggish Smart Reply: Real-Time Deadlines",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-0633",
"title": "The Voice Assistant That Froze The Speedometer",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-0658",
"title": "The AutoScribe Jank Crisis",
"bloom": "create"
}
],
"rationale": "Analyzes the latency and scheduling challenges of on-device LLMs, from basic UI jank deadlines to diagnosing batching overhead and managing concurrent real-time inference tasks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-012-04",
"track": "mobile",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-0326",
"title": "The UI Jank Deadline",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0059",
"title": "The 60 FPS Jank Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0543",
"title": "The Real-Time Filter Jank",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0595",
"title": "The CPU-GPU Asynchronous Desync",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0609",
"title": "The Shared GPU Contention",
"bloom": "evaluate"
}
],
"rationale": "Diagnoses real-time video filter bottlenecks, moving from simple frame budgets to roofline analysis and resolving complex CPU-GPU asynchronous desync and contention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-013-05",
"track": "mobile",
"topic": "federated-learning",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L1",
"id": "mobile-0202",
"title": "The Hidden Cost of Privacy",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0203",
"title": "The Economics of Privacy: Centralized vs. Federated",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0506",
"title": "The Privacy vs. Price Dilemma",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0795",
"title": "Federated Learning Carbon Trade-offs",
"bloom": "evaluate"
}
],
"rationale": "Builds an economic and environmental Total Cost of Ownership (TCO) comparison between centralized collection and federated learning.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-013-06",
"track": "mobile",
"topic": "federated-learning",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-0335",
"title": "The Battery Tax of Federated Learning",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0307",
"title": "The On-Device Battery Tax",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0408",
"title": "The Battery Drain Anomaly",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1742",
"title": "Federated Learning Optimization for Cross-Device Personalization on Edge NPUs",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1740",
"title": "Federated LLM Personalization on Snapdragon 8 Gen 3 Hexagon NPU",
"bloom": "analyze"
}
],
"rationale": "Examines the severe battery and thermal costs of federated learning, advancing to architecting full system optimizations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-014-15",
"track": "mobile",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L1",
"id": "mobile-0234",
"title": "The 25% Mobile Memory Rule",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-2096",
"title": "Applying iOS Jetsam Footprint Limits to LLMs",
"bloom": "apply"
},
{
"level": "L3",
"id": "mobile-0666",
"title": "The App Memory Pressure Levels",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0743",
"title": "The Jetsam Guillotine",
"bloom": "analyze"
}
],
"rationale": "Walks through mobile OS memory limits from the foundational 25% rule, to how large single allocations trigger jetsam, to observing jetsam under app switching, and finally diagnosing invisible resource consumers causing unexpected termination.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-014-16",
"track": "mobile",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-1936",
"title": "Name tightly coupled memory used for local NPU accelerator caching",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-2100",
"title": "4K Image Memory Tiling",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1910",
"title": "Mobile NPU SRAM Spilling Bottleneck",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1978",
"title": "Hexagon NPU TCM Spilling Impact",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1986",
"title": "NPU TCM Memory Tiling",
"bloom": "create"
}
],
"rationale": "Progresses from the definition of NPU Tightly Coupled Memory (TCM), to calculating tile sizes, diagnosing the latency bottleneck when intermediate activations spill from TCM to DRAM, quantifying that impact, and finally designing a comprehensive tiling strategy to prevent spilling.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-014-17",
"track": "mobile",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "mobile-0682",
"title": "The Memory-Mapped Page Fault",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0686",
"title": "The Memory-Mapped Weight Strategy",
"bloom": "evaluate"
}
],
"rationale": "Diagnoses the specific UI-freezing page fault behavior caused by mmap on first inference, and progresses to designing an optimized memory-mapped weight strategy that maintains fast startup without blocking.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-014-18",
"track": "mobile",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-0319",
"title": "The On-Device Reader's Bottleneck",
"bloom": "understand"
},
{
"level": "L4",
"id": "mobile-0677",
"title": "The LPDDR5X Bandwidth Budget",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0684",
"title": "The Mobile Memory Controller Puzzle",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-2091",
"title": "Calculating LLM Token Rates by Memory Bandwidth",
"bloom": "apply"
}
],
"rationale": "Explores the memory bandwidth bottleneck in on-device LLMs, starting from basic theoretical token generation rates, moving to LPDDR5X budget constraints, diagnosing real-world controller inefficiencies, and modeling complex system-level bandwidth contention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-014-19",
"track": "mobile",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-2079",
"title": "LLM KV Cache Impact on Unified Memory System Cache",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1783",
"title": "Unified Memory Architecture Impact on LLM Decode on A17 Pro",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0693",
"title": "The DRAM Bandwidth Contention",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1387",
"title": "LLM Memory Co-Design on Exynos 2400 NPU",
"bloom": "create"
}
],
"rationale": "Examines the shared Unified Memory Architecture (UMA) where the NPU, GPU, and ISP fight for the same bandwidth, moving from basic cache impact to quantifying UMA contention, diagnosing UI stutters, and finally co-designing memory across the entire SoC.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-014-20",
"track": "mobile",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-0664",
"title": "The Mobile LLM KV-Cache Squeeze",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1790",
"title": "OOM Diagnosis for LLM Context Extension on A17 Pro",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-0696",
"title": "The On-Device LLM Memory Architecture",
"bloom": "create"
}
],
"rationale": "Explores the specific challenge of the expanding KV cache on mobile devices, from observing delayed jetsam in long chats, to diagnosing context-extension OOMs, and architecting a memory management system to fit large models in constrained RAM.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-017-04",
"track": "mobile",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-0446",
"title": "The Night-Vision Accuracy Collapse",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0709",
"title": "The Int8 Quantization Activation Clipping",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-2098",
"title": "Transformer INT8 Outliers",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-0836",
"title": "Mixed-Precision Mobile Super-Resolution",
"bloom": "create"
}
],
"rationale": "Examines how activation outliers in edge scenarios cause catastrophic accuracy drops after INT8 quantization and explores mitigation strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-017-05",
"track": "mobile",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "mobile-0703",
"title": "The Cross-SoC Accuracy Divergence",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0713",
"title": "The Cross-SoC Quantization Divergence",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-0716",
"title": "The Cross-Platform Confidence Score Divergence",
"bloom": "create"
}
],
"rationale": "Demonstrates how fragmented hardware quantization implementations lead to diverging model behaviors and scores across mobile platforms.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-017-06",
"track": "mobile",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-0249",
"title": "The On-Device Memory Diet: Quantization Fundamentals",
"bloom": "understand"
},
{
"level": "L4",
"id": "mobile-1932",
"title": "Calculate model memory reduction accounting for static KV cache",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0711",
"title": "The INT4 Weight-Only Quantization",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1919",
"title": "A17 Pro LLM Quantization Protocol",
"bloom": "create"
}
],
"rationale": "Traces the progression of optimizing Large Language Models on mobile devices by calculating memory savings from INT8 to INT4 and addressing KV cache impacts.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-018-04",
"track": "mobile",
"topic": "transformer-systems-cost",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1060",
"title": "Mobile Transformer Cost Recall: Mobile LLM Memory Bandwidth Facts",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-1053",
"title": "Mobile Transformer Cost Fluency: Quick LLM Sizing for Mobile",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1052",
"title": "Mobile Transformer Cost Evaluation: Snapdragon vs A17 Pro LLM Performance",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-1057",
"title": "Mobile Transformer Cost Optimization: Quantization + Speculative for Mobile LLM",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1056",
"title": "Mobile Transformer Cost Mastery: Mobile LLM Scaling Law Analysis",
"bloom": "evaluate"
}
],
"rationale": "Develops the learner's ability to model and optimize mobile LLM latency, starting with memory bandwidth fundamentals and ending with Pareto-optimal scaling law analysis.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-019-03",
"track": "mobile",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-0208",
"title": "The Two Latencies of Generative AI",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0260",
"title": "The On-Device Assistant's First Word",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0550",
"title": "The Sluggish Voice Assistant",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1554",
"title": "Analyzing On-Device LLM Latency on Google Tensor G3",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1184",
"title": "A17 Pro Neural Engine Latency Breakdown",
"bloom": "evaluate"
}
],
"rationale": "Explores Time-To-First-Token (TTFT) and Time-Per-Output-Token (TPOT), from basic definitions and memory bandwidth calculations to diagnosing bottlenecks and performing full architectural latency decompositions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-019-04",
"track": "mobile",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-0780",
"title": "TFLite Delegate Subgraph Partitioning Overhead",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0779",
"title": "Diagnosing TFLite NNAPI Delegate Subgraph Fallback",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1556",
"title": "Latency Decomposition for Mobile ML on Snapdragon NPU",
"bloom": "analyze"
}
],
"rationale": "Progresses from calculating subgraph fallback overhead to diagnosing unexpected fallback slowdowns on mobile NPUs, ending with architectural evaluation to prevent pre/post-processing bottlenecks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-019-05",
"track": "mobile",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "mobile-0072",
"title": "The Heterogeneous Scheduling Dilemma",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0611",
"title": "The Inference Timing Jitter",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-0097",
"title": "The OS Scheduler's Dilemma",
"bloom": "create"
}
],
"rationale": "Explores the impact of heterogeneous OS scheduling on inference, moving from diagnosing multi-model contention to minimizing jitter, and finally architecting QoS guarantees at the scheduler level.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-022-08",
"track": "mobile",
"topic": "network-bandwidth-bottlenecks",
"competency_area": "networking",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1020",
"title": "Network Bandwidth Bottlenecks: Recall LPDDR5X Bandwidth and Mobile Memory Hierarchy",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-1027",
"title": "Network Bandwidth Bottlenecks: Fluency \u2014 LPDDR5X BW and TPOT Math in 30 Seconds",
"bloom": "apply"
},
{
"level": "L3",
"id": "mobile-1029",
"title": "Network Bandwidth Bottlenecks: Implement Memory-Bandwidth-Bound TPOT Derivation",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1013",
"title": "Network Bandwidth Bottlenecks: Optimize LLM TPOT via Weight Quantization on A17 Pro",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-1012",
"title": "Network Bandwidth Bottlenecks: Master Roofline Analysis for Mobile LLM Scaling",
"bloom": "evaluate"
}
],
"rationale": "Builds foundational fluency in mobile memory bandwidth and TPOT math, progressing to roofline analysis and quantization strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-022-09",
"track": "mobile",
"topic": "network-bandwidth-bottlenecks",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1000",
"title": "Network Bandwidth Bottlenecks: Analyze LPDDR5X Shared Bus for Mobile SoC Memory Bottleneck",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1004",
"title": "Network Bandwidth Bottlenecks: Diagnose LPDDR5X Bandwidth Saturation on Snapdragon",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1019",
"title": "Network Bandwidth Bottlenecks: Specify QoS Framework for Multi-Model Mobile Inference",
"bloom": "create"
},
{
"level": "L6+",
"id": "mobile-1011",
"title": "Network Bandwidth Bottlenecks: Master Full Bandwidth Analysis for Mobile AI System",
"bloom": "evaluate"
}
],
"rationale": "Examines the impact of concurrent workloads on shared memory buses, culminating in the design of comprehensive QoS frameworks for multi-model systems.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-023-07",
"track": "mobile",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-2035",
"title": "Understanding Zero-Copy Unified Memory on Apple Silicon",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-1084",
"title": "A17 Pro Unified Memory Contention in Video Pipeline",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1497",
"title": "On-Device Real-Time Video Pipeline Design for A17 Pro",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1706",
"title": "On-Device Real-time Sensor Fusion Pipeline for Apple A17 Pro",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1914",
"title": "Zero-Copy Video Pipeline on Apple Silicon",
"bloom": "create"
}
],
"rationale": "Progresses from the fundamentals of zero-copy memory on Apple Silicon to designing and optimizing complex real-time video pipelines for the A17 Pro.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-023-08",
"track": "mobile",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1979",
"title": "A17 Pro ISP to NPU Pipeline",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-1179",
"title": "On-Device Vision Pipeline Memory Bandwidth Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1946",
"title": "Zero-Copy Video Pipeline",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-2092",
"title": "Applying Accelerators for Zero-Copy Image Resizing",
"bloom": "apply"
},
{
"level": "L6+",
"id": "mobile-1989",
"title": "Zero-Copy Image Pipeline",
"bloom": "create"
}
],
"rationale": "Covers the end-to-end memory bandwidth and throughput bottlenecks of feeding an NPU from an ISP, culminating in designing a unified zero-copy architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-026-05",
"track": "mobile",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-0914",
"title": "Fault Tolerance: Fluency \u2014 Mobile Checkpoint Write Speed and Overhead",
"bloom": "apply"
},
{
"level": "L3",
"id": "mobile-0910",
"title": "Fault Tolerance: Analyze Checkpoint Overhead for On-Device Fine-Tuning on A17 Pro",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-0920",
"title": "Fault Tolerance: Optimize Checkpoint Storage Budget for Low-Storage Mobile Devices",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-1964",
"title": "On-Device Checkpoint RPO/RTO",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1891",
"title": "A17 OS Termination Grace",
"bloom": "create"
}
],
"rationale": "Explores the end-to-end design of on-device LoRA checkpointing on mobile NPUs, transitioning from baseline write latency to failure probability impacts, storage budget optimization, RTO/RPO SLAs, and finally strict OS termination grace period budgeting.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-026-06",
"track": "mobile",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-2076",
"title": "Optimal Checkpointing Frequency for Mobile Federated Learning",
"bloom": "apply"
},
{
"level": "L2",
"id": "mobile-1943",
"title": "Federated Learning Flash Memory Wear",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-2109",
"title": "Federated Learning OOM Recovery",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-2069",
"title": "Federated Checkpoint Thermal Impact",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-2007",
"title": "Mobile FL Checkpoint Wear",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1923",
"title": "Android OOM Gradient Resumption",
"bloom": "create"
}
],
"rationale": "Progresses through the challenges of continuous mobile federated learning, covering baseline dropout resilience, flash wear-out risks, OOM recovery latency tradeoffs, thermal throttling consequences, and architecting an optimal fine-grained checkpointing system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-027-09",
"track": "mobile",
"topic": "model-size-estimation",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-0997",
"title": "Model Size Estimation: Realize Full Memory Layout for 3B LLM on Snapdragon 8 Gen 3",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1611",
"title": "Diagnosing Large Language Model Deployment on Snapdragon 8 Gen 3 Hexagon NPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1246",
"title": "Architecting a Vision-Language Model on Snapdragon 8 Gen 3",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1391",
"title": "On-Device Multimodal Architecture for Snapdragon 8 Gen 3",
"bloom": "create"
}
],
"rationale": "Builds from sizing an LLM on Snapdragon, to diagnosing OOMs on Hexagon, architecting VLM constraints, and ultimately designing a comprehensive multimodal architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-027-10",
"track": "mobile",
"topic": "model-size-estimation",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1105",
"title": "Analyzing OOM for 3B FP16 Model on A17 Pro",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1512",
"title": "On-Device LLM Sizing for Apple A17 Pro",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1613",
"title": "Evaluating Transformer Architectures for On-Device Deployment on Apple A17 Pro",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-0993",
"title": "Model Size Estimation: Master Full On-Device LLM Memory Audit for Production iOS App",
"bloom": "evaluate"
}
],
"rationale": "Progresses from analyzing basic unified memory OOMs, to sizing an LLM within A17 Pro constraints, evaluating models, and performing a rigorous production memory audit.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-027-11",
"track": "mobile",
"topic": "model-size-estimation",
"competency_area": "architecture",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "mobile-1167",
"title": "Diagnosing OOM on Exynos 2400 Shared Memory",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1452",
"title": "Sizing an LLM for Exynos 2400 NPU Deployment",
"bloom": "evaluate"
}
],
"rationale": "Moves from diagnosing OOMs caused by the Exynos 2400 shared memory pool to properly sizing and quantizing an LLM deployment for it.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-027-12",
"track": "mobile",
"topic": "model-size-estimation",
"competency_area": "memory",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "mobile-0991",
"title": "Model Size Estimation: Fluency \u2014 Size Mobile LLM Memory in 60 Seconds",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0995",
"title": "Model Size Estimation: Diagnose Mobile OOM from KV-Cache Growth",
"bloom": "evaluate"
}
],
"rationale": "Progresses from a quick static memory calculation to diagnosing dynamic memory growth crashes due to multi-turn KV cache accumulation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-027-13",
"track": "mobile",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1310",
"title": "On-Device Active Learning Compute for Curation",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0896",
"title": "Dataset Curation: Design On-Device Data Collection for Mobile Model Training",
"bloom": "create"
},
{
"level": "L5",
"id": "mobile-1230",
"title": "On-device Data Curation for Active Learning",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1376",
"title": "On-Device LLM Active Learning Curation on Tensor G3",
"bloom": "create"
}
],
"rationale": "Progresses from calculating compute limits for active learning curation to designing the pipeline, specializing for privacy-preserving smart replies, and architecting LLM personalization on Tensor G3.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-027-14",
"track": "mobile",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1190",
"title": "INT8 Calibration Dataset Bias",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1499",
"title": "Data Curation for INT8 Hexagon NPU Calibration",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1720",
"title": "On-Device Object Detection Dataset Curation for Snapdragon 8 Gen 3 NPU",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1725",
"title": "On-Device Dataset Curation for Bias Mitigation on Snapdragon 8 Gen 3",
"bloom": "analyze"
}
],
"rationale": "Takes the learner from understanding INT8 calibration bias to curating specific datasets for NPU calibration, actively learning on-device, and mitigating complex biases in real-time edge detection.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-auto-027-15",
"track": "mobile",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "mobile-1719",
"title": "Optimizing On-Device Image Classification Dataset with Active Learning on Apple A17 Pro",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1723",
"title": "On-device Active Learning for Gesture Recognition Dataset Curation on Apple A17 Pro",
"bloom": "analyze"
}
],
"rationale": "Focuses on the Apple A17 Pro ecosystem, advancing from designing an image classification active learning pipeline to a privacy-preserving real-time gesture recognition curation system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-kvcachem-01",
"track": "mobile",
"topic": "kv-cache-management",
"competency_area": "architecture",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-1985",
"title": "A17 Pro Unified Memory Limit",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-1540",
"title": "Apple A17 Pro KV-Cache Sizing and Memory Pressure for LLM Inference",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-1319",
"title": "KV-Cache Sizing for A17 Pro",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1072",
"title": "On-Device KV-Cache Budget for Mobile LLM",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1240",
"title": "A17 Pro Unified Memory KV-Cache Architecture",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1880",
"title": "A17 Pro 3B Model Sequence Limit",
"bloom": "create"
}
],
"rationale": "Progresses from foundational KV cache memory estimation to calculating exact constraints for A17 Pro, designing eviction policies within strict budgets, and deriving absolute theoretical sequence limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-kvcachem-02",
"track": "mobile",
"topic": "kv-cache-management",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L2",
"id": "mobile-2152",
"title": "Android SoC NPU KV Cache Size Estimation",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-1097",
"title": "Exynos 2400 Shared Memory Exhaustion",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1185",
"title": "Designing KV-Cache for Exynos 2400 NPU Shared Memory",
"bloom": "analyze"
}
],
"rationale": "Focuses on Exynos 2400 shared memory constraints, moving from baseline capacity estimation to diagnosing memory exhaustion cliffs, and finally designing an allocation strategy to prevent out-of-memory kills.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-kvcachem-03",
"track": "mobile",
"topic": "kv-cache-management",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1545",
"title": "KV-Cache Pressure on Google Tensor G3 for Long Contexts",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1162",
"title": "Diagnosing OOM during long-context Gemini Nano inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1542",
"title": "On-Device LLM KV-Cache Optimization for Google Tensor G3",
"bloom": "analyze"
}
],
"rationale": "Explores handling long contexts on the Tensor G3, progressing from observing latency and OOM symptoms to diagnosing specific inference crashes, and architecting a comprehensive memory mitigation strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-kvcachem-04",
"track": "mobile",
"topic": "kv-cache-management",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "mobile-0950",
"title": "KV-Cache: Fluency \u2014 Decode Throughput vs Context Length Trade-off on A17 Pro",
"bloom": "apply"
},
{
"level": "L3",
"id": "mobile-0940",
"title": "KV-Cache: Analyze KV Cache Memory Pressure on A17 Pro",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-0946",
"title": "KV-Cache: Evaluate KV Cache Quantization Strategies on A17 Pro",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-0955",
"title": "KV-Cache: Optimize KV Cache for Aggressive Memory Compression on A17 Pro",
"bloom": "evaluate"
}
],
"rationale": "Progresses from observing throughput degradation due to context length, to analyzing the underlying memory pressure, evaluating specific quantization formats, and designing an aggressive compression strategy to maximize context.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-kvcachem-05",
"track": "mobile",
"topic": "kv-cache-management",
"competency_area": "cross-cutting",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "mobile-2119",
"title": "PagedAttention Fragmentation on Mobile",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1288",
"title": "Evaluating PagedAttention vs Static Allocation for On-Device LLMs",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1181",
"title": "Dynamic KV-Cache Paging for A17 Pro Unified Memory",
"bloom": "create"
}
],
"rationale": "Examines the implementation of PagedAttention on mobile architectures, moving from calculating fragmentation overhead to evaluating trade-offs against static allocation, and finally architecting a dynamic paging engine.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-kvcachem-06",
"track": "mobile",
"topic": "kv-cache-management",
"competency_area": "memory",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "mobile-0951",
"title": "KV-Cache: Fluency \u2014 GQA Memory Savings Calculation for Mobile LLM",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0948",
"title": "KV-Cache: Evaluate GQA vs MHA for Memory-Constrained Mobile LLM",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating the absolute memory savings of grouped-query attention to holistically evaluating its architectural trade-offs against multi-head attention for memory-constrained devices.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-modelser-01",
"track": "mobile",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-0146",
"title": "The Cellular Download Budget",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0008",
"title": "The OTA Budget Constraint",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0069",
"title": "The App Store Binary Size Limit",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0078",
"title": "The Model Update Delta Compression",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-0038",
"title": "The Delivery Paradox",
"bloom": "create"
}
],
"rationale": "Guides the learner through mobile model delivery constraints, starting from basic size limits to calculating budgets, circumventing limits with quantization, applying delta compression, and designing a full custom delivery pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-modelser-02",
"track": "mobile",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L4",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-0244",
"title": "The 7 Billion Parameter Car Crash",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0305",
"title": "The OTA Budget Bust",
"bloom": "understand"
},
{
"level": "L4",
"id": "mobile-2154",
"title": "The Infotainment Traffic Jam",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-0654",
"title": "The Automotive Co-Pilot Conundrum",
"bloom": "create"
}
],
"rationale": "Follows the system engineering lifecycle of deploying a massive 7B parameter LLM to an automotive infotainment system, from basic parameter sizing to solving real-time hardware contention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-modelser-03",
"track": "mobile",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-2078",
"title": "Pipelining Model Weights Loading in Mobile Voice Translation",
"bloom": "analyze"
},
{
"level": "L2",
"id": "mobile-2033",
"title": "Understanding Android NNAPI Initialization Overheads",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0741",
"title": "The Launch Blocker",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0084",
"title": "The CoreML Model Compilation Jitter",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-2036",
"title": "Evaluating JIT Compilation Latency Spikes on Mobile NPU",
"bloom": "evaluate"
}
],
"rationale": "Explores the cold-start and initialization latency of mobile ML models, progressing from pipelining weights and main-thread blocking bugs to mitigating JIT compilation UI freezes.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-modelser-04",
"track": "mobile",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1939",
"title": "CoreML Execution Targets on Apple Silicon",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-0015",
"title": "The App Store ML Review Trap",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0085",
"title": "The Android NNAPI Driver Fallback",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0092",
"title": "The 1000-Device Android Fragmentation Problem",
"bloom": "evaluate"
}
],
"rationale": "Teaches the challenges of hardware fragmentation and cross-platform runtimes, moving from theoretical hardware targets to debugging device-specific fallback paths and handling fragmentation at scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-powerbud-01",
"track": "mobile",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-0151",
"title": "The Mobile Thermal Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0103",
"title": "The Throttling Treadmill",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0581",
"title": "The Pocket Oven LLM",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-2161",
"title": "Race-to-Sleep vs. Paced Execution for Mobile LLMs",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1587",
"title": "Optimizing Quantized LLM Inference on Snapdragon 8 Gen 3 Hexagon NPU for Power Efficiency",
"bloom": "analyze"
}
],
"rationale": "Progresses from basic thermal budget math to observing throttling, then to advanced LLM-specific thermal modeling, power management strategies, and full system optimization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-powerbud-02",
"track": "mobile",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "mobile-0197",
"title": "The Background Battery Drainer",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0104",
"title": "The Silent Battery Drain",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0112",
"title": "The Background ML Battery Drain",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0119",
"title": "The Background Inference Power Budget",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating background duty cycles to identifying common pitfalls, diagnosing overnight battery drain, and scheduling inference within strict OS background constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-powerbud-03",
"track": "mobile",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-0255",
"title": "The Mobile Power Chasm",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-1210",
"title": "Always-On Inference Power Discrepancy",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0115",
"title": "The Background ML Battery Vampire",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0116",
"title": "The Power Domain Juggling Act",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-0656",
"title": "The Sentry Mode Thermal Budget",
"bloom": "create"
}
],
"rationale": "Explores the hidden system-level power costs of always-on ML, progressing from unexplained power jumps to domain wake-up taxes, and finally architecting a multi-day always-on sentry system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-powerbud-04",
"track": "mobile",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-0198",
"title": "The AR Thermal Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-1108",
"title": "Thermal Throttling on A17 Pro Neural Engine",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1186",
"title": "Exynos 2400 NPU Real-Time Power Budgeting",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1455",
"title": "Evaluate A17 Pro Power Budgeting for Real-Time Video AI",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1395",
"title": "Always-On Video NPU Power Optimization",
"bloom": "create"
}
],
"rationale": "Guides the learner through managing thermal budgets for real-time video, from basic AR duty cycles and diagnosing throttling to specifying power caps and architecting paced NPU execution.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-powerbud-05",
"track": "mobile",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-0172",
"title": "The Duty Cycle Power Trap",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-0105",
"title": "The Battery Drain Dilemma",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1797",
"title": "Energy-Delay Product Optimization for Inference on A17 Pro",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-1586",
"title": "Apple A17 Pro ML Inference Power Optimization",
"bloom": "analyze"
}
],
"rationale": "Teaches the crucial distinction between peak power and total energy, progressing from basic duty cycles to qualitative CPU vs GPU trade-offs, calculating Energy-Delay Product, and optimizing DVFS for maximum efficiency.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-powerbud-06",
"track": "mobile",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-0102",
"title": "The Cellular Modem Power Surprise",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-0749",
"title": "The Radio Energy Wall",
"bloom": "evaluate"
}
],
"rationale": "Focuses on the disproportionate power cost of network transmission for ML features, moving from recognizing the modem power spike to diagnosing unoptimized radio usage in a system context.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-roofline-01",
"track": "mobile",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L1",
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L1",
"id": "mobile-0231",
"title": "NPU vs. Reality",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0384",
"title": "The Mobile NPU Ridge Point",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-1524",
"title": "Roofline Analysis on Apple A17 Pro: Identifying Bottlenecks",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1521",
"title": "Roofline Analysis for Mobile AI: Optimizing Real-time Object Detection on Apple A17 Pro",
"bloom": "analyze"
}
],
"rationale": "Progresses from theoretical peak compute to calculating the ridge point, diagnosing utilization, and finally applying roofline analysis to real-world resolution scaling issues on the Apple A17 Pro.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-roofline-02",
"track": "mobile",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "mobile-0364",
"title": "The MobileNet Bottleneck",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0537",
"title": "The Mobile Video Battery Drain",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-2157",
"title": "The Depthwise Cache Collapse",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0617",
"title": "The Depthwise Memory Bound",
"bloom": "evaluate"
}
],
"rationale": "Explores the memory-bound nature of depthwise convolutions, starting from basic arithmetic intensity calculation, moving to power implications, structural cache collapse, and finally real-world performance scaling paradoxes.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-roofline-03",
"track": "mobile",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L1",
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-0136",
"title": "The NPU Energy Advantage",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0157",
"title": "The 85% Delegation Fallacy",
"bloom": "understand"
},
{
"level": "L4",
"id": "mobile-0580",
"title": "The ANE Delegation Regression",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0607",
"title": "The Budget Phone Mystery",
"bloom": "evaluate"
}
],
"rationale": "Teaches the critical impact of CPU fallback and heterogeneous execution, moving from basic NPU efficiency to Amdahl's Law in delegation, and finally debugging system-level regressions and hardware disparities.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "mobile-chain-bucket-roofline-04",
"track": "mobile",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1362",
"title": "Calculate Gemini Nano Inference Bound on Tensor G3",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1460",
"title": "On-Device LLM Roofline Analysis on Tensor G3",
"bloom": "evaluate"
}
],
"rationale": "Analyzes the severe memory bandwidth bottlenecks of autoregressive LLM decoding on mobile NPUs, moving from inference bounds to full roofline evaluations of INT4 quantization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-009-01",
"track": "tinyml",
"topic": "mcu-compute-constraints",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0138",
"title": "The Flash vs. SRAM Divide",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-0149",
"title": "The Flash vs. SRAM Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0698",
"title": "The Flash-SRAM Boundary",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0709",
"title": "The Tensor Arena Overflow",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0720",
"title": "The Peak RAM Puzzle",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0663",
"title": "The Conversational Doorbell's Memory Deficit",
"bloom": "create"
}
],
"rationale": "Progresses from identifying basic memory locations for weights and activations, to budgeting them, understanding the architectural necessity of SRAM, and finally solving increasingly severe SRAM bottlenecks for activations in deep networks and transformers.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-009-02",
"track": "tinyml",
"topic": "mcu-compute-constraints",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0266",
"title": "The SRAM vs. Flash Fallacy",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-0388",
"title": "The TinyML Tensor Arena Trap",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0699",
"title": "The Stack vs Heap on MCU",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0707",
"title": "The TFLite Micro Heap Overhead",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0729",
"title": "The SRAM Fragmentation Crash",
"bloom": "evaluate"
}
],
"rationale": "Explores the complexities of memory allocation for activations, progressing from understanding the working memory budget to sizing the static tensor arena, and finally debugging heap overhead and fragmentation crashes in TFLite Micro.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-009-03",
"track": "tinyml",
"topic": "mcu-compute-constraints",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "tinyml-0169",
"title": "The DMA Dividend",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0697",
"title": "The Memory-Mapped Sensor Bottleneck",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0710",
"title": "The Double Buffering DMA Strategy",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0730",
"title": "The SPI DMA Cache Coherency Failure",
"bloom": "evaluate"
}
],
"rationale": "Explores the evolution of sensor data ingestion, from the CPU costs of PIO and polling, to implementing DMA double-buffering, and finally debugging complex DMA cache coherency issues.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-009-04",
"track": "tinyml",
"topic": "mcu-compute-constraints",
"competency_area": "precision",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-0250",
"title": "The TinyML Memory Diet: MCU Compute Constraints",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0477",
"title": "The Silent Saturator",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0591",
"title": "The Factory Floor Failure",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0681",
"title": "The Silent Failure of the Emergency Keyword",
"bloom": "create"
},
{
"level": "L6+",
"id": "tinyml-0658",
"title": "The Siren's Screech: Designing a Robust Hearing Aid",
"bloom": "create"
}
],
"rationale": "Walks through the life-cycle of INT8 quantization, starting from basic memory savings, progressing to identifying PTQ calibration failures in noisy environments, and concluding with designing mixed-precision architectures to prevent acoustic overflow.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-009-05",
"track": "tinyml",
"topic": "mcu-compute-constraints",
"competency_area": "compute",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0137",
"title": "The 4x Integer Speedup",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-0151",
"title": "The Cost of Unoptimized C Code",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0415",
"title": "Inference Cycles on Cortex-M4",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0780",
"title": "Tinyml New 0002",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0603",
"title": "The MCU Throughput Ceiling",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0644",
"title": "The SIMD Lane Starvation",
"bloom": "create"
}
],
"rationale": "Investigates the mechanics of SIMD speedups on microcontrollers, starting from theoretical MAC calculations and unoptimized C baselines, moving to CMSIS-NN practical applications, and culminating in debugging memory-bound SIMD starvation on advanced Cortex-M cores.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-009-06",
"track": "tinyml",
"topic": "mcu-compute-constraints",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0703",
"title": "Flash Wear from Logging Frequency",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0714",
"title": "MCU Flash Wear Monitoring",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0728",
"title": "The Flash Read Disturb",
"bloom": "evaluate"
}
],
"rationale": "Investigates the long-term reliability of non-volatile memory in TinyML, progressing from calculating basic flash wear endurance to designing circular logs and diagnosing subtle model degradation from Flash Read Disturb.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-009-07",
"track": "tinyml",
"topic": "mcu-compute-constraints",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0408",
"title": "The Hardware Divider Stall",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0556",
"title": "The Branch Prediction Penalty on MCU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0617",
"title": "The Input-Dependent Watchdog Reset",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0652",
"title": "The Float-to-Int Hardware Trap",
"bloom": "create"
}
],
"rationale": "Explores the hidden latencies of arithmetic and control flow on microcontrollers, from hardware divider stalls and branch mispredictions in activations to input-dependent watchdog resets and floating-point hardware traps.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-009-08",
"track": "tinyml",
"topic": "mcu-compute-constraints",
"competency_area": "compute",
"levels": [
"L1",
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0228",
"title": "The TinyML Ridge Point",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-0335",
"title": "The TinyML Compute-Memory Tradeoff",
"bloom": "understand"
},
{
"level": "L4",
"id": "tinyml-0711",
"title": "The Cache Miss Penalty",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0597",
"title": "The Instruction Cache Thrashing Loop",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0646",
"title": "The MCU Roofline",
"bloom": "create"
}
],
"rationale": "Analyzes system bottlenecks using the Roofline model, starting from fundamental arithmetic intensity, bridging into empirical cache miss profiling, and finally plotting the MCU roofline for advanced compute architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-009-09",
"track": "tinyml",
"topic": "mcu-compute-constraints",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "tinyml-0296",
"title": "The TinyML SRAM Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0184",
"title": "The TinyML Memory Wall: SRAM vs. Flash",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-0716",
"title": "Execute-in-Place vs Copy-to-SRAM for Model Weights",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0724",
"title": "The Execute-in-Place Energy Tax",
"bloom": "evaluate"
}
],
"rationale": "Evaluates the performance and energy trade-offs of Execute-in-Place (XIP) architectures, progressing from basic SRAM versus Flash access latency to caching strategies and system-level battery drain.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-016-06",
"track": "tinyml",
"topic": "cnn-efficient-design",
"competency_area": "architecture",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0212",
"title": "The Depthwise Convolution Advantage",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-0246",
"title": "The Depthwise Memory Footprint",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0437",
"title": "The Micro-Convolution Budget",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0844",
"title": "Realization: deploy EfficientNet-inspired tiny CNN on Cortex-M4 within 256KB SRAM",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-0843",
"title": "Mastery: depthwise separable conv SRAM layout optimization for Cortex-M4",
"bloom": "create"
}
],
"rationale": "Systematically builds understanding of depthwise separable convolutions, from conceptual compute savings and parameter reduction to practical MCU deployment and advanced SRAM layout optimization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-016-07",
"track": "tinyml",
"topic": "cnn-efficient-design",
"competency_area": "architecture",
"levels": [
"L1",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0381",
"title": "The NAS Discovery on a Microcontroller",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-0456",
"title": "The Neural Architecture Search Power Puzzle",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-0647",
"title": "The MCU NAS Search Space",
"bloom": "create"
},
{
"level": "L6+",
"id": "tinyml-0649",
"title": "MCUNet Search Space Design",
"bloom": "create"
}
],
"rationale": "Traces the application of Neural Architecture Search for TinyML, starting with basic architectural preferences, evaluating power-constrained candidates, and advancing to custom search space design for MCUs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-016-08",
"track": "tinyml",
"topic": "cnn-efficient-design",
"competency_area": "architecture",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "tinyml-0584",
"title": "Cortex-M55 + Ethos-U55 + Cortex-A32 \u2014 Which Core Runs What?",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0613",
"title": "NPU Delegation Coverage Determines Actual Speedup",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1207",
"title": "NPU-Aware CNN Design for Ethos-U55",
"bloom": "create"
}
],
"rationale": "Explores the complexities of heterogeneous TinyML deployment, from assigning tasks across distinct compute cores to analyzing NPU delegation coverage and architecting NPU-aware CNNs to prevent SRAM spilling.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-019-06",
"track": "tinyml",
"topic": "duty-cycling",
"competency_area": "power",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1794",
"title": "Microcontroller Event Power",
"bloom": "apply"
},
{
"level": "L2",
"id": "tinyml-0028",
"title": "The Remote Wildlife Camera's Lifespan",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0128",
"title": "The Battery Life Equation",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1721",
"title": "Average Power Calculation in Duty-Cycled Sensors",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0816",
"title": "Wake-Word Duty Cycle Evaluation Under Power Constraints",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1629",
"title": "Cortex-M4 Duty Cycle Budget",
"bloom": "create"
}
],
"rationale": "A complete end-to-end progression on duty cycle battery modeling, from fundamental average power calculations to complex non-linear discharge modeling and architectural power budget optimization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-019-07",
"track": "tinyml",
"topic": "duty-cycling",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-1662",
"title": "Cortex-M4 Cascade Wakeword",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-1563",
"title": "M4 Boot Energy Dominance",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1724",
"title": "Wake-up Penalty Reduction via Batching",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1261",
"title": "nRF5340 Always-On Wake Word Energy Budgeting",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1745",
"title": "Design a Sub-1mW Cascaded Acoustic Wake-up",
"bloom": "create"
}
],
"rationale": "Focuses on the wake-up penalty in duty-cycled systems, progressing from cascade power calculation and boot energy dominance to batching mitigations and full sub-1mW cascaded architecture design.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-019-08",
"track": "tinyml",
"topic": "duty-cycling",
"competency_area": "power",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-0046",
"title": "The Energy-Neutral Wildlife Camera",
"bloom": "understand"
},
{
"level": "L4",
"id": "tinyml-0587",
"title": "Duty Cycle for Energy Harvesting Budget",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0607",
"title": "The Energy Harvesting Inference Budget",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0124",
"title": "The Solar Harvesting Budget",
"bloom": "create"
}
],
"rationale": "Explores energy harvesting, moving from basic energy-neutral solar calculations to vibration inference budgets, supercapacitor modeling under varying conditions, and battery-free system design.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-024-10",
"track": "tinyml",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0141",
"title": "The 1 Millisecond Deadline",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-0258",
"title": "The Audio Buffer Deadline",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-1071",
"title": "Interrupt-Driven Missed Audio Deadlines",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-0004",
"title": "The CPU Cycle Thief",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0111",
"title": "The Interrupt Latency Impact",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1350",
"title": "Real-Time TinyML Inference on ESP32-S3: Diagnosing Latency and Jank",
"bloom": "analyze"
}
],
"rationale": "Traces the impact of hardware interrupts on strict audio/KWS deadlines, starting from fundamental budget concepts to diagnosing specific ISR cycle-stealing and finally restructuring the system to mitigate interrupt jitter.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-024-11",
"track": "tinyml",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0202",
"title": "The Real-Time Deadline Trap",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-0412",
"title": "The Watchdog Reset During Inference",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0100",
"title": "Watchdog Timer Integration with Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0601",
"title": "The Watchdog Interrupt Starvation",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0635",
"title": "The Silent Sensor Death Spiral",
"bloom": "evaluate"
}
],
"rationale": "Explores the conflict between long-running ML inferences and system watchdog timers, progressing from deadline overlap traps to diagnosing random watchdog resets, resolving ISR starvation, and investigating fleet-wide death spirals.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-024-12",
"track": "tinyml",
"topic": "real-time-deadlines",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0475",
"title": "The Overwhelmed Sensor Fusion MCU",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1305",
"title": "Acoustic Pipeline Scheduling on Dual-Core MCU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1263",
"title": "Keyword Spotting Deadlines on ESP32-S3",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1202",
"title": "Dual-Core Real-Time Inference Architecture",
"bloom": "create"
}
],
"rationale": "Addresses the challenge of meeting real-time deadlines while handling concurrent tasks like BLE streaming, progressing from identifying overloaded single-core MCU limits to partitioning tasks and designing robust dual-core architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-025-04",
"track": "tinyml",
"topic": "federated-learning",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0205",
"title": "The Privacy-First Principle of Federated Learning",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-0364",
"title": "The Economics of Fleet Updates: Centralized vs. Federated",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0537",
"title": "The Smart Doorbell's Update Dilemma",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-0636",
"title": "The Federated Learning Battery Drain Catastrophe",
"bloom": "evaluate"
}
],
"rationale": "Builds foundational intuition for the economics of federated learning, progressing from privacy concepts to data transfer calculations, total cost of ownership tradeoffs, and finally diagnosing battery drain catastrophes.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-025-05",
"track": "tinyml",
"topic": "federated-learning",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0144",
"title": "The Nanosecond Heist",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-0097",
"title": "Side-Channel Attacks on MCU Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0073",
"title": "Secure Boot Chain for ML Models",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0684",
"title": "The Billion-Dollar Doorbell Breach",
"bloom": "create"
}
],
"rationale": "Focuses on the security vulnerabilities of edge ML, escalating from understanding side-channel resolutions to executing power attacks, designing secure boot mitigations, and responding to fleet-wide breaches.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-025-06",
"track": "tinyml",
"topic": "federated-learning",
"competency_area": "cross-cutting",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "tinyml-1471",
"title": "Federated Learning on ESP32-S3: Convergence and Memory Challenges",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1242",
"title": "Federated Averaging Memory and Communication Sizing on ESP32-S3",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1475",
"title": "Federated Learning on ESP32-S3: Scaling a TinyML Model for Non-IID Data",
"bloom": "analyze"
}
],
"rationale": "Examines the memory and convergence challenges of federated averaging on ESP32 microcontrollers, progressing from diagnosing OOMs to sizing communication buffers and architecting for non-IID data at scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-026-01",
"track": "tinyml",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1270",
"title": "ARM Cortex-M4 SRAM and Flash Capacities",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-1646",
"title": "MCU Weight Storage Hierarchy",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-1797",
"title": "Cortex-M4 Flash vs SRAM execution",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1688",
"title": "TinyML Memory Paging Analysis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1246",
"title": "Evaluating Flash vs SRAM Execution for Cortex-M4 Inference",
"bloom": "evaluate"
}
],
"rationale": "Progresses from knowing hardware capacities to understanding how the memory hierarchy dictates weight storage, calculating latency differences between Flash and SRAM, implementing paging algorithms, and finally evaluating architectural tradeoffs of XIP versus DMA paging.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-026-02",
"track": "tinyml",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L2",
"id": "tinyml-1326",
"title": "ESP32-S3 Memory Hierarchy: Capacity, Bandwidth, Latency Tradeoffs",
"bloom": "analyze"
},
{
"level": "L3",
"id": "tinyml-1062",
"title": "ESP32-S3 Inference Latency Anomaly Analysis",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1328",
"title": "ESP32-S3 Memory Bottleneck Diagnosis",
"bloom": "analyze"
}
],
"rationale": "Teaches the characteristics of ESP32-S3 dual-tier memory, quantifies how external PSRAM affects inference latency in practice, and challenges the learner to diagnose dynamic system bottlenecks arising from PSRAM bus contention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-026-03",
"track": "tinyml",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1044",
"title": "SRAM Capacity Limits on Corstone-300",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1502",
"title": "SRAM Tiling Strategy for Large Conv Layer on Cortex-M7+Ethos-U55",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1088",
"title": "Corstone-300 Memory Allocation for KWS",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1193",
"title": "Ethos-U55 Memory Hierarchy Design",
"bloom": "create"
}
],
"rationale": "Follows the spatial and architectural constraints of the Corstone-300 Ethos-U55 platform, from basic capacity limits to advanced tiling strategies, static memory allocation tradeoffs, and dynamic weight streaming designs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-026-04",
"track": "tinyml",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1169",
"title": "SRAM Allocation and Peak Memory Calculation on nRF5340",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1132",
"title": "XIP vs DMA Paging on nRF5340",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1332",
"title": "TinyML Memory Hierarchy Optimization on Nordic nRF5340",
"bloom": "analyze"
}
],
"rationale": "Explores the nRF5340 memory architecture, starting with baseline SRAM capacity budgeting, stepping up to XIP/DMA tradeoffs, and culminating in full cross-hierarchy placement of all application components.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-026-16",
"track": "tinyml",
"topic": "tensor-arena-planning",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0166",
"title": "The SRAM Budget Constraint",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-0254",
"title": "The SRAM Tensor Arena Squeeze",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0981",
"title": "Operator Scheduling for Peak SRAM Reduction",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1519",
"title": "Tensor Lifetime Analysis for U-Net on MCU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1522",
"title": "Memory-Optimal Operator Execution Order",
"bloom": "evaluate"
}
],
"rationale": "Follows the core principles of tensor arena overlap, from simple SRAM constraints and co-existing sequential tensors, to understanding how operator execution schedules alter peak bounds, analyzing long-lived skip connections, and formally deriving memory-optimal DAG schedules.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-026-17",
"track": "tinyml",
"topic": "tensor-arena-planning",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1150",
"title": "ESP32-S3 Tensor Arena Sizing with WiFi",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1524",
"title": "PSRAM Spilling Strategy for ESP32-S3",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1101",
"title": "ESP32-S3 Dual-Model Tensor Arena Architecture",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1210",
"title": "Cross-Hierarchy Tensor Arena for ESP32-S3",
"bloom": "create"
}
],
"rationale": "Focuses on the unique architectural challenge of the ESP32-S3's dual-tier memory, from calculating SRAM requirements with WiFi overhead, evaluating PSRAM latency spilling strategies, mapping dual-model arenas, and developing a custom cross-hierarchy static planner.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-026-18",
"track": "tinyml",
"topic": "tensor-arena-planning",
"competency_area": "memory",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-0185",
"title": "The SRAM Tensor Arena Puzzle",
"bloom": "understand"
},
{
"level": "L4",
"id": "tinyml-1309",
"title": "Designing a Static Tensor Arena for Wake-Word CNN",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1525",
"title": "Arena Fragmentation in Dynamic TinyML Workloads",
"bloom": "create"
},
{
"level": "L6+",
"id": "tinyml-0675",
"title": "The Concurrent Wake-Word Crisis",
"bloom": "create"
}
],
"rationale": "Explores managing memory across multiple distinct TinyML workloads, starting from baseline peak concurrent overlap, moving to static allocation for distinct states, dealing with arena fragmentation when hot-swapping models, and designing schedulers for concurrent overlapping model residency.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-027-01",
"track": "tinyml",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0868",
"title": "Model Format Conversion: Recall TFLite Micro Supported Op List and Limitations",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-0873",
"title": "Model Format Conversion: Size TFLM Model for Cortex-M4 Flash and SRAM",
"bloom": "apply"
},
{
"level": "L3",
"id": "tinyml-1429",
"title": "Optimizing and Deploying a Quantized Model on ARM Cortex-M4 STM32F4",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1009",
"title": "Diagnosing Latency Spikes from Unoptimized Fallbacks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1247",
"title": "TFLite Micro Operator Fallback on Cortex-M4",
"bloom": "evaluate"
}
],
"rationale": "Progresses from recalling supported operations and basic sizing to deploying, diagnosing latency spikes from fallbacks, and handling reference kernel fallbacks on Cortex-M4.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-027-02",
"track": "tinyml",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1156",
"title": "Ethos-U55 Compiler Fallback Latency",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-0992",
"title": "Ethos-U55 Fallback Delegation and SRAM Strategy",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1194",
"title": "Ethos-U55 Delegation and Shared SRAM Optimization",
"bloom": "create"
}
],
"rationale": "Advances from calculating Ethos-U55 fallback latency to designing fallback delegation and SRAM optimization strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-027-03",
"track": "tinyml",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-1427",
"title": "Model Conversion and Deployment on ESP32-S3 with TFLite Micro",
"bloom": "analyze"
},
{
"level": "L3",
"id": "tinyml-0967",
"title": "ESP32-S3 Quantized Model Fallback Latency",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1220",
"title": "ESP32-S3 Operator Fallback Optimization",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "tinyml-1433",
"title": "Optimizing ONNX Model Conversion for ESP32-S3 Edge Deployment",
"bloom": "analyze"
}
],
"rationale": "Takes the learner from basic ESP32-S3 deployment to diagnosing unoptimized vector kernel fallbacks, optimizing operators, and finally converting complete ONNX pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-027-04",
"track": "tinyml",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1170",
"title": "TFLite Micro Memory Allocation on nRF5340",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1025",
"title": "TFLite Micro vs AOT Compilation on nRF5340",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1434",
"title": "TinyML Model Conversion and Operator Gap Management for Nordic nRF5340 Deployment",
"bloom": "analyze"
}
],
"rationale": "Builds from memory allocation on nRF5340 to evaluating AOT compilation tradeoffs and managing full operator gaps during deployment.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-027-19",
"track": "tinyml",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1037",
"title": "Active Learning Storage Constraints",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1455",
"title": "Optimizing TinyML Data Labeling for nRF5340 Resource Constraints",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0985",
"title": "Active Learning Data Pipeline for nRF5340 Anomaly Detection",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1184",
"title": "On-Device Active Learning for Audio",
"bloom": "create"
}
],
"rationale": "Progresses from calculating basic storage constraints for audio clips to optimizing data labeling under BLE limits, designing a full anomaly detection pipeline, and architecting the complete active learning system for the nRF5340.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-027-20",
"track": "tinyml",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0849",
"title": "Dataset Curation: Implement Feature Extraction Pipeline for Cortex-M4 Deployment",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0853",
"title": "Dataset Curation: Optimize Dataset Size vs Model Accuracy on Cortex-M4",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "tinyml-1125",
"title": "On-Device Active Learning Curation for STM32F4",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1463",
"title": "TinyML Anomaly Detection: Data Curation for Constrained Devices",
"bloom": "analyze"
}
],
"rationale": "Moves from implementing fixed-point feature extraction to optimizing dataset size versus accuracy cost, evaluating active learning strategies on STM32F4, and finalizing data curation for FPU-less anomaly models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-027-21",
"track": "tinyml",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-1454",
"title": "Mitigating Dataset Bias in TinyML Gesture Recognition on Cortex-M7/Ethos-U55",
"bloom": "analyze"
},
{
"level": "L3",
"id": "tinyml-1053",
"title": "Calibration Dataset Outlier Bias",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1289",
"title": "Dataset Specification for Constrained INT8 Acoustic Models",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "tinyml-1461",
"title": "Edge AI Keyword Spotting: Active Learning for Constrained Dataset Curation",
"bloom": "analyze"
}
],
"rationale": "Explores dataset constraints for the Ethos-U55 NPU, moving from basic bias mitigation to diagnosing outlier bias during calibration, specifying INT8 datasets, and applying active learning to curate rare keywords.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "tinyml-chain-auto-027-22",
"track": "tinyml",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1266",
"title": "Data Type Formatting for ESP32-S3 Hardware Acceleration",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-0855",
"title": "Dataset Curation: Realize PSRAM-Constrained Training Dataset for ESP32-S3",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1003",
"title": "Diagnosing Domain Shift in ESP32-S3 Wake-Word Datasets",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1238",
"title": "On-Device Hard Negative Mining for ESP32-S3",
"bloom": "evaluate"
}
],
"rationale": "Takes the learner from understanding INT8 formatting to managing PSRAM limits, diagnosing domain shift in wake-word datasets, and designing on-device hard negative mining for the ESP32-S3.",
"_origin": "gemini-3.1-pro-preview",
"tier": "primary"
},
{
"chain_id": "cloud-chain-auto-secondary-001-01",
"track": "cloud",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-0286",
"title": "The Roofline Litmus Test",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0369",
"title": "The H100 Ridge Point",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-1808",
"title": "Roofline Model Analysis on A100",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3240",
"title": "Diagnosing Workload Bottlenecks on NVIDIA H100 with Roofline Analysis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3721",
"title": "Compute-Optimal Scaling Laws Verification with Roofline on A100",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-3241",
"title": "Optimizing Large Language Model Inference on AMD MI300X with Roofline Analysis",
"bloom": "analyze"
}
],
"rationale": "A complete progression from understanding the roofline metric to optimizing a production LLM workload on an MI300X.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-001-02",
"track": "cloud",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "cloud-0277",
"title": "Defining Arithmetic Intensity: Roofline Analysis",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-2109",
"title": "The Matrix Multiply FLOP Count",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2110",
"title": "Arithmetic Intensity of a Linear Layer",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3702",
"title": "Roofline Ceiling Identification on H100 for Transformer MLP Blocks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1527",
"title": "Evaluating GPU Instances for High-Batch Dense FFNs",
"bloom": "evaluate"
}
],
"rationale": "Calculates and applies arithmetic intensity for dense matrix operations scaling from a single linear layer to evaluating GPU instances.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-001-03",
"track": "cloud",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2696",
"title": "Why GPUDirect Storage Eliminates the CPU Bounce Buffer",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2697",
"title": "Data Loading Bottleneck Analysis",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2698",
"title": "Storage Architecture for Large-Scale Training",
"bloom": "evaluate"
}
],
"rationale": "Explores the architectural and performance implications of GPUDirect Storage on data loading bottlenecks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-002-05",
"track": "cloud",
"topic": "cnn-efficient-design",
"competency_area": "architecture",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2902",
"title": "Recall depthwise separable convolution FLOP formula",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-2918",
"title": "Fluency: explain depthwise separable convolution to a non-ML systems engineer",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2891",
"title": "Analyze depthwise separable convolution parameter reduction on ResNet-50 baseline",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2906",
"title": "Implement custom depthwise conv CUDA kernel with shared memory optimization",
"bloom": "apply"
}
],
"rationale": "Guides the learner from the basic mathematical definition of depthwise separable convolutions to analyzing their impact on standard CNNs and finally optimizing their CUDA implementations on H100.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-002-06",
"track": "cloud",
"topic": "cnn-efficient-design",
"competency_area": "architecture",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2904",
"title": "Recall EfficientNet compound scaling constraints and baseline architecture",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-2892",
"title": "Analyze EfficientNet compound scaling vs. naive width/depth scaling on throughput",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2901",
"title": "Evaluate EfficientNet compound scaling coefficient impact on H100 memory bandwidth",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2916",
"title": "Realize automated NAS-style width and depth search for EfficientNet-style backbone",
"bloom": "apply"
}
],
"rationale": "Traces the concept of EfficientNet scaling from basic definitions through comparative throughput analysis, memory bandwidth evaluation, and custom NAS search.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-004-22",
"track": "cloud",
"topic": "kernel-fusion",
"competency_area": "optimization",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "cloud-2837",
"title": "Cloud New 0031",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-2987",
"title": "Kernel Fusion: Recall \u2014 What is Operator Fusion and Why Does It Matter?",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-3842",
"title": "Analyzing Memory Bounds in Unfused Operations",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3386",
"title": "H100 Memory Bottleneck in LLM Element-wise Operations",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3388",
"title": "Optimizing Memory-Bound Operations via Kernel Fusion",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3391",
"title": "H100 LLM Inference Optimization: Kernel Fusion for Memory-Bound Operations",
"bloom": "analyze"
}
],
"rationale": "Establishes fundamental knowledge of operator fusion, analyzes memory limits of unfused operations, diagnoses H100 element-wise bottlenecks, and scales up to implementing and validating fusion for LLM inference on H100.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-004-25",
"track": "cloud",
"topic": "kernel-fusion",
"competency_area": "optimization",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3389",
"title": "TPU Kernel Fusion for Memory-Bound Element-wise Operations",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3387",
"title": "Optimizing Transformer Inference on Google TPU v5e via Kernel Fusion",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3390",
"title": "Optimizing LLM Inference on AMD MI300X via Kernel Fusion",
"bloom": "analyze"
}
],
"rationale": "Bridges from basic TPU kernel fusion for simple element-wise operations to optimizing full transformer inference on TPU v5e, and extending similar principles to maximize HBM3 bandwidth on AMD MI300X.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-004-26",
"track": "cloud",
"topic": "model-adaptation-systems",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4277",
"title": "LoRA Rank Selection and Memory Budget",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4424",
"title": "LoRA Rank Sensitivity to Task Complexity",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-4437",
"title": "LoRA Adapter Distillation for Latency-Sensitive Serving",
"bloom": "create"
}
],
"rationale": "Progresses from basic memory budgeting of LoRA ranks, to analyzing sensitivity and diminishing returns based on task complexity, and culminates in distilling high-rank adapters into lower ranks for latency-sensitive serving.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-005-01",
"track": "cloud",
"topic": "software-portability",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3910",
"title": "HIP Kernel Launch Parameter Translation from CUDA",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3900",
"title": "Warp Size Divergence When Porting CUDA Kernels to AMD CDNA",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3896",
"title": "Triton Kernel Portability from NVIDIA to AMD GPUs",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3909",
"title": "CI/CD Pipeline Design for Multi-Accelerator Kernel Testing",
"bloom": "create"
}
],
"rationale": "Progresses from basic HIP launch translation, to handling warp size differences in AMD, to high-level Triton portability, and finally testing custom kernels across platforms in CI/CD.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-005-02",
"track": "cloud",
"topic": "software-portability",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3904",
"title": "ONNX Model Compatibility Matrix Across Runtime Versions",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-3895",
"title": "ONNX Runtime Execution Provider Selection for Multi-Accelerator Inference",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3899",
"title": "ONNX Graph Partitioning Across Mixed Execution Providers",
"bloom": "apply"
}
],
"rationale": "Explores ONNX compatibility at load time, progresses to execution provider selection, and resolves advanced graph partitioning challenges across mixed providers.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-005-03",
"track": "cloud",
"topic": "chiplet-architecture",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4261",
"title": "MI300X XCD Count and Die Topology",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-4264",
"title": "NUMA Effects in MI300X Multi-XCD Workloads",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4270",
"title": "Model Parallelism Partitioning for XCD Locality",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-4274",
"title": "Infinity Fabric Topology and All-Reduce Efficiency",
"bloom": "create"
}
],
"rationale": "Progresses from understanding the basic MI300X chiplet topology to identifying NUMA effects, partitioning models for locality, and finally designing an optimal all-reduce ring topology across the dies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-005-04",
"track": "cloud",
"topic": "chiplet-architecture",
"competency_area": "compute",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-4263",
"title": "Yield-Performance Tradeoff in Chiplet Design",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4420",
"title": "Multi-Die Power Delivery Network Design for AI Chiplet",
"bloom": "apply"
},
{
"level": "L6+",
"id": "cloud-4414",
"title": "Active Interposer vs Passive Silicon Interposer Tradeoffs",
"bloom": "evaluate"
}
],
"rationale": "Moves from foundational yield vs. monolithic cost tradeoffs to addressing power delivery challenges in chiplets, up to selecting interposer technologies for next-generation architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-005-05",
"track": "cloud",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1100",
"title": "AOT Dataset Compilation for H100 GPU Clusters",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1103",
"title": "Diagnosing Data Loader Bottlenecks in Vision Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1105",
"title": "Evaluating Data Compilation Strategies",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating decoding requirements for vision data, to diagnosing an active data loader bottleneck, and finally evaluating node scaling versus offline compilation for massive multimodal training.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-005-06",
"track": "cloud",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2968",
"title": "Dataset Curation: Recall \u2014 What is Perplexity-Based Data Filtering?",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-2960",
"title": "Dataset Curation: Implement a Data Quality Scoring Function",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2956",
"title": "Dataset Curation: Evaluate Deduplication Strategies for Pre-Training Data Quality",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2961",
"title": "Dataset Curation: Mastery \u2014 End-to-End Pre-Training Data Strategy for 100B Model",
"bloom": "create"
}
],
"rationale": "Builds from the concept of perplexity filtering to implementing a scorer, evaluating deduplication tradeoffs, and finally architecting a complete multi-stage pre-training data strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-005-07",
"track": "cloud",
"topic": "model-size-estimation",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3032",
"title": "Model Size Estimation: Fluency \\u2014 Estimate 7B Model Memory From Scratch",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3035",
"title": "Model Size Estimation: Diagnose OOM Error During Fine-Tuning",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3033",
"title": "Model Size Estimation: Master Full Memory Budget for LLM Training on H100",
"bloom": "evaluate"
}
],
"rationale": "Progresses from napkin math for a 7B model's inference memory, to diagnosing training OOMs for a 13B model, up to mastering the full multi-component memory budget for distributed LLM training.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-005-08",
"track": "cloud",
"topic": "model-size-estimation",
"competency_area": "architecture",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-1596",
"title": "Diagnosing Dynamic Ensemble Latency on A10G",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1598",
"title": "Evaluating Dynamic Ensembles for Fraud Detection",
"bloom": "evaluate"
}
],
"rationale": "Moves from diagnosing latency spikes in a single-GPU dynamic routing ensemble to architecting a strict P99 multi-model serving system for fraud detection.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-007-01",
"track": "cloud",
"topic": "storage-format-selection",
"competency_area": "data",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "cloud-1582",
"title": "Multi-Tier Pre-fetch Sizing for 3D ViT",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1581",
"title": "Diagnosing NVMe Cache Thrashing in CV Training",
"bloom": "analyze"
}
],
"rationale": "Progresses from calculating baseline prefetch math for an S3/NVMe tier to diagnosing thrashing when using that same NVMe cache for random accesses.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-007-02",
"track": "cloud",
"topic": "storage-format-selection",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3890",
"title": "Parquet Storage and Pipeline Bottleneck Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4220",
"title": "Parquet Row Group Sizing for ML Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4221",
"title": "Delta Lake vs Parquet for ML Versioning",
"bloom": "evaluate"
}
],
"rationale": "Starts with calculating Parquet compression/throughput limits, moves to diagnosing row-group sizing issues during read, and culminates in evaluating Delta Lake to solve concurrency/versioning over raw Parquet.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-007-03",
"track": "cloud",
"topic": "storage-format-selection",
"competency_area": "data",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3554",
"title": "H100 Data Ingestion: Optimizing Storage for Large-Scale ML",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3552",
"title": "Optimizing Data Ingestion for H100-Powered Foundation Model Training",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3557",
"title": "Optimizing Large-Scale Model Training Data I/O for NVIDIA H100 Clusters",
"bloom": "analyze"
}
],
"rationale": "Progresses from redesigning a generic image dataset format, to solving JSON ingestion bottlenecks for multimodal, to architecting a complete petabyte-scale multimodal storage tiering system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-007-16",
"track": "cloud",
"topic": "autograd-computational-graphs",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-4239",
"title": "Understanding why torch.compile retraces after dynamic batch size changes",
"bloom": "understand"
},
{
"level": "L4",
"id": "cloud-4237",
"title": "Debugging a graph break in torch.compile during transformer training",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-4259",
"title": "Understanding graph break costs in torch.compile for a custom CUDA extension",
"bloom": "evaluate"
}
],
"rationale": "Progresses from understanding dynamic batch retracing stalls to debugging explicit graph breaks from dynamic branching, and finally fixing graph breaks across a custom C++ CUDA extension boundary.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-007-17",
"track": "cloud",
"topic": "autograd-computational-graphs",
"competency_area": "optimization",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-4252",
"title": "Diagnosing NaN gradients in a deep network with custom autograd operations",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-4238",
"title": "Implementing a numerically stable custom backward pass for a fused loss function",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-4241",
"title": "Designing a custom autograd function for a differentiable rendering operation",
"bloom": "create"
}
],
"rationale": "Moves from diagnosing numerical instability in a custom backward pass to implementing a stable fused backward pass from scratch, and finally wrapping a complex custom H100 CUDA kernel for a differentiable NeRF into autograd.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-007-18",
"track": "cloud",
"topic": "autograd-computational-graphs",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-4247",
"title": "Understanding the autograd graph lifecycle and preventing memory leaks",
"bloom": "understand"
},
{
"level": "L4",
"id": "cloud-4236",
"title": "Estimating activation memory for backward pass on H100 with large batch",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4240",
"title": "Quantifying activation checkpointing trade-off for LLM pretraining on MI300X",
"bloom": "evaluate"
}
],
"rationale": "Progresses from diagnosing simple memory leaks in the autograd graph to statically estimating activation memory for large-batch models, and evaluating the exact memory-compute trade-offs of activation checkpointing.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-009-12",
"track": "cloud",
"topic": "differential-privacy",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3574",
"title": "DP-SGD Scaling on NVIDIA H100: Balancing Privacy and Performance",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3571",
"title": "Designing a DP-SGD System with NVIDIA H100 for Federated Learning",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3577",
"title": "Optimizing DP-SGD on NVIDIA H100 for Federated Learning",
"bloom": "analyze"
}
],
"rationale": "Guides the user from analyzing the basic impact of DP-SGD scaling and noise on H100 throughput to designing a federated DP-SGD architecture and calibrating the system-wide privacy budget for medical models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-009-13",
"track": "cloud",
"topic": "differential-privacy",
"competency_area": "cross-cutting",
"levels": [
"L2",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-3569",
"title": "DP-SGD and Privacy Budget on AMD MI300X",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-3573",
"title": "DP-SGD Misconfiguration on AMD MI300X: Utility Drop & Rapid Privacy Budget Consumption Diagnosis",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3576",
"title": "DP-SGD Deployment on AMD MI300X: Budgeting Epsilon and Performance Impact",
"bloom": "analyze"
}
],
"rationale": "Progresses from understanding basic epsilon effects on the MI300X, to diagnosing a misconfiguration causing rapid budget consumption, and finally deploying a 70B LLM with stringent privacy constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-009-14",
"track": "cloud",
"topic": "differential-privacy",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3572",
"title": "Optimizing DP-SGD on Google TPU v5e with Privacy Budget Constraints",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3575",
"title": "DP-SGD Model Evaluation on Google TPU v5e: Optimizing Privacy-Utility Tradeoffs for Federated Learning",
"bloom": "analyze"
}
],
"rationale": "Moves from calibrating noise scales for a given privacy budget on TPU v5e to evaluating the complex tradeoffs of client-side versus server-side DP-SGD in a federated medical imaging context.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-009-15",
"track": "cloud",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1209",
"title": "LLM Inference Energy Bottleneck",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1212",
"title": "Root-Causing Memory Power in LLM Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1214",
"title": "LLM Inference Energy Movement Bottleneck",
"bloom": "evaluate"
}
],
"rationale": "Explores the fundamental shift in LLM inference energy consumption from ALU operations to HBM memory movement, progressing from theoretical calculation to root-cause diagnosis and architectural strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-009-16",
"track": "cloud",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-1205",
"title": "Non-Linear Power Scaling in LLM Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2504",
"title": "LLM Serving Batch Size Energy",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-2397",
"title": "Minimizing Serving Cost per Token",
"bloom": "create"
}
],
"rationale": "Investigates how batch size affects non-linear power draw during LLM serving, applying this to latency SLA constraints, and finally optimizing the TCO per token across different GPU generations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-009-17",
"track": "cloud",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2786",
"title": "The Divergence Problem",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2787",
"title": "Energy Gap Projection",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2788",
"title": "Strategies Against Divergent Scaling",
"bloom": "evaluate"
}
],
"rationale": "A progression exploring the macro-level sustainability of AI compute, starting with Moore's law limitations, projecting future datacenter energy gaps, and evaluating strategic mitigations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-009-18",
"track": "cloud",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3340",
"title": "Optimizing LLM Inference Energy on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3341",
"title": "Energy-Aware MoE Inference on AMD MI300X",
"bloom": "analyze"
}
],
"rationale": "Analyzes energy inefficiencies of running LLM inference on the MI300X and applies energy-aware operator selection to optimize a large sparse MoE model within a power envelope.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-010-21",
"track": "cloud",
"topic": "gpu-compute-architecture",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3838",
"title": "Uncoalesced Memory Access in Gather Kernel",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-4190",
"title": "Memory Coalescing in Attention Kernels",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-4193",
"title": "Kernel Fusion Strategy for Transformer Blocks",
"bloom": "create"
}
],
"rationale": "Analyzes memory access patterns in GPU kernels, from identifying uncoalesced accesses to resolving coalescing issues in attention kernels and designing optimal fusion strategies for transformer blocks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-010-22",
"track": "cloud",
"topic": "gpu-compute-architecture",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4187",
"title": "Tensor Core Utilization vs CUDA Core Fallback",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4189",
"title": "Tensor Core Matrix Multiply Tile Sizes",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-3242",
"title": "Optimizing Large Transformer Inference on NVIDIA H100",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3244",
"title": "Optimizing Large Language Model Inference on NVIDIA H100 for High-Throughput",
"bloom": "analyze"
}
],
"rationale": "Teaches GPU Tensor Core utilization, starting from profiling CUDA core fallbacks and calculating tile sizes, to optimizing full transformer inference and scaling throughput on H100 GPUs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-010-23",
"track": "cloud",
"topic": "systolic-dataflow",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-3123",
"title": "Recall Systolic Array Weight-Stationary Dataflow",
"bloom": "remember"
},
{
"level": "L3",
"id": "cloud-3088",
"title": "Systolic Array Fluency: Arithmetic Intensity from Memory",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3086",
"title": "Systolic Array Evaluation: TPU v5e vs H100 for Training Transformers",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-3084",
"title": "Systolic Array Design: Weight-Stationary vs Output-Stationary Tradeoff",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-3090",
"title": "Systolic Array Mastery: Full Transformer Layer Analysis on TPU",
"bloom": "create"
}
],
"rationale": "Builds foundational knowledge of systolic arrays, progressing from weight-stationary dataflow concepts and arithmetic intensity to comparative roofline analysis, dataflow tradeoffs, and full transformer layer optimization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-010-24",
"track": "cloud",
"topic": "systolic-dataflow",
"competency_area": "compute",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-3124",
"title": "Diagnose Systolic Array Underutilization for Non-Square Matrices",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3095",
"title": "Systolic Array Specification: Design for 99% Compute Utilization on TPU",
"bloom": "create"
},
{
"level": "L6+",
"id": "cloud-3091",
"title": "Systolic Array Mastery: Roofline Analysis for Custom LLM Kernel",
"bloom": "evaluate"
}
],
"rationale": "Investigates hardware utilization on TPUs, from diagnosing underutilization with batch size 1 to sizing batches for 99% compute efficiency and optimizing custom fused LLM decoding kernels.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-011-22",
"track": "cloud",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1705",
"title": "4-Bit PTQ Memory Footprint for LLMs",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2328",
"title": "Activation-Aware Weight Quantization (AWQ)",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1708",
"title": "Evaluating PTQ Strategies for 70B LLM Serving",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2285",
"title": "The Full-Stack Compression Audit",
"bloom": "create"
}
],
"rationale": "Progresses from calculating the memory footprint of 4-bit AWQ to understanding its accuracy recovery mechanisms, evaluating PTQ serving trade-offs, and finally auditing a complex multi-stage LLM compression pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-011-23",
"track": "cloud",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-0921",
"title": "Calculate Memory and Compute Savings for a BNN",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-0922",
"title": "Evaluating BNNs for Cloud-Scale Filtering",
"bloom": "evaluate"
}
],
"rationale": "Moves from mathematical computation of memory and operations for a Binary Neural Network to evaluating the architectural trade-offs of deploying BNNs on FPGAs for massive scale filtering.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-011-24",
"track": "cloud",
"topic": "extreme-quantization",
"competency_area": "memory",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-0896",
"title": "Diagnosing Low SM Utilization in LLM Decoding",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2426",
"title": "W4A16 Batch-Size Latency Inversion",
"bloom": "analyze"
}
],
"rationale": "Explores the complex latency dynamics of memory-bound LLM serving, progressing from diagnosing basic low SM utilization to understanding why INT4 quantization improves batch size 1 latency but degrades large batch latency due to compute bounds.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-013-03",
"track": "cloud",
"topic": "attention-scaling",
"competency_area": "architecture",
"levels": [
"L1",
"L2",
"L3"
],
"questions": [
{
"level": "L1",
"id": "cloud-0267",
"title": "The Chinchilla Data-Compute Ratio",
"bloom": "remember"
},
{
"level": "L2",
"id": "cloud-0268",
"title": "The Chinchilla Data Budget",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-0452",
"title": "The Startup's Scaling Dilemma",
"bloom": "apply"
}
],
"rationale": "Progresses from recalling the optimal token ratio for a 70B model, to calculating the exact data budget, and finally resolving a startup's compute/data constraint dilemma under Chinchilla scaling laws.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-013-04",
"track": "cloud",
"topic": "attention-scaling",
"competency_area": "architecture",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-2082",
"title": "Attention's Quadratic Memory Wall",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-3354",
"title": "Scaling Attention for Long Contexts",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-0231",
"title": "The KV-Cache Context Explosion",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3355",
"title": "Optimizing LLM Context with Attention Scaling on NVIDIA H100",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3357",
"title": "Scaling Long-Context Attention on NVIDIA A100 for LLM Inference",
"bloom": "analyze"
}
],
"rationale": "Teaches the progression of memory bottlenecks in long-context attention, starting from basic quadratic growth, exploring 128K context constraints conceptually, diagnosing the KV-cache explosion, evaluating MQA/GQA on hardware, and designing a comprehensive 128K serving system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-013-05",
"track": "cloud",
"topic": "attention-scaling",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-2089",
"title": "Flash Attention Tiling Strategy",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-2100",
"title": "Sparse Attention Patterns Meet Hardware Reality",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "cloud-3741",
"title": "Dynamic Sparse Attention Pattern Implementation on H100 for Code LLM",
"bloom": "create"
}
],
"rationale": "Examines the hardware realities of attention mechanisms, moving from the core systems insight of Flash Attention's SRAM tiling to diagnosing why theoretical FLOP reductions in sparse attention don't yield proportional speedups, and finally implementing dynamic sparse attention patterns.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-013-30",
"track": "cloud",
"topic": "load-balancing",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1512",
"title": "L7 Load Balancing for Speech AI",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1513",
"title": "Diagnosing P99 Spikes in LLM Fleets",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1514",
"title": "Disaggregated Load Balancing for LLMs",
"bloom": "evaluate"
}
],
"rationale": "Analyzes Layer 7 load balancing for AI fleets, moving from sizing ingress for speech streams, diagnosing round-robin P99 spikes despite low GPU utilization, and architecting a disaggregated prefill/decode routing layer.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-013-31",
"track": "cloud",
"topic": "load-balancing",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-0857",
"title": "Sizing RoCEv2 Buffers for Adaptive Routing",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0852",
"title": "MoE All-to-All Network Load Imbalance",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0853",
"title": "Routing MoE All-to-All Bursts at Scale",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2367",
"title": "ECMP vs Adaptive Routing for Elephant Flows",
"bloom": "evaluate"
}
],
"rationale": "Explores network-level load balancing for training MoE models, progressing from sizing RoCEv2 buffers, diagnosing ECMP hash collisions, evaluating adaptive routing at 32k GPU scale, and deciding on routing for elephant flows.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-013-32",
"track": "cloud",
"topic": "load-balancing",
"competency_area": "networking",
"levels": [
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-3467",
"title": "TPU v5e Inference Load Balancing Basics",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-3474",
"title": "Optimizing Inference Routing for High-Throughput TPU v5e Farms",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3470",
"title": "LLM Inference Load Balancing on TPU Fleet",
"bloom": "analyze"
}
],
"rationale": "Covers load balancing specifically for TPU v5e inference fleets, starting with the basics of distributing variable requests, optimizing adaptive routing to fix underutilization, and routing 50,000 QPS to meet strict latency SLAs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-014-28",
"track": "cloud",
"topic": "memory-pressure-management",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-1555",
"title": "Calculating Baseline Training Memory Pressure",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1559",
"title": "Diagnosing OOM in 7B LLM Full Fine-Tuning",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3290",
"title": "LLM Memory Pressure Management on NVIDIA H100 Cluster",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3293",
"title": "LLM Fine-tuning OOM on NVIDIA H100: Diagnosing and Mitigating Memory Pressure",
"bloom": "analyze"
}
],
"rationale": "Moves from calculating baseline LLM training footprint to diagnosing single-GPU OOMs, and scales up to managing distributed H100 fragmentation and offloading.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-014-30",
"track": "cloud",
"topic": "fairness-evaluation",
"competency_area": "cross-cutting",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-0855",
"title": "Diagnosing Adversarial Debiasing Instability in NLP",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0854",
"title": "Evaluating Adversarial Debiasing Dynamics in Credit Models",
"bloom": "evaluate"
}
],
"rationale": "Investigates adversarial debiasing, starting from diagnosing training instability to evaluating the dynamics of gradient reversal layers on model throughput.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-014-31",
"track": "cloud",
"topic": "fairness-evaluation",
"competency_area": "cross-cutting",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-3578",
"title": "Fairness Metric Definitions on NVIDIA A100",
"bloom": "analyze"
},
{
"level": "L3",
"id": "cloud-3882",
"title": "Compute Equal Opportunity Difference and Memory Read Latency",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3581",
"title": "Diagnosing Bias in a Large-Scale Model Deployed on NVIDIA A100 for Demographic Parity Failures",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-4345",
"title": "Intersectional Fairness Under Distribution Shift",
"bloom": "evaluate"
}
],
"rationale": "Covers fairness metrics, moving from defining equalized odds to computing differences, diagnosing demographic parity gaps, and addressing distribution shifts.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-014-32",
"track": "cloud",
"topic": "fairness-evaluation",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-2288",
"title": "The Fairness Monitoring Compute Budget",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-2290",
"title": "The Intersectional Subgroup Explosion",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3580",
"title": "Architecting a Real-time Fairness Evaluation System on Google TPU",
"bloom": "analyze"
}
],
"rationale": "Focuses on the computational cost of fairness audits, progressing from hourly compute budgets to intersectional subgroup explosions and real-time TPU monitoring.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-01",
"track": "cloud",
"topic": "mixed-precision-training",
"competency_area": "precision",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "cloud-3886",
"title": "Calculate Mixed-Precision Memory Footprint for Adam Training",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1573",
"title": "Diagnosing Mixed-Precision OOM Failures",
"bloom": "analyze"
}
],
"rationale": "Guides the learner from theoretically calculating the memory footprint of a 7B model using Adam in FP16 to diagnosing actual OOM failures under PyTorch DDP.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-02",
"track": "cloud",
"topic": "mixed-precision-training",
"competency_area": "precision",
"levels": [
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-3322",
"title": "H100 Mixed-Precision Performance Considerations",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3326",
"title": "H100 Mixed-Precision Training Instability and Performance Diagnosis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3855",
"title": "Architecting Mixed-Precision Training for a 100B LLM",
"bloom": "evaluate"
}
],
"rationale": "Progresses from basic H100 hardware considerations to diagnosing NaNs and finally architecting a 100B model pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-03",
"track": "cloud",
"topic": "mixed-precision-training",
"competency_area": "precision",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3329",
"title": "Optimizing Large Model Training with Mixed-Precision on Google TPU v5e",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3327",
"title": "Optimizing LLM Training Throughput and Memory with Mixed Precision on TPU v5e",
"bloom": "analyze"
}
],
"rationale": "Moves from diagnosing an OOM on TPU v5e to designing a strategy that fits a large model into its constrained 16GB HBM.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-04",
"track": "cloud",
"topic": "mixed-precision-training",
"competency_area": "precision",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "cloud-3324",
"title": "Optimizing Large Language Model Training with Mixed Precision on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3330",
"title": "Optimizing LLM Training with Mixed-Precision on AMD MI300X",
"bloom": "analyze"
}
],
"rationale": "Scales from general AMD MI300X mixed precision strategies to an extensive 175B model implementation design.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-05",
"track": "cloud",
"topic": "mixed-precision-training",
"competency_area": "precision",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3325",
"title": "Optimizing Large Model Training with Mixed Precision on NVIDIA A100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3328",
"title": "Optimizing Large Language Model Training with Mixed Precision on NVIDIA A100",
"bloom": "analyze"
}
],
"rationale": "Advances from calculating basic FP16 memory requirements on A100 to evaluating FP16 versus BF16 formats for large-scale stability.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-06",
"track": "cloud",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-3875",
"title": "LLM Training FLOPs and A100 Time Estimation",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-3260",
"title": "H100 Training Cost for Large Language Model",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-4540",
"title": "H100 Budget Feasibility",
"bloom": "create"
}
],
"rationale": "Progresses from a basic FLOP/time estimation on a 1B model to calculating costs for a 100B model, and finally deriving feasibility given strict budget constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-07",
"track": "cloud",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "cloud-3832",
"title": "LLM Inference Utilization at Batch Size 1",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3985",
"title": "Training Budget Overrun From Sequence Length",
"bloom": "analyze"
}
],
"rationale": "Explores how compute utilization behaves at batch size 1 and extends to diagnosing severe budget overruns caused by quadratic sequence length scaling.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-08",
"track": "cloud",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3259",
"title": "LLM Training Cost Estimation on Google TPU v5e",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3261",
"title": "Optimizing Cost-Performance on Google TPU v5e for LLM Inference",
"bloom": "analyze"
}
],
"rationale": "Contrasts TPU v5e cost estimation for training at 40% efficiency with the nuanced cost-performance tradeoffs of real-time P99 inference.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-09",
"track": "cloud",
"topic": "neural-architecture-search",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2756",
"title": "NAS Search Space and Cost",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2757",
"title": "NAS Compute Budget",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2758",
"title": "NAS vs Manual Design",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner from understanding NAS concepts to calculating its compute budget, and finally making high-level architectural tradeoffs against manual design.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-10",
"track": "cloud",
"topic": "neural-architecture-search",
"competency_area": "architecture",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-1638",
"title": "Analyzing Edge NPU Graph Compiler Fallbacks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1641",
"title": "Cloud-to-Edge NPU Offloading Architecture",
"bloom": "evaluate"
}
],
"rationale": "Progresses from diagnosing edge NPU latency spikes due to compiler fallbacks to designing a full cloud-to-edge NPU offloading architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-11",
"track": "cloud",
"topic": "neural-architecture-search",
"competency_area": "architecture",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-2413",
"title": "TPU Systolic Array Underutilization",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2299",
"title": "TPU MXU Padding and Memory Stalls",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-2333",
"title": "Systolic Array Tile Padding Collapse",
"bloom": "analyze"
}
],
"rationale": "Advances from diagnosing TPU systolic array underutilization to evaluating MXU padding stalls, ending with a complex dataflow tile collapse diagnosis.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-12",
"track": "cloud",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1649",
"title": "Direct S3 Streaming Bandwidth for A100 Clusters",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1653",
"title": "Diagnosing Object Storage Prefix Rate Limits in Streaming",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1651",
"title": "Object Storage Streaming vs POSIX Systems",
"bloom": "evaluate"
}
],
"rationale": "Moves from calculating theoretical direct S3 streaming bandwidth to diagnosing prefix rate limits, culminating in architecting object storage vs POSIX for a 5PB dataset.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-13",
"track": "cloud",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1842",
"title": "LLM Data Ingestion with Sequential Streaming Formats",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1844",
"title": "Diagnosing Object Store API Bottlenecks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1843",
"title": "Evaluating Streaming Formats for VLM Training",
"bloom": "evaluate"
}
],
"rationale": "Advances from comparing sequential formats against individual JSONs to diagnosing API bottlenecks, and finally evaluating streaming formats for VLM training.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-14",
"track": "cloud",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3547",
"title": "Real-time Anomaly Detection on High-Frequency Sensor Streams with NVIDIA H100",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3549",
"title": "Optimizing Real-time Inference on NVIDIA A100 for Streaming Data",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3550",
"title": "Real-Time Anomaly Detection on High-Throughput Sensor Data with Accelerators",
"bloom": "analyze"
}
],
"rationale": "Takes the learner from architecting an H100 streaming pipeline to fixing real-time latency spikes, ending with designing a resilient 100M event/s pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-15",
"track": "cloud",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-1975",
"title": "Diagnosing Single-Threaded Consumer Lag in Kafka",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1973",
"title": "Streaming Architecture for Strict Ad-Bidding SLAs",
"bloom": "evaluate"
}
],
"rationale": "Starts with diagnosing single-threaded consumer lag in Kafka and moves to selecting stateful streaming frameworks for strict ad-bidding SLAs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-16",
"track": "cloud",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3839",
"title": "Throughput Collapse in Fallback Model Degradation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3852",
"title": "Multi-Tier LLM Fallback and Load Shedding",
"bloom": "evaluate"
}
],
"rationale": "Explores the throughput collapse of fallback models and culminates in designing a sophisticated multi-tier routing architecture for a 10,000 QPS spike.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-17",
"track": "cloud",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L2",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-3509",
"title": "Google TPU v5e Inference Degradation Strategy",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3511",
"title": "Graceful Degradation Anomaly on TPU v5e",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3515",
"title": "Graceful Degradation for Real-time Anomaly Detection",
"bloom": "analyze"
}
],
"rationale": "Builds from defining TPU v5e inference degradation to diagnosing silent degradation anomalies, finishing with creating a dynamic fallback strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-18",
"track": "cloud",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3510",
"title": "Graceful Degradation for LLM Inference on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3512",
"title": "Graceful Degradation for Large Language Models on AMD MI300X",
"bloom": "analyze"
}
],
"rationale": "Transitions from creating an MI300X degradation ladder for fraud detection to implementing fail-operational modes under extreme stress.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-19",
"track": "cloud",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1309",
"title": "SLA Budgets Under Network Stress",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-1314",
"title": "Degrading DLRM Ranking Under Capacity Loss",
"bloom": "evaluate"
}
],
"rationale": "Advances from managing SLA budgets during general network stress to gracefully degrading heavy DLRM models after losing 60% capacity.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-20",
"track": "cloud",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-0956",
"title": "Certified Radius Calculation for Biometric API",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-0964",
"title": "Debugging Randomized Smoothing Radius Collapse",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0955",
"title": "Scaling Randomized Smoothing Certification",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating a certified radius to debugging its unexpected collapse, and ultimately scaling the defense to handle 10,000 inferences.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-21",
"track": "cloud",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2774",
"title": "How Adversarial Evasion Works",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2775",
"title": "Adversarial Training Compute Overhead: Adversarial Robustness & Security",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2776",
"title": "Adversarial Defense Selection",
"bloom": "evaluate"
}
],
"rationale": "Starts with understanding evasion mechanics, moves to calculating adversarial training compute overhead, and culminates in selecting strategic defenses.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-22",
"track": "cloud",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L2",
"L4"
],
"questions": [
{
"level": "L2",
"id": "cloud-3517",
"title": "Identifying Model Extraction Risks on NVIDIA A100 Deployments",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3519",
"title": "Diagnosing Covert Adversarial Perturbations on LLMs within an NVIDIA A100 Cloud Fleet",
"bloom": "analyze"
}
],
"rationale": "Advances from identifying basic model extraction risks on A100s to diagnosing covert LLM adversarial perturbations from telemetry.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-23",
"track": "cloud",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3521",
"title": "TPU v5e Adversarial Defense Throughput Impact for Real-time Fraud Detection",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3518",
"title": "Designing a Robust and Reliable ML System against Adversarial Attacks on Google TPU v5e",
"bloom": "analyze"
}
],
"rationale": "Moves from evaluating adversarial defense throughput overhead on TPU v5e to architecting a completely robust real-time ML system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-24",
"track": "cloud",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-0856",
"title": "Diagnosing Moderation Evasion Attacks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-0860",
"title": "Evaluating Defenses for High-Throughput Content Moderation",
"bloom": "evaluate"
}
],
"rationale": "Transitions from diagnosing evasion attacks in content moderation to evaluating cost-effective defenses under strict SLAs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-25",
"track": "cloud",
"topic": "data-quality-validation",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-1476",
"title": "Quantifying Label Quality Drift in Moderation Pipelines",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1481",
"title": "Diagnosing Annotation Degradation in Data Pipelines",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1478",
"title": "Mitigating Annotation Degradation in Moderation Pipelines",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner from quantifying the cost of label drift to systematically diagnosing it, and finally implementing a strategy to mitigate annotation degradation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-26",
"track": "cloud",
"topic": "data-quality-validation",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2600",
"title": "Cloud Data Quality As Code L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2601",
"title": "Cloud Data Quality As Code L3 0",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2602",
"title": "Cloud Data Quality As Code L5 0",
"bloom": "evaluate"
}
],
"rationale": "Progresses from understanding how data quality checks catch corruption to estimating their compute overhead, and making high-level policy decisions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-27",
"track": "cloud",
"topic": "data-quality-validation",
"competency_area": "data",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-1823",
"title": "Diagnosing Ingestion Bottlenecks in Synchronous Schema Validation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1825",
"title": "Evaluating Ingestion Schema Validation for High-Throughput Streams",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2306",
"title": "The Silent String-to-Hash Collision",
"bloom": "analyze"
}
],
"rationale": "Advances from diagnosing API bottlenecks caused by synchronous validation to splitting validation architecture, and concludes with debugging a silent hash collision.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-28",
"track": "cloud",
"topic": "data-quality-validation",
"competency_area": "data",
"levels": [
"L3",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3536",
"title": "Designing a High-Throughput Data Quality Gate for AMD MI300X",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3539",
"title": "Ensuring Data Integrity for Petabyte-Scale LLM Training on AMD MI300X Clusters",
"bloom": "analyze"
}
],
"rationale": "Scales from designing a high-throughput MI300X data quality gate to architecting a petabyte-scale data integrity framework.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-29",
"track": "cloud",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3884",
"title": "Calculate Logit Memory in LLM Knowledge Distillation",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1465",
"title": "Debugging Online KD Bottlenecks",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3853",
"title": "Architecting an Online Distillation Pipeline for a 70B Model",
"bloom": "evaluate"
}
],
"rationale": "Takes the user from calculating teacher logit memory overhead to debugging online bottlenecks, culminating in architecting an optimal 70B distillation pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-30",
"track": "cloud",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L3",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3380",
"title": "H100 Memory Optimization for Knowledge Distillation Logit Matching",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3383",
"title": "Optimizing Large Language Model Deployment on H100 via Knowledge Distillation",
"bloom": "analyze"
}
],
"rationale": "Advances from memory optimization for logit matching on H100 to full student sizing and distillation strategies for deployment.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-31",
"track": "cloud",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "cloud-3382",
"title": "Optimizing Large Language Models with Knowledge Distillation on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3385",
"title": "LLM Distillation for High-Throughput Inference on AMD MI300X",
"bloom": "analyze"
}
],
"rationale": "Moves from evaluating logit vs feature distillation on MI300X to building a fully hardware-tailored, high-throughput inference service.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-32",
"track": "cloud",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3378",
"title": "Optimizing LLM Deployment via Knowledge Distillation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-1466",
"title": "Evaluating Distillation Trade-offs for Cloud LLM Deployment",
"bloom": "evaluate"
}
],
"rationale": "Transitions from optimizing LLM distillation deployment parameters to evaluating broad cloud deployment trade-offs between distillation variants.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-33",
"track": "cloud",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-3878",
"title": "VRAM Calculation for FP16 Guardrail Model",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-3867",
"title": "Toxicity Classifier Bandwidth Saturation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3859",
"title": "Real-time LLM Safety Guardrail Architecture",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating basic VRAM for a guardrail to diagnosing its impact on bandwidth saturation, and finally architecting a low-latency safety pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-34",
"track": "cloud",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L2",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-3585",
"title": "Environmental Impact Disclosure for LLM on NVIDIA H100",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3588",
"title": "Diagnosing Emergent Bias in LLM on NVIDIA H100 Cluster",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3591",
"title": "Ethical LLM Deployment on H100s: Performance vs. Responsible AI",
"bloom": "analyze"
}
],
"rationale": "Builds from generating environmental model cards to diagnosing emergent bias using telemetry, and ultimately deploying an ethical framework at scale.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-35",
"track": "cloud",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3586",
"title": "TPU v5e Deployment and Responsible AI Guardrails",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3589",
"title": "Designing a Responsible AI Governance Framework for High-Volume Credit Scoring on Google TPU v5e",
"bloom": "analyze"
}
],
"rationale": "Moves from identifying hardware-induced demographic bias on TPU v5e to designing a continuous governance and impact assessment service.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-36",
"track": "cloud",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3590",
"title": "Responsible LLM Deployment on AMD MI300X with Bias Mitigation",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3592",
"title": "Mitigating LLM Bias on AMD MI300X: A Responsible AI Framework",
"bloom": "analyze"
}
],
"rationale": "Advances from implementing basic guardrails under launch pressure to creating a robust governance framework for a highly biased 175B model.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-37",
"track": "cloud",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "cloud-4350",
"title": "Red-Teaming Throughput vs Coverage Tradeoff",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-4351",
"title": "RLHF Reward Hacking and Constitutional AI Safeguards",
"bloom": "evaluate"
}
],
"rationale": "Explores the trade-offs of allocating red-teaming budgets before tackling complex RLHF reward hacking across the training stack.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-38",
"track": "cloud",
"topic": "data-efficiency-selection",
"competency_area": "data",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3558",
"title": "Optimizing Large-Scale Foundation Model Training on TPU v5e: Data Efficiency & Compute Constraints",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3561",
"title": "Optimizing Large-Scale Foundation Model Training with Data Efficiency on Google TPU v5e",
"bloom": "analyze"
}
],
"rationale": "Transitions from solving naive TPU v5e data bottlenecks to selecting the most data-efficient strategy for petabyte-scale training.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-39",
"track": "cloud",
"topic": "data-efficiency-selection",
"competency_area": "data",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "cloud-3562",
"title": "Optimizing Data Pruning for Large Language Model Training on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3559",
"title": "Data-Efficient LLM Training Design on AMD MI300X",
"bloom": "analyze"
}
],
"rationale": "Advances from identifying data pruning heuristics on MI300X to deploying a strategy that specifically avoids model collapse.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-40",
"track": "cloud",
"topic": "data-efficiency-selection",
"competency_area": "data",
"levels": [
"L4",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-3560",
"title": "Diagnosing Data Efficiency and Model Collapse on NVIDIA H100",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3564",
"title": "Optimizing LLM Training Data Efficiency on NVIDIA H100",
"bloom": "analyze"
}
],
"rationale": "Moves from diagnosing high GPU utilization with poor ICR on H100s to designing pipelines that fundamentally fix the I/O bottleneck.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-41",
"track": "cloud",
"topic": "data-efficiency-selection",
"competency_area": "data",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3863",
"title": "Synthetic Data PCIe Bottleneck in 3D Imaging",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3563",
"title": "Optimizing LLM Training on A100s: Coreset vs. Synthetic Data for the Data Wall Problem",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3891",
"title": "Coreset Selection Pipeline for LLM Pre-training at Scale",
"bloom": "create"
}
],
"rationale": "Builds from diagnosing PCIe bottlenecks in synthetic data generation to evaluating synthetic vs coreset approaches, concluding with building a coreset pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-42",
"track": "cloud",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-3879",
"title": "Rack Cooling Limits for 8-GPU H100 vs A100 Nodes",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-3850",
"title": "H100 Throughput Drop at Constant Power Limit",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3720",
"title": "Thermal Throttling During Long-Context Training on H100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3861",
"title": "Rack-Level Thermal Architecture for Dense H100 Clusters",
"bloom": "evaluate"
}
],
"rationale": "Takes the user from basic rack cooling limits to diagnosing throughput drops, analyzing long-context throttling impact, and architecting dense cluster scheduling.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-43",
"track": "cloud",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "cloud-2506",
"title": "Liquid Cooling Retrofit ROI",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-2399",
"title": "Datacenter Liquid Cooling TCO",
"bloom": "evaluate"
}
],
"rationale": "Advances from calculating retrofit ROI for a 5MW facility to choosing cooling mechanisms for a massive 20MW AI datacenter.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-44",
"track": "cloud",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L2",
"L4"
],
"questions": [
{
"level": "L2",
"id": "cloud-3334",
"title": "Google TPU v5e Thermal Limits and Sustained Performance Recall",
"bloom": "remember"
},
{
"level": "L4",
"id": "cloud-3337",
"title": "Diagnosing Performance Variability on Google TPU v5e due to Thermal Constraints",
"bloom": "analyze"
}
],
"rationale": "Moves from understanding TPU v5e burst thermal limits to explicitly diagnosing ambient temperature throttling effects on long jobs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-45",
"track": "cloud",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-3335",
"title": "AMD MI300X Thermal Throttling Analysis for LLM Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3338",
"title": "Cloud LLM Thermal Design with AMD MI300X",
"bloom": "analyze"
}
],
"rationale": "Transitions from diagnosing MI300X telemetry during sustained inference drops to architecting a dual-accelerator node cooling strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-015-46",
"track": "cloud",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "cloud-3336",
"title": "Designing for Sustained Performance on NVIDIA A100: Thermal Management for Large-Scale AI",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3339",
"title": "NVIDIA A100 Thermal Throttling in Cloud Inference at Scale",
"bloom": "analyze"
}
],
"rationale": "Advances from designing A100 thermal management for inference to orchestrating hyperscale cluster workloads around hot spots.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-016-01",
"track": "cloud",
"topic": "accelerator-comparison",
"competency_area": "compute",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3250",
"title": "Accelerator Selection for Large Language Model Inference on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3247",
"title": "AMD MI300X Accelerator Selection for Large Language Model Inference",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3252",
"title": "Optimizing Large Language Model Inference on AMD MI300X",
"bloom": "analyze"
}
],
"rationale": "Guides the learner from evaluating the MI300X for batch-1 inference, to designing the architecture against competitors, and finally optimizing a 100B model under strict HBM limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-016-02",
"track": "cloud",
"topic": "operator-scheduling",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2561",
"title": "Cloud Learning Rate Scheduling L2 0",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-2562",
"title": "Cloud Learning Rate Scheduling L3 0",
"bloom": "apply"
},
{
"level": "L5",
"id": "cloud-2563",
"title": "Cloud Learning Rate Scheduling L5 0",
"bloom": "evaluate"
}
],
"rationale": "Progresses from diagnosing a basic learning rate plateau, to applying scaling rules, to making strategic restart decisions in a massive multi-day training run.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-016-03",
"track": "cloud",
"topic": "profiling-bottleneck-analysis",
"competency_area": "latency",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "cloud-1739",
"title": "Resolving Image Pipeline Preprocessing Bottlenecks",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1740",
"title": "Diagnosing End-to-End Latency in Image Serving",
"bloom": "analyze"
}
],
"rationale": "Examines a specific CPU preprocessing bottleneck (PIL) in image serving, moving from identifying the cause to evaluating mitigation strategies like quantization versus pipeline refactoring.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-016-08",
"track": "cloud",
"topic": "ab-rollout-strategies",
"competency_area": "deployment",
"levels": [
"L3",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3500",
"title": "Progressive Rollout of a Large Language Model on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3501",
"title": "Progressive Rollout of a Large Language Model on AMD MI300X Cluster",
"bloom": "analyze"
}
],
"rationale": "A progressive sequence on deploying massive LLMs on AMD MI300X, moving from planning the initial rollout to engineering a highly available strategy with rapid rollback capabilities.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-016-22",
"track": "cloud",
"topic": "encoder-decoder-tradeoffs",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3881",
"title": "KV Cache Footprint: Decoder vs Encoder-Decoder",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3369",
"title": "A100 Inference Optimization: Encoder-Decoder Tradeoffs for Real-time LLM Deployment",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3371",
"title": "A100 Deployment Strategy: Encoder-Decoder Tradeoffs for LLM Inference",
"bloom": "analyze"
}
],
"rationale": "Covers A100 LLM deployment architectures by contrasting baseline KV cache footprints, diagnosing tradeoffs for real-time applications, and analyzing deep system costs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-10",
"track": "cloud",
"topic": "distribution-drift-detection",
"competency_area": "reliability",
"levels": [
"L2",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-2765",
"title": "Why OOD Detection Matters",
"bloom": "understand"
},
{
"level": "L5",
"id": "cloud-2767",
"title": "OOD Detection Strategy Selection",
"bloom": "evaluate"
}
],
"rationale": "Covers out-of-distribution (OOD) detection for vision models, moving from understanding the failure modes of softmax confidence to selecting robust OOD algorithms for safety-critical deployment.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-11",
"track": "cloud",
"topic": "distribution-drift-detection",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3505",
"title": "Real-time ML Output Drift Detection on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3504",
"title": "TPU-Powered Recommendation System: Diagnosing Data & Concept Drift",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3508",
"title": "Real-time Recommendation System Drift on AMD MI300X",
"bloom": "analyze"
}
],
"rationale": "Focuses on drift in large-scale recommendation systems, progressing from detecting embedding drift to diagnosing complex training-serving skew, and finally architecting full mitigation strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-12",
"track": "cloud",
"topic": "distribution-drift-detection",
"competency_area": "reliability",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "cloud-3506",
"title": "Real-time Data Drift Detection for Transformer Models on NVIDIA A100 Architectures",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3507",
"title": "Real-time LLM Data Drift Detection on NVIDIA H100",
"bloom": "analyze"
}
],
"rationale": "Explores real-time drift detection for generative models, progressing from comparing statistical metrics across serving architectures to fully integrating input-drift detection into a live LLM pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-15",
"track": "cloud",
"topic": "monitoring-observability",
"competency_area": "reliability",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "cloud-1814",
"title": "Runtime Entropy Monitoring for Adversarial Shifts",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1815",
"title": "Diagnosing Runaway Generation from Prompt Injections",
"bloom": "analyze"
}
],
"rationale": "Teaches how to monitor for and diagnose adversarial attacks using runtime output metrics, moving from calculating entropy thresholds to root-causing latency spikes caused by prompt injections.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-16",
"track": "cloud",
"topic": "monitoring-observability",
"competency_area": "reliability",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "cloud-3525",
"title": "Diagnosing Stragglers in Real-time LLM Inference on AMD MI300X Cluster",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3523",
"title": "Real-time Observability for H100-powered ML Inference Service",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3526",
"title": "A100 Inference Performance Degradation: Monitoring & Anomaly Detection",
"bloom": "analyze"
}
],
"rationale": "Progresses through identifying and managing inference stragglers, from diagnosing p99 spikes on specific hardware to designing comprehensive, cluster-wide observability and MTTR strategies for high-performance GPUs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-17",
"track": "cloud",
"topic": "dma-data-movement",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "cloud-3877",
"title": "PCIe Gen5 Transfer Time and Pinned Memory",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-3864",
"title": "Diagnosing Low Host-to-Device Bandwidth",
"bloom": "analyze"
},
{
"level": "L4",
"id": "cloud-3282",
"title": "TPU v5e Data Transfer Bottleneck Analysis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3284",
"title": "Optimizing Data Movement for LLM Inference on Google TPU v5e",
"bloom": "analyze"
}
],
"rationale": "Focuses on host-to-device PCIe/DMA bottlenecks, moving from basic transfer time calculations to diagnosing real-world stalls, and finally designing complex data movement strategies for large models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-18",
"track": "cloud",
"topic": "dma-data-movement",
"competency_area": "memory",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "cloud-3283",
"title": "Optimizing Large Embedding Data Movement on AMD MI300X",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-3286",
"title": "Optimizing Data Movement for Large Models on NVIDIA H100",
"bloom": "analyze"
}
],
"rationale": "Examines memory offloading strategies, progressing from managing dynamic KV caches and embeddings across PCIe to architecting full host-device memory movement for 100B parameter fine-tuning.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-19",
"track": "cloud",
"topic": "memory-mapped-inference",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-3765",
"title": "mmap for Zero-Copy Model Loading",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-3767",
"title": "mmap vs Safetensors: Cold Start Optimization for Model Serving",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-3769",
"title": "Shared mmap for Multi-Tenant GPU Inference",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "cloud-3770",
"title": "mmap and NUMA: The Hidden Latency Trap",
"bloom": "evaluate"
}
],
"rationale": "A comprehensive journey through memory-mapped file loading, from basic cold-start tradeoffs to complex multi-process sharing, culminating in deep NUMA-aware latency optimization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-20",
"track": "cloud",
"topic": "memory-mapped-inference",
"competency_area": "memory",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "cloud-1646",
"title": "Optimizing Dual-Socket Memory Bandwidth",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-1645",
"title": "Diagnosing High Tail Latency in Dual-Socket CPU Inference",
"bloom": "analyze"
}
],
"rationale": "Explores the impact of NUMA boundaries on CPU inference, progressing from measuring memory bandwidth across sockets to root-causing severe p99 tail latency spikes under high contention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-36",
"track": "cloud",
"topic": "duty-cycling",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "cloud-4541",
"title": "MI300X DVFS Latency Penalty",
"bloom": "understand"
},
{
"level": "L3",
"id": "cloud-4554",
"title": "GPU Deep Sleep Energy Savings",
"bloom": "apply"
},
{
"level": "L4",
"id": "cloud-4530",
"title": "H100 Burst Duty Cycling",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "cloud-4514",
"title": "Diurnal Workload Power Scaling",
"bloom": "create"
}
],
"rationale": "Builds expertise in GPU power management, progressing from understanding DVFS latency penalties and static deep-sleep savings to evaluating state-transition overheads, and finally architecting a cluster-wide diurnal power strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-37",
"track": "cloud",
"topic": "duty-cycling",
"competency_area": "power",
"levels": [
"L3",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "cloud-4550",
"title": "Cloud GPU Cluster Auto-Scaling for Power Saving",
"bloom": "apply"
},
{
"level": "L6+",
"id": "cloud-4505",
"title": "Dynamic MIG Autoscaling on A100",
"bloom": "create"
}
],
"rationale": "Explores infrastructure-level power mitigation, moving from basic orchestrator scaling of idle nodes to leveraging Multi-Instance GPU (MIG) for dynamic, sub-GPU duty cycling under burst traffic.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "cloud-chain-auto-secondary-017-39",
"track": "cloud",
"topic": "tensor-arena-planning",
"competency_area": "memory",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "cloud-2016",
"title": "Transposed Tensor Bandwidth Collapse",
"bloom": "analyze"
},
{
"level": "L5",
"id": "cloud-2020",
"title": "Evaluating NCHW vs NHWC Layouts",
"bloom": "evaluate"
}
],
"rationale": "Examines the severe performance penalties of data layout mismatches, moving from diagnosing bandwidth collapse on transposed tensors to evaluating the systemic costs of implicit layout conversions across a full vision model.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-001-04",
"track": "edge",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1790",
"title": "Google Coral Edge TPU Precision Requirement",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-1883",
"title": "Optimizing Object Detection Latency on Google Coral Edge TPU",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1701",
"title": "Google Coral USB Latency Pipeline Optimization",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1886",
"title": "Edge TPU Latency Decomposition for Real-time Object Detection",
"bloom": "analyze"
}
],
"rationale": "Optimizing an object detection pipeline specifically for the constraints and latency targets of a Google Coral Edge TPU.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-001-05",
"track": "edge",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1070",
"title": "Latency Decomposition: Size E2E Latency for Smart Camera on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1069",
"title": "Latency Decomposition: Diagnose and Fix Jetson Orin Inference Latency Spike",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-1068",
"title": "Latency Decomposition: Full Pipeline Latency Audit for Autonomous Drone on Jetson Orin",
"bloom": "evaluate"
}
],
"rationale": "Decomposing, diagnosing, and auditing the end-to-end latency of a vision pipeline deployed on a Jetson Orin.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-001-06",
"track": "edge",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L4",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "edge-0528",
"title": "The Watchdog Timeout Freeze",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-0602",
"title": "The Watchdog Priority Inversion",
"bloom": "create"
}
],
"rationale": "Debugging and resolving system-level priority inversion and watchdog timeouts in RTOS edge deployments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-001-10",
"track": "edge",
"topic": "tco-cost-modeling",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-0731",
"title": "The OTA Update Tax",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-1154",
"title": "Edge TCO Diagnosis: Connectivity Cost Surprise in IoT Fleet",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1169",
"title": "Edge TCO Optimization: Optimize Connectivity Costs Dominating TCO",
"bloom": "evaluate"
}
],
"rationale": "Analyzes the hidden cost of cellular data and OTA updates on the total cost of ownership for IoT edge fleets.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-001-11",
"track": "edge",
"topic": "tco-cost-modeling",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-1175",
"title": "Edge TCO Recall: CapEx vs OpEx Tradeoff for Edge AI",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1163",
"title": "Edge TCO Implement: Hailo-8 vs Cloud Break-Even Analysis",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1155",
"title": "Edge TCO Evaluation: Coral TPU vs Cloud Inference for Low-Volume Deployments",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-0033",
"title": "Edge-Cloud Hybrid Inference Break-Even",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1164",
"title": "Edge TCO Mastery Multi-Tier Edge-Cloud-Hybrid Cost Optimization",
"bloom": "create"
}
],
"rationale": "Progresses from basic CapEx/OpEx tradeoffs to complex edge-cloud hybrid break-even analyses.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-001-12",
"track": "edge",
"topic": "tco-cost-modeling",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1158",
"title": "Edge TCO Fluency: Quick TCO Estimation for Edge Deployment",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1146",
"title": "Edge TCO Analyze: Total Cost of Deployment at Scale",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1177",
"title": "Edge TCO Specification: Design Lifecycle Cost Model for 5-Year Edge Fleet",
"bloom": "create"
},
{
"level": "L6+",
"id": "edge-1165",
"title": "Edge TCO Mastery: Full Lifecycle Cost Model for Industrial Edge AI",
"bloom": "create"
}
],
"rationale": "Builds a comprehensive multi-year lifecycle cost model for large-scale industrial edge AI deployments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-002-07",
"track": "edge",
"topic": "cnn-efficient-design",
"competency_area": "architecture",
"levels": [
"L1",
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1401",
"title": "Edge TPU Precision Requirements for Efficient CNNs",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-1932",
"title": "MobileNet Architectural Benefits on Coral Edge TPU",
"bloom": "understand"
},
{
"level": "L4",
"id": "edge-1508",
"title": "Diagnosing CPU Fallback in EfficientNet on Edge TPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1935",
"title": "Designing an Efficient MobileNet for Google Coral Edge TPU",
"bloom": "analyze"
}
],
"rationale": "Examines the deployment of efficient CNNs on the Coral Edge TPU, progressing from basic INT8 requirements to diagnosing CPU fallbacks and designing a custom model for strict power constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-002-08",
"track": "edge",
"topic": "cnn-efficient-design",
"competency_area": "architecture",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-1009",
"title": "Recall MobileNet depthwise separable FLOP savings on Jetson Orin",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1560",
"title": "Depthwise Separable Convolution on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1014",
"title": "Optimize MobileNetV2 latency on Jetson Orin via layer pruning and depthwise op scheduling",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-1495",
"title": "Architecting a Multi-Camera Perception System for Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1643",
"title": "Heterogeneous CNN Design for Jetson Orin",
"bloom": "create"
}
],
"rationale": "Focuses on optimizing depthwise separable convolutions and multi-stream CNN architectures specifically for the memory and compute hierarchy of the Jetson Orin.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-002-09",
"track": "edge",
"topic": "compound-ai-systems",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "edge-1029",
"title": "Recalling Jetson Orin Memory Architecture for Multi-Model Compound Systems",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1603",
"title": "Agent Orchestration Memory Footprint on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1019",
"title": "Analyzing RAG Feasibility on Jetson Orin for On-Device Retrieval",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1526",
"title": "Optimizing RAG Pipeline Latency on Jetson Orin",
"bloom": "evaluate"
}
],
"rationale": "Walks through the memory architecture and sizing calculations required to successfully deploy and optimize a full RAG pipeline on Jetson Orin.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-002-10",
"track": "edge",
"topic": "compound-ai-systems",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "edge-1028",
"title": "Recalling Key Constraints of Coral Edge TPU for Compound Pipelines",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1212",
"title": "Latency Spikes in Cascaded Edge TPU Pipelines",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1688",
"title": "Edge TPU Multi-Model Pipeline Fallback Optimization",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2027",
"title": "Real-time Multi-Model Object Analysis on Google Coral Edge TPU",
"bloom": "analyze"
}
],
"rationale": "Explores the challenges of chaining multiple models on the Coral Edge TPU, from understanding its hardware constraints to diagnosing latency spikes and architecting real-time cascading pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-002-11",
"track": "edge",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L1",
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0323",
"title": "The Thermal Budget Trap",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0208",
"title": "The Sustained TOPS Reality Check",
"bloom": "understand"
},
{
"level": "L4",
"id": "edge-1522",
"title": "Diagnosing Thermal Throttling on NVIDIA Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0077",
"title": "The Thermal Staircase",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1669",
"title": "Dynamic Thermal Throttling on Jetson Orin",
"bloom": "create"
}
],
"rationale": "Progresses from understanding top-level TDP ratings and sustained TOPS to diagnosing and dynamically mitigating thermal throttling via P-states on the Jetson Orin.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-002-12",
"track": "edge",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L1",
"L2",
"L3",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0352",
"title": "Edge Power Efficiency 101",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0152",
"title": "The Passive Cooling Limit",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-1443",
"title": "Explain Thermal Throttling on Google Coral Edge TPU",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1916",
"title": "Power Budgeting for Real-time Edge TPU Deployment",
"bloom": "analyze"
}
],
"rationale": "Examines the realities of running continuous inference on low-power accelerators, moving from theoretical efficiency to managing passive cooling limits and strict power budgeting on the Edge TPU.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-18",
"track": "edge",
"topic": "activation-memory",
"competency_area": "memory",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-1000",
"title": "Fluency: Identifying When Gradient Checkpointing Helps vs. Hurts on Constrained Edge Hardware",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0835",
"title": "On-Device Fine-Tuning Memory Constraints",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0834",
"title": "Debugging OOM in Edge Device Finetuning",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1255",
"title": "On-Device Fine-Tuning Checkpoint Architecture for Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1002",
"title": "Mastery: Proving the Memory-Compute Optimality of Gradient Checkpointing Under Edge Constraints",
"bloom": "evaluate"
}
],
"rationale": "A deep dive into gradient checkpointing for on-device fine-tuning, evaluating when it helps, diagnosing OOMs, and proving its compute-memory optimality.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-19",
"track": "edge",
"topic": "activation-memory",
"competency_area": "memory",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "edge-1600",
"title": "Hailo-8 Activation Spilling Bandwidth Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1006",
"title": "Realizing Tiled Activation Processing for CNNs on Hailo-8 SRAM Constraints",
"bloom": "apply"
}
],
"rationale": "Explores activation memory spilling on Hailo-8 dataflow accelerators, covering the bandwidth calculations and tiled processing solutions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-20",
"track": "edge",
"topic": "activation-memory",
"competency_area": "memory",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1770",
"title": "Google Coral Edge TPU Activation Data Type Requirement",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-1519",
"title": "Edge TPU Activation Memory Spilling and Partitioning",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1726",
"title": "Edge TPU Activation Memory Constraint Evaluation",
"bloom": "evaluate"
}
],
"rationale": "Examines Google Coral Edge TPU constraints, from basic INT8 requirements to handling activation memory spilling, partitioning, and structural mitigation strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-21",
"track": "edge",
"topic": "activation-memory",
"competency_area": "memory",
"levels": [
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "edge-1007",
"title": "Recall: Defining Activation Memory and Its Relationship to Sequence Length",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-1863",
"title": "Jetson Orin LLM Deployment: Activation Memory Bottleneck & Checkpointing",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1861",
"title": "Optimizing Large Language Model Deployment on NVIDIA Jetson Orin with Activation Memory Constraints",
"bloom": "analyze"
}
],
"rationale": "Focuses on LLM KV-cache and activation memory scaling on Jetson Orin, diagnosing forward-pass bottlenecks, and applying quantization and recomputation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-22",
"track": "edge",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "edge-2078",
"title": "Active Learning for Edge Model Adaptation on NVIDIA Jetson Orin",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1033",
"title": "Dataset Curation: Evaluate Active Learning Strategies for Edge-Deployed Models",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-1735",
"title": "On-Device Active Learning Data Selection",
"bloom": "evaluate"
}
],
"rationale": "A progression on implementing active learning for Jetson Orin deployments with limited uplinks, evaluating selection strategies, and sizing the on-device uncertainty pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-23",
"track": "edge",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-2082",
"title": "Coral Edge TPU Disease Detection: Active Learning for Rare Disease Data Curation",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-1260",
"title": "Active Learning Data Pipeline for Edge TPU Defect Detection",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1647",
"title": "Quantization-Aware Active Learning Pipeline for Edge TPU",
"bloom": "create"
}
],
"rationale": "Focuses on using the Coral Edge TPU for active data curation of rare events, handling its INT8 constraints, and designing a quantization-aware annotation pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-25",
"track": "edge",
"topic": "dataset-curation",
"competency_area": "data",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-2081",
"title": "Optimal Labeling Strategy for Edge Deployment on Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2083",
"title": "Optimizing Edge AI Dataset Curation for Qualcomm Cloud AI 100 Deployment",
"bloom": "analyze"
}
],
"rationale": "Compares traditional batch annotation versus active learning economics for Cloud AI 100 defect detection systems.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-26",
"track": "edge",
"topic": "model-size-estimation",
"competency_area": "architecture",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1780",
"title": "Hailo-8 Local Memory Architecture Recall",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1939",
"title": "Hailo-8 Deployment: Model Memory Footprint & Throughput",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1288",
"title": "Hailo-8 Host Bandwidth Exceeded Error Diagnosis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1753",
"title": "Sizing Object Detection for Hailo-8 Dataflow Limits",
"bloom": "evaluate"
}
],
"rationale": "Walks through sizing vision models on Hailo-8, starting from recalling its local memory absence, to computing footprints, diagnosing host bandwidth crashes, and evaluating multi-camera feasibility.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-27",
"track": "edge",
"topic": "model-size-estimation",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1388",
"title": "LLaMA 7B INT8 Memory Footprint on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1095",
"title": "Model Size Estimation: Specify Memory Budget for Edge LLM with Fixed 16GB",
"bloom": "create"
},
{
"level": "L5",
"id": "edge-1089",
"title": "Model Size Estimation: Master Full Memory Audit for Edge LLM Deployment on Jetson Orin",
"bloom": "evaluate"
}
],
"rationale": "Focuses on deploying large language models on Jetson Orin, estimating the 7B INT8 footprint, specifying budgets for smaller memory constraints, and conducting full KV-cache audits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-29",
"track": "edge",
"topic": "model-size-estimation",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1438",
"title": "Edge TPU Quantization and Bottleneck Analysis",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1707",
"title": "Edge TPU Model Quantization and Footprint Optimization",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1940",
"title": "Coral Edge TPU Feasibility for MobileNetV3-Small: Memory and Performance Estimation",
"bloom": "analyze"
}
],
"rationale": "Examines Coral Edge TPU memory limits, determining why FP32 models are rejected, optimizing INT8 footprints, and estimating throughput feasibility.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-003-30",
"track": "edge",
"topic": "model-size-estimation",
"competency_area": "architecture",
"levels": [
"L3",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1591",
"title": "Memory Footprint Estimation for Qualcomm Cloud AI 100",
"bloom": "apply"
},
{
"level": "L6+",
"id": "edge-1665",
"title": "LLM Inference Sizing for Cloud AI 100",
"bloom": "create"
}
],
"rationale": "Focuses on LLM memory sizing for the Qualcomm Cloud AI 100, calculating KV cache budgets, and architecting multi-user concurrency.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-005-09",
"track": "edge",
"topic": "batching-strategies",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "edge-0215",
"title": "The Hidden Cost of Continuous Batching",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-1211",
"title": "Latency Spikes in Dynamic Batching on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1686",
"title": "Dynamic Batching Latency Optimization on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1891",
"title": "Optimizing Real-time Inference on NVIDIA Jetson Orin with Adaptive Batching",
"bloom": "analyze"
}
],
"rationale": "Starts with understanding the timeout costs of batching, moves to diagnosing tail latency spikes on real hardware, optimizing the policy for a vehicle, and finally designing an adaptive scheduling policy to meet strict SLA constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-005-10",
"track": "edge",
"topic": "batching-strategies",
"competency_area": "latency",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1338",
"title": "Hailo-8 Dataflow Batching Latency",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1461",
"title": "Hailo-8 Multi-Stream Dataflow Batching Architecture",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1642",
"title": "Dataflow-Aware Dynamic Batching for Multi-Camera Hailo-8 Edge Appliances",
"bloom": "create"
}
],
"rationale": "Progresses from theoretical static batch latency on a DRAM-less accelerator, to designing a multi-stream architecture, and concludes with dynamically scheduling asynchronous streams to minimize PCIe contention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-005-11",
"track": "edge",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L1",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0160",
"title": "The Edge Efficiency Metric",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1346",
"title": "Hailo-8 INT8 Energy Per Operation Math",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1469",
"title": "Architecting Energy-Efficient Streams for Hailo-8",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1651",
"title": "Dataflow Optimization for Hailo-8 Stream Processing",
"bloom": "create"
}
],
"rationale": "Moves from identifying efficiency metrics to calculating theoretical energy costs, architecting streams to minimize host-memory access, and fully optimizing a dataflow pipeline for zero-copy energy efficiency.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-005-12",
"track": "edge",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1421",
"title": "Energy Cost Analysis of Memory vs Compute on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1805",
"title": "Energy-Aware Inference Architecture on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1929",
"title": "Energy-Efficient LLM Deployment on NVIDIA Jetson Orin",
"bloom": "analyze"
}
],
"rationale": "Progresses from analyzing the energy discrepancy between dense compute and memory-bound layers to designing an inference architecture that balances these costs, and ultimately profiling an LLM deployment bounded by KV-cache accesses.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-01",
"track": "edge",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "edge-1222",
"title": "GPTQ 3-bit Latency Degradation on Ampere",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1719",
"title": "Mitigating Bandwidth Bottlenecks for 4-bit AWQ on Jetson Orin",
"bloom": "analyze"
}
],
"rationale": "Explores latency degradation issues when applying extreme quantization (GPTQ/AWQ) to models on Jetson Orin and how to mitigate them.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-02",
"track": "edge",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1595",
"title": "Hailo-8 4-Bit Streaming Bandwidth",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1265",
"title": "Architecting Extreme Quantization for Hailo-8 Streams",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1652",
"title": "Sub-INT8 Quantization on Dataflow Accelerators",
"bloom": "create"
}
],
"rationale": "Progresses from calculating streaming bandwidth for sub-4-bit models on Hailo-8 to architecting the storage mechanism and designing a mixed quantization scheme.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-03",
"track": "edge",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1911",
"title": "Deploying Sub-4-bit LLMs on Qualcomm Cloud AI 100: Balancing Precision and Performance",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1740",
"title": "Sizing a 70B LLM for Qualcomm Cloud AI 100",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1913",
"title": "Extreme Quantization Deployment on Qualcomm Cloud AI 100",
"bloom": "analyze"
}
],
"rationale": "Guides the learner through evaluating, sizing, and finally deploying extreme sub-4-bit quantization on Qualcomm Cloud AI 100.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-04",
"track": "edge",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1376",
"title": "4-Bit Weight Packing for Coral Edge TPU Bandwidth",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1551",
"title": "Evaluating 4-bit AWQ versus INT8 on Coral Edge TPU",
"bloom": "evaluate"
}
],
"rationale": "Evaluates the theoretical bandwidth gains of 4-bit weight packing on the Coral Edge TPU against the practical execution tradeoffs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-05",
"track": "edge",
"topic": "extreme-quantization",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L3",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-0320",
"title": "The Federated Learning Bandwidth Diet",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-0837",
"title": "Federated Gradient Uplink Calculation",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-0838",
"title": "Evaluating Extreme Gradient Quantization",
"bloom": "evaluate"
}
],
"rationale": "Calculates the baseline bandwidth for federated learning, determines gradient quantization constraints, and evaluates the sufficiency of extreme quantization methods.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-06",
"track": "edge",
"topic": "load-balancing",
"competency_area": "networking",
"levels": [
"L1",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-1776",
"title": "Identifying Jetson Orin Accelerator Routing",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-1992",
"title": "Edge Inference Traffic Management on NVIDIA Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1746",
"title": "Heterogeneous Routing on NVIDIA Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1999",
"title": "Edge AI Load Balancing & Routing for NVIDIA Jetson Orin Deployments",
"bloom": "analyze"
}
],
"rationale": "Advances from identifying basic routing features on Jetson Orin to managing fleet traffic, heterogeneous on-device routing, and comprehensive load-balancing design.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-07",
"track": "edge",
"topic": "load-balancing",
"competency_area": "networking",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1350",
"title": "Load Balancing Requests Across Google Coral TPUs",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1478",
"title": "Multi-TPU Routing for High-Frequency Industrial Inspection",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1658",
"title": "Edge TPU Cluster Load Balancing for Real-Time Video Analytics",
"bloom": "create"
}
],
"rationale": "Teaches the fundamentals of sizing compute for Coral TPUs, scaling to multi-TPU routing for streams, and handling bursts in real-time edge cluster environments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-08",
"track": "edge",
"topic": "load-balancing",
"competency_area": "networking",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "edge-1991",
"title": "Load Balancing for Dynamic Qualcomm Cloud AI 100 Inference Workloads",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-1382",
"title": "Weighted Round-Robin for Asymmetric Edge Accelerators",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1998",
"title": "Optimizing Edge Inference Routing on Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1325",
"title": "Evaluating Request Routing on Qualcomm Cloud AI 100",
"bloom": "evaluate"
}
],
"rationale": "Covers load-balancing algorithms for dynamic inference workloads, applies weighted round-robin on asymmetric edge accelerators, optimizes routing to fix latency, and evaluates stateful inference routing.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-09",
"track": "edge",
"topic": "load-balancing",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1431",
"title": "Hailo-8 Host Memory Bandwidth Imbalance",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1811",
"title": "Hailo-8 Multi-Camera Stream Load Balancing",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1993",
"title": "Edge Inference Load Balancing with Hailo-8 Accelerators",
"bloom": "analyze"
}
],
"rationale": "Diagnoses Hailo-8 memory bandwidth imbalances under round-robin, scales to multi-camera stream routing, and designs complex load balancing for varied traffic spikes.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-10",
"track": "edge",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1237",
"title": "TensorRT DLA Fallback Overhead on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1706",
"title": "TensorRT DLA to GPU Fallback Optimization",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-2017",
"title": "Optimizing Vision Model Deployment on Jetson Orin: TensorRT Conversion Challenges",
"bloom": "analyze"
}
],
"rationale": "Explores the latency cost of partial DLA delegation on Jetson Orin, diagnosing operator coverage gaps, and ultimately optimizing ONNX-to-TensorRT conversion to hit strict SLAs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-11",
"track": "edge",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1590",
"title": "Hailo-8 ONNX Conversion and Graph Break Bandwidth",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2013",
"title": "Optimizing Object Detection on Hailo-8 with Operator Gaps",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1274",
"title": "Architecting an INT8 Dataflow Pipeline for Hailo-8",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1663",
"title": "Hailo-8 Dataflow Compilation with Unsupported Ops",
"bloom": "create"
}
],
"rationale": "Progresses from calculating the bandwidth cost of CPU fallback on Hailo-8 to diagnosing operator gaps, architecting a dataflow pipeline, and deploying custom unsupported operators.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-12",
"track": "edge",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1387",
"title": "Edge TPU Operator Fallback Latency",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1542",
"title": "Edge TPU Operator Delegation Tradeoffs",
"bloom": "evaluate"
}
],
"rationale": "Teaches the theoretical latency limits of Edge TPU operator fallback and requires the learner to navigate the tradeoffs of modifying the model to avoid delegation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-13",
"track": "edge",
"topic": "model-format-conversion",
"competency_area": "deployment",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1792",
"title": "Qualcomm Cloud AI 100 Toolchain and Specs Recall",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-2014",
"title": "Optimizing ONNX Model Deployment on Qualcomm Cloud AI 100 with Operator Coverage Gaps",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1310",
"title": "Diagnosing Operator Fallback on Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1751",
"title": "Optimizing ViT Operator Fallback on Cloud AI 100",
"bloom": "evaluate"
}
],
"rationale": "Builds from recalling Cloud AI 100 toolchains to identifying ONNX conversion gaps, diagnosing specific fallback bottlenecks, and optimizing deployment for complex models like ViTs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-14",
"track": "edge",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-1917",
"title": "NVIDIA Jetson Orin Thermal Throttling: Sustained Performance Recall",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-1394",
"title": "Calculate thermal throttling impact on Orin throughput",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2144",
"title": "Thermal Design for Fanless Jetson Orin in IP67 Enclosure",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1549",
"title": "Sustained Throughput vs Thermal Throttling on Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1923",
"title": "Optimizing Sustained Performance on NVIDIA Jetson Orin in Challenging Environments",
"bloom": "analyze"
}
],
"rationale": "Moves from recognizing thermal throttling on Jetson Orin to calculating its impact, designing enclosures, evaluating partitioning strategies, and managing thermal profiles in challenging environments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-15",
"track": "edge",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1789",
"title": "Hailo-8 Power and Performance Specification Recall",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-2150",
"title": "Thermal Runaway Prevention for Sustained AI Workload on Hailo-8",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1292",
"title": "Diagnosing Sudden Inference Latency Spikes on Hailo-8 Under Load",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1400",
"title": "Thermal Sizing for Fanless Hailo-8 Edge Camera",
"bloom": "evaluate"
}
],
"rationale": "Starts with Hailo-8 power specifications, explores thermal runaway prevention, diagnoses sudden latency spikes under load, and evaluates thermal feasibility in fanless designs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-16",
"track": "edge",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1454",
"title": "Analyzing Thermal Throttling on Google Coral Edge TPU",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1716",
"title": "Mitigating Edge TPU Thermal Throttling in Sealed Enclosures",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1925",
"title": "Google Coral Edge TPU Thermal Throttling in Edge Deployments",
"bloom": "analyze"
}
],
"rationale": "Analyzes the impact of ambient heat on Coral Edge TPU FPS, mitigates throttling in sealed enclosures, and resolves severe throughput drops in high-heat outdoor settings.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-17",
"track": "edge",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1360",
"title": "Calculate Sustained TOPS Under Edge Thermal Throttling",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1493",
"title": "Architecting Thermal Management for Qualcomm Cloud AI 100",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1677",
"title": "Architecting Thermal Resilient Video Analytics",
"bloom": "create"
}
],
"rationale": "Calculates sustained TOPS under thermal limits for Cloud AI 100, architects thermal management leveraging DVFS, and designs a resilient video analytics pipeline that handles SLA constraints under heavy throttling.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-26",
"track": "edge",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L1",
"L2",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0117",
"title": "Pruning for Parallelism",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0125",
"title": "The Structured Sparsity Speedup",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-1587",
"title": "TensorRT INT8 Graph Optimization on Jetson Orin",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1475",
"title": "Multi-Model Compilation Strategy for NVIDIA Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1656",
"title": "Heterogeneous Graph Compilation on Jetson Orin",
"bloom": "create"
}
],
"rationale": "Covers the full spectrum of Jetson Orin graph compilation, starting with pruning basics and structured speedups, advancing to AOT compilation tradeoffs, and resolving heterogeneous partitioning bottlenecks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-27",
"track": "edge",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1227",
"title": "Analyzing Fusion Memory Spills on Qualcomm Cloud AI 100",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1699",
"title": "Operator Fusion Bottlenecks on Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1979",
"title": "Optimizing Large Language Model Deployment on Qualcomm Cloud AI 100 via Ahead-of-Time Graph Compilation",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1981",
"title": "Optimizing a Large Language Model for Qualcomm Cloud AI 100 via Graph Compilation",
"bloom": "analyze"
}
],
"rationale": "Explores operator fusion pitfalls on Cloud AI 100, analyzes fusion bottlenecks, designs an AOT strategy for LLMs, and fully optimizes the compilation pipeline across hardware limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-006-28",
"track": "edge",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "edge-1286",
"title": "Diagnosing Edge TPU Compiler Graph Partitioning",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1762",
"title": "Optimizing Object Detection Graph for Coral Edge TPU",
"bloom": "evaluate"
}
],
"rationale": "Diagnoses Edge TPU compilation partitioning failures and evaluates the tradeoffs of modifying graphs versus pipelining the fallback.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-007-05",
"track": "edge",
"topic": "data-efficiency-selection",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2096",
"title": "Data Pruning for Edge Deployment on Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-1563",
"title": "On-Device Coreset Capacity Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2100",
"title": "Diagnosing Data Wall Challenges on Qualcomm Cloud AI 100 for Edge ML",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1258",
"title": "On-Premise Continuous Learning Data Selection",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1646",
"title": "On-Premise Coreset Selection Pipeline",
"bloom": "create"
}
],
"rationale": "Progresses from defining data pruning on the Cloud AI 100 to calculating coreset capacity, diagnosing the data wall, designing continuous learning selection, and finally architecting a concurrent coreset selection pipeline that meets SLA constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-007-06",
"track": "edge",
"topic": "data-efficiency-selection",
"competency_area": "data",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-1404",
"title": "Hailo-8 Local DRAM Memory Architecture",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-2102",
"title": "Optimizing Data for Edge AI on Hailo-8",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1298",
"title": "Coreset Memory Bottleneck on Dataflow Edge",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1732",
"title": "Data Pruning for Hailo-8 Streaming Limits",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2104",
"title": "Edge Data Efficiency for Real-time Object Detection on Hailo-8",
"bloom": "analyze"
}
],
"rationale": "Builds from basic Hailo-8 memory limitations to identifying real-world performance gaps, diagnosing memory bottlenecks of a coreset approach, defining streaming constraints, and designing a full drone selection pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-007-07",
"track": "edge",
"topic": "neural-architecture-search",
"competency_area": "architecture",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-1781",
"title": "Recall NVIDIA Jetson Orin DLA Purpose",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-0492",
"title": "The Edge Transformer Trap",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1942",
"title": "Hardware-aware NAS Deployment on NVIDIA Jetson Orin for Real-time Object Detection",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1764",
"title": "Hardware-Aware NAS for Jetson Orin DLA",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1946",
"title": "Hardware-aware NAS for Real-time Object Detection on NVIDIA Jetson Orin",
"bloom": "analyze"
}
],
"rationale": "Starts with understanding Jetson Orin DLA basics, evaluates standard vs efficient models for it, diagnoses why general NAS fails on Orin, constrains NAS to DLA only, and finally implements the full hardware-aware NAS strategy constrained by Orin's capacities.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-007-19",
"track": "edge",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2048",
"title": "Hailo-8 Hardware Security Features for Edge Adversarial Robustness",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-1558",
"title": "Adversarial Purification on Hailo-8",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2052",
"title": "Diagnosing Intermittent Object Detection Failures on Hailo-8 Due to Adversarial Input",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1256",
"title": "Adversarial Defense Architecture on Hailo-8 Dataflow Accelerator",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2055",
"title": "Mitigating Adversarial Patch Attacks on Hailo-8 Edge Deployments for Autonomous Vehicles",
"bloom": "analyze"
}
],
"rationale": "Starts with built-in hardware security features of Hailo-8, calculates the overhead of adding adversarial purification, diagnoses real-world intermittent failures, designs a streaming defense architecture, and finally specifies a full mitigation pipeline for an autonomous vehicle.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-007-20",
"track": "edge",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1367",
"title": "Randomized Smoothing on Edge TPU",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2056",
"title": "Coral Edge TPU Adversarial Defense Performance Bottleneck",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2053",
"title": "Edge AI Reliability: Adversarial Robustness on Google Coral TPU",
"bloom": "analyze"
}
],
"rationale": "Progresses from calculating the maximum randomized smoothing passes a Coral TPU can run to diagnosing latency bottlenecks in the defense, and finally designing a robust anomaly detector adhering to strict Edge TPU constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-007-21",
"track": "edge",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L1",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-1771",
"title": "Identifying Power Side-Channel Attacks on Edge Accelerators",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-2054",
"title": "Mitigating Model Extraction on Edge AI with Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1506",
"title": "Diagnosing Power Anomalies from Adversarial Energy Attacks",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-2057",
"title": "Model Extraction Attack on Qualcomm Cloud AI 100",
"bloom": "analyze"
}
],
"rationale": "Progresses from defining power side-channel attacks on Cloud AI 100, designing basic extraction mitigations, diagnosing power anomalies caused by energy attacks, to mitigating a full power-side-channel attack using advanced secure-boot workarounds.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-007-22",
"track": "edge",
"topic": "gpu-compute-architecture",
"competency_area": "compute",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-0173",
"title": "The Edge Compute Ceiling",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-2187",
"title": "Tensor Core Availability on Edge GPUs",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-2186",
"title": "Register Pressure on Jetson Orin Ampere SMs",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1473",
"title": "Optimizing Sensor Fusion Kernels on Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1655",
"title": "Heterogeneous Pipeline Design for Jetson Orin",
"bloom": "create"
}
],
"rationale": "Begins by identifying Jetson Orin compute ceilings, then analyzes when Tensor Cores activate, diagnoses register pressure affecting SM occupancy, structures CUDA thread blocks to avoid LPDDR5 bottlenecks, and finally architects a heterogeneous pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-007-23",
"track": "edge",
"topic": "gpu-compute-architecture",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1425",
"title": "Batch Size Impact on Compute Utilization",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1697",
"title": "Optimizing INT8 Compute Utilization on Qualcomm AI 100",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1838",
"title": "Optimizing Transformer Inference on Qualcomm Cloud AI 100",
"bloom": "analyze"
}
],
"rationale": "Progresses from discovering why batch size impacts sustained TOPS, through optimizing the memory bottlenecks that restrict compute utilization, to performing full tuning and coalescing for transformer GEMMs on the accelerator.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-007-24",
"track": "edge",
"topic": "monitoring-observability",
"competency_area": "reliability",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-1793",
"title": "Google Coral Edge TPU Supported Data Type",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-2060",
"title": "Edge TPU Fleet Reliability: MTBF Calculation for 99.9% Uptime",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1289",
"title": "Diagnosing Latency Spikes on Edge TPUs",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1754",
"title": "Sizing Telemetry for Coral Edge TPU Fleet",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2062",
"title": "Real-time Edge TPU Monitoring Strategy",
"bloom": "analyze"
}
],
"rationale": "Starts with understanding Edge TPU data constraints for reliability, calculates MTBF targets, diagnoses intermittent thermal latency spikes, sizes the telemetry ingestion over slow cellular links, and designs the end-to-end monitoring strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-01",
"track": "edge",
"topic": "safety-certification",
"competency_area": "reliability",
"levels": [
"L4",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "edge-2045",
"title": "Ensuring ISO 26262 Compliance on Google Coral Edge TPU for Safety-Critical ML",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1674",
"title": "Architecting ISO-26262 Compliant Vision on Coral Edge TPU",
"bloom": "create"
}
],
"rationale": "Explores functional safety architectures for the Coral Edge TPU, progressing from analyzing ISO 26262 compliance measures to architecting a fully ASIL-B compliant vision pipeline within real-time limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-02",
"track": "edge",
"topic": "safety-certification",
"competency_area": "reliability",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1632",
"title": "Watchdog Timeout Calculation Under Contention",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1399",
"title": "Automotive ASIL-D Certification for Lidar Perception on Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0706",
"title": "The Functional Safety Redundancy Cost",
"bloom": "create"
}
],
"rationale": "Progresses from calculating watchdog limits under contention to specifying and redesigning redundant deterministic architectures for autonomous vehicles on Jetson Orin.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-11",
"track": "edge",
"topic": "accelerator-comparison",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1365",
"title": "Jetson Orin GPU vs DLA Efficiency",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0697",
"title": "The DLA vs GPU Scheduling",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0702",
"title": "The Multi-Model Scheduling Problem",
"bloom": "evaluate"
}
],
"rationale": "Examines the tradeoffs between GPU and dedicated DLA on Jetson Orin, scaling from efficiency calculations to complex multi-model scheduling.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-13",
"track": "edge",
"topic": "accelerator-comparison",
"competency_area": "compute",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "edge-1518",
"title": "Diagnosing Host Bottlenecks with Dataflow Accelerators",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1840",
"title": "Edge AI Deployment: Selecting the Optimal Accelerator for Real-time Object Detection with Hailo-8",
"bloom": "analyze"
}
],
"rationale": "Diagnoses host bottlenecks on dataflow accelerators like Hailo-8 and evaluates their system-level tradeoffs against traditional SoC designs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-14",
"track": "edge",
"topic": "accelerator-comparison",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "edge-1839",
"title": "Evaluating Qualcomm Cloud AI 100 for Edge ML Inference",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-1586",
"title": "Estimating Maximum Throughput on Qualcomm Cloud AI 100",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1458",
"title": "On-Premise Video Analytics Accelerator Selection",
"bloom": "evaluate"
}
],
"rationale": "Evaluates the Qualcomm Cloud AI 100 for edge inference, from high-level comparison to max throughput estimation and enterprise server integration.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-15",
"track": "edge",
"topic": "dma-data-movement",
"competency_area": "memory",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1405",
"title": "Jetson Orin Zero-Copy Memory Architecture Identification",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-1870",
"title": "Optimizing Data Movement on NVIDIA Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1737",
"title": "Zero-Copy Pipeline Design on Unified Edge Architectures",
"bloom": "evaluate"
}
],
"rationale": "Investigates zero-copy memory patterns on Jetson Orin architectures, moving from foundational identification to bottleneck diagnosis and pipeline redesign.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-17",
"track": "edge",
"topic": "dma-data-movement",
"competency_area": "memory",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1567",
"title": "Edge TPU USB Bandwidth Bottleneck",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1467",
"title": "Zero-Copy Video Pipeline Architecture for Google Coral Edge TPU",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1397",
"title": "Zero-Copy Video Pipeline for Coral Edge TPU",
"bloom": "create"
}
],
"rationale": "Addresses USB I/O bottlenecks on the Coral Edge TPU, progressing from fundamental bandwidth calculations to eliminating CPU transfers via zero-copy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-18",
"track": "edge",
"topic": "dma-data-movement",
"competency_area": "memory",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1373",
"title": "DMA Transfer Latency Calculation for 4K Video Batches",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1318",
"title": "Optimizing PCIe DMA for 4K Video on Cloud AI 100",
"bloom": "evaluate"
}
],
"rationale": "Evaluates massive data movement constraints for 4K video on high-end edge accelerators like Qualcomm Cloud AI 100.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-19",
"track": "edge",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1349",
"title": "Distilling to INT8 for Google Coral Edge TPU",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1476",
"title": "Architecting Knowledge Distillation for Coral Edge TPU",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1657",
"title": "Cross-Architecture Distillation for Google Coral Edge TPU",
"bloom": "create"
}
],
"rationale": "Focuses on the strict constraints of distilling models onto the Coral Edge TPU, transitioning from capability estimation to full cross-architecture student formulation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-20",
"track": "edge",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-1963",
"title": "Optimizing Large Language Models on Qualcomm AI 100: Knowledge Distillation Strategies",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-1381",
"title": "Sizing the Distilled INT8 Student Model",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1537",
"title": "Distillation vs Pruning on Cloud AI 100",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1968",
"title": "Knowledge Distillation for Efficient LLM Deployment on Qualcomm Cloud AI 100",
"bloom": "analyze"
}
],
"rationale": "Explores deploying distilled LLMs on the Qualcomm Cloud AI 100, sizing the INT8 student within memory limits and weighing distillation against pruning.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-21",
"track": "edge",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1429",
"title": "Distillation Arithmetic Intensity on Hailo-8",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1700",
"title": "Feature Distillation I/O Bottleneck",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1969",
"title": "Optimizing Vision Models for Hailo-8 with Knowledge Distillation",
"bloom": "analyze"
}
],
"rationale": "Tackles distillation on dataflow accelerators (Hailo-8), resolving specific feature map bottlenecks caused by intermediate activation streaming.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-22",
"track": "edge",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1775",
"title": "Jetson Orin DLA Offloading for Distilled Models",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-1307",
"title": "Distilled INT8 Quantization Collapse",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1745",
"title": "Distilling a Hybrid Object Detector for Jetson Orin",
"bloom": "evaluate"
}
],
"rationale": "Investigates distilling models for heterogeneous Edge SoC deployment (Jetson Orin DLA + GPU), fixing quantization collapse and optimizing layout.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-24",
"track": "edge",
"topic": "memory-mapped-inference",
"competency_area": "memory",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-1864",
"title": "Hailo-8 Memory-Mapped Weight Loading Strategies for Shared Inference",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1287",
"title": "Hailo-8 Model Thrashing with mmap Weight Loading",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1747",
"title": "Hailo-8 Multi-Process Memory-Mapped Inference",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1867",
"title": "Optimizing Memory-Mapped Inference on Hailo-8 for Edge AI",
"bloom": "analyze"
}
],
"rationale": "Explores the challenges of using mmap to serve models across multi-process pipelines on the Hailo-8 accelerator.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-25",
"track": "edge",
"topic": "memory-mapped-inference",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1621",
"title": "Calculate Mmap Cold Start Latency on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2533",
"title": "Demand Paging for Edge Model Deployment",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-1326",
"title": "Evaluating Memory-Mapped Weight Loading on Jetson Orin",
"bloom": "evaluate"
}
],
"rationale": "Addresses loading heavy Vision/LLM weights via demand paging and memory-mapping on the unified memory Jetson Orin platform.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-26",
"track": "edge",
"topic": "memory-mapped-inference",
"competency_area": "memory",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1351",
"title": "Multi-Process mmap on Qualcomm Cloud AI 100",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1480",
"title": "Architecting Multi-Model Memory-Mapped Inference on Cloud AI 100",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1869",
"title": "Optimizing Memory-Mapped Large Model Inference on Qualcomm Cloud AI 100",
"bloom": "analyze"
}
],
"rationale": "Designs concurrent sharing for multiple worker processes loading massive LLM models on the Qualcomm Cloud AI 100 via mmap.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-29",
"track": "edge",
"topic": "operator-scheduling",
"competency_area": "optimization",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1578",
"title": "Parallel Operator Scheduling and Energy on AI 100",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1484",
"title": "Multi-Tenant Operator Scheduling on Qualcomm Cloud AI 100",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1668",
"title": "Optimal LLM Operator Scheduling on Qualcomm Cloud AI 100",
"bloom": "create"
}
],
"rationale": "Progresses from parallel scheduling fundamentals to multi-tenant transformer orchestration and optimal LLM execution on Cloud AI 100.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-31",
"track": "edge",
"topic": "storage-format-selection",
"competency_area": "data",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1357",
"title": "Edge Telemetry Storage Sizing on Jetson Orin",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1490",
"title": "Edge Sensor Data Ingestion Pipeline Architecture on Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1675",
"title": "Multi-modal Sensor Data Ingestion Architecture on Jetson Orin",
"bloom": "create"
}
],
"rationale": "Designs the end-to-end data telemetry storage on Jetson Orin, shifting from footprint estimation to highly concurrent hybrid storage tiering.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-32",
"track": "edge",
"topic": "storage-format-selection",
"competency_area": "data",
"levels": [
"L1",
"L3",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1786",
"title": "Edge Inference Logging Storage Format",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-2190",
"title": "FlatBuffers vs Protobuf for Edge Model Serving",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1758",
"title": "Edge TPU Storage Format and I/O Throughput Sizing",
"bloom": "evaluate"
}
],
"rationale": "Compares the serialization formats (FlatBuffers vs Protobuf) suited to Edge TPU deployments and sizes their I/O throughput.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-008-34",
"track": "edge",
"topic": "storage-format-selection",
"competency_area": "data",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "edge-1713",
"title": "Storage I/O Optimization for Qualcomm AI 100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2091",
"title": "Optimizing Data Storage for On-Device Inference on Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-2095",
"title": "Edge AI Data Storage Optimization on Qualcomm Cloud AI 100",
"bloom": "analyze"
}
],
"rationale": "Focuses on restructuring JSON/PNG storage pipelines into highly concurrent and compressed logs to saturate the Qualcomm Cloud AI 100.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-009-19",
"track": "edge",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-2288",
"title": "Jetson Orin Federated Ring",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-2511",
"title": "1GbE Orin AllGather Sync",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2342",
"title": "Orin Ring AllReduce",
"bloom": "evaluate"
}
],
"rationale": "Introduces the basic steps of a federated Ring AllReduce on Jetson Orin, analyzes AllGather synchronization over a 1GbE switch, and evaluates architectural upgrades to a hierarchical topology under cross-rack bandwidth contention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-009-20",
"track": "edge",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "edge-0534",
"title": "The CAN Bus Bandwidth Crunch",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2513",
"title": "CAN Bus Protocol Framing Penalty",
"bloom": "evaluate"
}
],
"rationale": "Examines the severe bandwidth constraints of edge automotive networks, moving from analyzing ML telemetry throughput on a CAN bus to calculating the explicit framing penalty of performing parameter averaging over that same bus.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-009-21",
"track": "edge",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-0427",
"title": "The Sensor Fusion Sync Failure",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-2298",
"title": "PCIe Collective Communication Efficiency",
"bloom": "evaluate"
}
],
"rationale": "Explores high-bandwidth intra-node communication, from calculating raw PCIe Gen4 transfer latencies to evaluating the efficiency of CPU-driven collective patterns versus device-to-device Ring AllReduce.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-009-22",
"track": "edge",
"topic": "data-quality-validation",
"competency_area": "data",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "edge-1772",
"title": "Hailo-8 Host-Side Data Stream Validation",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-2073",
"title": "Edge Data Validation on Hailo-8 for Real-time ML",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1299",
"title": "Host Data Validation Starving Hailo-8 Dataflow",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1734",
"title": "Streaming Data Validation for Hailo-8 Edge Inference",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2077",
"title": "Edge Data Quality and Validation for Critical Hailo-8 Deployments",
"bloom": "analyze"
}
],
"rationale": "Follows the lifecycle of implementing data quality checks on the DRAM-less Hailo-8, starting from fundamental stream concepts, diagnosing host CPU starvation, evaluating architectural placement, and designing a comprehensive mission-critical validation pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-009-23",
"track": "edge",
"topic": "data-quality-validation",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1416",
"title": "INT8 Clipping in Edge Quality Gates",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1801",
"title": "Edge TPU Visual Inspection Data Quality Pipeline",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2074",
"title": "Edge Data Integrity: Coral TPU Architecture Evaluation",
"bloom": "analyze"
}
],
"rationale": "Explores the challenges of INT8 quantization for data quality on the Coral Edge TPU, from identifying clipping anomalies to designing an edge-only pipeline and evaluating the tradeoffs of offloading checks to the host.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-009-24",
"track": "edge",
"topic": "data-quality-validation",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1607",
"title": "Edge Data Quality Gate Compute Utilization",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2076",
"title": "Edge Data Quality & Validation for Anomaly Detection on NVIDIA Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1528",
"title": "Edge Data Quality Gating on Jetson Orin",
"bloom": "evaluate"
}
],
"rationale": "Focuses on resource allocation for validation on the Jetson Orin, calculating TOPS utilization, diagnosing constraints, and making strategic choices between the Ampere GPU and DLA for quality gates.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-009-27",
"track": "edge",
"topic": "distribution-drift-detection",
"competency_area": "reliability",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1344",
"title": "Drift Detection Latency Budget on Google Coral",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1261",
"title": "Edge Drift Detection Architecture on Coral TPU",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1648",
"title": "On-Device Distribution Drift Detection for Edge TPU",
"bloom": "create"
}
],
"rationale": "Progresses from calculating latency budgets for on-device PSI to architecting a drift detection system under severe bandwidth limits, culminating in a fleet-wide distribution drift strategy on INT8 hardware.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-009-28",
"track": "edge",
"topic": "fairness-evaluation",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1406",
"title": "Edge TPU Precision Requirements for Bias Evaluation",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-1514",
"title": "INT8 Quantization Bias on Coral Edge TPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1741",
"title": "Quantized Edge TPU Fairness Evaluation Strategy",
"bloom": "evaluate"
}
],
"rationale": "Starts by defining the strict INT8 precision limits of the Coral TPU, identifies hardware-induced bias post-quantization, and tasks the learner with formulating a deployment strategy that natively evaluates fairness under these 2W INT8 constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-009-30",
"track": "edge",
"topic": "fairness-evaluation",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1223",
"title": "INT8 Quantization Bias on Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1696",
"title": "Optimizing Intersectional Fairness Evaluation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2111",
"title": "Fairness-Aware Model Deployment on Edge AI Accelerator",
"bloom": "analyze"
}
],
"rationale": "Traces fairness deployment on a high-throughput edge server, from observing quantization bias in specific subgroups to optimizing the compute bottleneck for intersectional evaluation, and designing the full deployment architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-010-01",
"track": "edge",
"topic": "profiling-bottleneck-analysis",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1244",
"title": "Hailo-8 Host-to-Device Streaming Latency Bottleneck",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1711",
"title": "Hailo-8 Host Streaming Bottleneck",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1896",
"title": "Optimizing Latency on Hailo-8: Profiling for Edge ML Bottlenecks",
"bloom": "analyze"
}
],
"rationale": "Progresses from diagnosing a specific Hailo-8 compute/IO bottleneck to fixing host-side streaming limits and profiling end-to-end pipeline latency violations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-010-02",
"track": "edge",
"topic": "profiling-bottleneck-analysis",
"competency_area": "latency",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1355",
"title": "USB I/O Bottleneck on Coral Edge TPU",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1277",
"title": "Coral Edge TPU Pipeline Bottleneck Analysis Design",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1670",
"title": "Edge TPU Pipeline Profiling and Bottleneck Resolution",
"bloom": "create"
}
],
"rationale": "Explores Coral Edge TPU latency from calculating basic USB transfer bottlenecks to designing profiling pipelines and resolving CPU fallback issues.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-010-03",
"track": "edge",
"topic": "profiling-bottleneck-analysis",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "edge-1894",
"title": "Diagnosing Latency Spikes on Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-1630",
"title": "Compute-Bound Latency on Cloud AI 100",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1898",
"title": "Diagnosing Latency on Qualcomm Cloud AI 100 with Profiling",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1544",
"title": "Bottleneck Analysis on Qualcomm Cloud AI 100",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner through diagnosing latency spikes on Cloud AI 100, calculating compute bounds, profiling memory vs I/O bounds, and finally evaluating architectural solutions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-010-04",
"track": "edge",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1448",
"title": "Quantization Bias in TPU Guardrails",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1712",
"title": "Optimizing Privacy Guardrails on Edge TPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2117",
"title": "Responsible AI Evaluation on Edge TPU: Quantized Model Comparison",
"bloom": "analyze"
}
],
"rationale": "Explores the impact of INT8 quantization on model fairness on Coral Edge TPUs, moving from identifying bias to optimizing guardrails and evaluating competing quantized architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-010-05",
"track": "edge",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1356",
"title": "Guardrail Memory Footprint on Cloud AI 100",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1278",
"title": "Architecting On-Premise Guardrails for PII Redaction",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1672",
"title": "On-Premise Guardrail Architecture for High-Throughput Edge Inference",
"bloom": "create"
}
],
"rationale": "Examines the memory and hardware footprint of on-premise guardrails on Cloud AI 100, progressing from footprint calculation to pipeline architecture and high-throughput deployment.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-01",
"track": "edge",
"topic": "ab-rollout-strategies",
"competency_area": "deployment",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1769",
"title": "Shadow Deployment Quantization Requirement on Coral TPU",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-2020",
"title": "Edge TPU Model Rollout Strategy with A/B Testing",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1293",
"title": "Canary Rollout Latency Spike on Edge TPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1724",
"title": "Canary Rollout of INT8 Model on Coral Edge TPU",
"bloom": "evaluate"
}
],
"rationale": "Progresses from foundational quantization requirements on Coral TPU to planning a canary rollout, diagnosing a latency spike during the rollout, and making a strategic rollback decision.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-02",
"track": "edge",
"topic": "ab-rollout-strategies",
"competency_area": "deployment",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1336",
"title": "Shadow Deployment Memory Constraints on Jetson Orin",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1254",
"title": "Shadow Deployment on Jetson Orin",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1639",
"title": "Shadow Deployment Arbitration on Jetson Orin",
"bloom": "create"
}
],
"rationale": "Explores shadow deployment resource constraints on Jetson Orin, advancing from basic memory calculation to handling thermal/OOM limits, and finally designing the full concurrent arbitration architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-03",
"track": "edge",
"topic": "ab-rollout-strategies",
"competency_area": "deployment",
"levels": [
"L2",
"L4"
],
"questions": [
{
"level": "L2",
"id": "edge-2018",
"title": "Progressive Rollout Strategy for Edge ML on Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1683",
"title": "Canary Rollout Context Thrashing on AI 100",
"bloom": "analyze"
}
],
"rationale": "Moves from general progressive rollout strategy on Cloud AI 100 to diagnosing a specific context thrashing bottleneck during the canary phase.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-04",
"track": "edge",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2037",
"title": "Graceful Degradation on Jetson Orin for Autonomous Vehicles",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-1226",
"title": "Thermal Throttling and Task Shedding on Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1698",
"title": "Thermal Throttling Degradation on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-2043",
"title": "Designing a Graceful Degradation Strategy for Real-time ML on NVIDIA Jetson Orin",
"bloom": "analyze"
}
],
"rationale": "Guides the learner through managing thermal and load constraints on Jetson Orin, scaling from task shedding concepts to full fail-operational architecture design.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-05",
"track": "edge",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1615",
"title": "Edge TPU Fallback Model Compute Budget",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1536",
"title": "Evaluating Fail-Operational Fallbacks on Edge TPU",
"bloom": "evaluate"
}
],
"rationale": "Moves from calculating the compute budget of a fallback model on Coral TPU to evaluating the effectiveness of different fail-operational fallback strategies under thermal constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-06",
"track": "edge",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1572",
"title": "Power-Constrained QoS Shedding on Hailo-8",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2038",
"title": "Graceful Degradation for Real-time Object Detection on Edge AI (Hailo-8)",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1267",
"title": "Thermal and Bandwidth Graceful Degradation on Hailo-8",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating QoS shedding limits on Hailo-8 to structuring a degradation ladder, and finally handling complex host-bandwidth constraints during thermal throttling.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-08",
"track": "edge",
"topic": "memory-pressure-management",
"competency_area": "memory",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1352",
"title": "Host Buffer Sizing for Dataflow Edge Streaming",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1481",
"title": "Handling Host Memory Pressure for Hailo-8 Streams",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1661",
"title": "Zero-Copy Host Streaming Architecture for Hailo-8 Memory",
"bloom": "create"
}
],
"rationale": "Explores the unique memory pressure of streaming data to a DRAM-less Hailo-8 accelerator, advancing from buffer sizing to managing fragmentation, and ultimately redesigning for zero-copy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-09",
"track": "edge",
"topic": "memory-pressure-management",
"competency_area": "memory",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1791",
"title": "Qualcomm Cloud AI 100 Physical Memory Capacity Recall",
"bloom": "remember"
},
{
"level": "L3",
"id": "edge-1879",
"title": "LLM Deployment with Memory Constraints on Qualcomm Cloud AI 100",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1308",
"title": "Diagnosing OOM from Memory Fragmentation on AI 100",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1748",
"title": "Sizing Paged Memory for Multi-Tenant LLM Inference",
"bloom": "evaluate"
}
],
"rationale": "Builds from basic memory specs of the Cloud AI 100 to understanding LLM memory constraints, diagnosing fragmentation OOMs, and deploying advanced paged memory strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-25",
"track": "edge",
"topic": "encoder-decoder-tradeoffs",
"competency_area": "architecture",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-1948",
"title": "Encoder-Decoder Selection for NLU on Google Coral Edge TPU",
"bloom": "analyze"
},
{
"level": "L3",
"id": "edge-1610",
"title": "Edge TPU Encoder-Decoder Throughput Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1950",
"title": "Edge TPU Deployment: Encoder-Decoder Model Latency Diagnosis",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-1954",
"title": "Encoder-Decoder Tradeoffs on Google Coral Edge TPU for Language Tasks",
"bloom": "analyze"
}
],
"rationale": "Guides the learner from basic architecture selection on the Edge TPU, to calculating exact sequence throughput, diagnosing real-world thermal and latency bottlenecks, and making holistic architecture tradeoffs for NLU.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-26",
"track": "edge",
"topic": "encoder-decoder-tradeoffs",
"competency_area": "architecture",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1568",
"title": "Encoder vs Decoder on DRAM-less Hailo-8",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1263",
"title": "Architecting Sequence Models for Hailo-8 Dataflow",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1650",
"title": "Dataflow Optimization for Sequenced Output",
"bloom": "create"
}
],
"rationale": "Progresses from calculating theoretical minimum sequence latencies on a DRAM-less dataflow accelerator to choosing the optimal architecture, and finally redesigning the host-streaming strategy for throughput.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-27",
"track": "edge",
"topic": "encoder-decoder-tradeoffs",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1420",
"title": "Encoder-Decoder Latency Disparity on Jetson Orin",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-1695",
"title": "Optimizing Encoder-Decoder Latency on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1949",
"title": "Edge Deployment Tradeoffs: Encoder-Decoder Architectures on Jetson Orin",
"bloom": "analyze"
}
],
"rationale": "Focuses on the architectural disparity between parallel encoding and autoregressive decoding on Jetson Orin, diagnosing the memory bottleneck during decoding, and making final NLP deployment choices.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-28",
"track": "edge",
"topic": "mixed-precision-training",
"competency_area": "precision",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-1353",
"title": "Estimating ResNet-50 Inference Speed on Coral Edge TPU",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1273",
"title": "Edge Video Analytics Pipeline Architecture",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-1662",
"title": "Edge TPU QAT and Mixed-Precision Pipeline Design",
"bloom": "create"
}
],
"rationale": "Evolves from calculating raw INT8 throughput on the Coral Edge TPU to architecting an FP32-to-INT8 conversion pipeline, culminating in designing a custom BF16/FP32 QAT training recipe.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-29",
"track": "edge",
"topic": "mixed-precision-training",
"competency_area": "precision",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-2120",
"title": "INT8 Quantization Impact on Regression Heads",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1705",
"title": "Dataflow Pipeline Bottlenecks in Mixed-Precision",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1904",
"title": "Designing a Mixed-Precision Strategy for Edge Deployment on Hailo-8",
"bloom": "analyze"
}
],
"rationale": "Progresses from analyzing the accuracy impact of INT8 quantization on bounding boxes to diagnosing host FP16 fallback bottlenecks, and designing a holistic mixed-precision strategy for the dataflow architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-011-30",
"track": "edge",
"topic": "mixed-precision-training",
"competency_area": "precision",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-1778",
"title": "Jetson Orin Peak INT8 Performance Recall",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-1309",
"title": "Diagnosing FP16 Overflow on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1749",
"title": "Mixed-Precision Perception on Jetson Orin",
"bloom": "evaluate"
}
],
"rationale": "Starts with basic Jetson Orin INT8 compute specs, moves to diagnosing TensorRT FP16 overflow issues causing collapsing bounding boxes, and concludes by deploying a multimodal model under strict power limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-014-09",
"track": "edge",
"topic": "duty-cycling",
"competency_area": "power",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-2300",
"title": "PIR-Triggered Accelerator Power States",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-2286",
"title": "Hailo-8 Solar Camera",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-2276",
"title": "Hailo-8 Accelerator Duty Cycling",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2480",
"title": "Wildlife Camera Energy Waste",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2392",
"title": "Hailo-8 PIR Duty Cycling",
"bloom": "evaluate"
}
],
"rationale": "Builds a complete understanding of Hailo-8 duty cycling, from basic power states and duty calculations to diagnosing idle waste and evaluating PIR-triggered architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-014-10",
"track": "edge",
"topic": "duty-cycling",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2327",
"title": "Power vs Energy in Edge Duty Cycling",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-2358",
"title": "Jetson AGX Orin Duty-Cycle Energy Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2365",
"title": "Orin Power Mode Threshold",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "edge-2469",
"title": "Constructing Edge Duty-Cycling Power Budgets",
"bloom": "create"
}
],
"rationale": "Focuses on Jetson AGX Orin power profiles, moving from basic energy vs power concepts to calculating thresholds and constructing complex duty-cycling budgets.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-014-11",
"track": "edge",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1249",
"title": "Unified Memory Bandwidth Starvation",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-1714",
"title": "Optimizing High-Res Camera Ingestion on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2085",
"title": "Real-Time Anomaly Detection on Edge AI Device",
"bloom": "analyze"
}
],
"rationale": "Explores Jetson Orin unified memory bottlenecks, from identifying explicit copy starvation to optimizing zero-copy NVMM pipelines for high-frequency sensor ingestion.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-014-12",
"track": "edge",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1358",
"title": "Hailo-8 Multi-Camera Ingestion Bandwidth",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1280",
"title": "Zero-Copy Video Ingestion Pipeline for Hailo-8",
"bloom": "evaluate"
}
],
"rationale": "Focuses on DRAM-less architectures like Hailo-8, starting with bandwidth requirements and advancing to designing zero-copy host ingestion pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-014-13",
"track": "edge",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-1633",
"title": "Calculate Edge TPU Ingestion Throughput for Video Stream",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-2088",
"title": "Optimizing Real-time Edge Inference on Coral TPU for Defect Detection",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-1547",
"title": "Edge TPU Audio Event Ingestion Architecture",
"bloom": "evaluate"
}
],
"rationale": "Addresses Coral Edge TPU pipelines, progressing from USB throughput limits to optimizing and architecting real-time audio/video ingestion.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-016-04",
"track": "edge",
"topic": "communication-computation-overlap",
"competency_area": "parallelism",
"levels": [
"L2",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2485",
"title": "CUDA Stream Scheduling",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-2415",
"title": "Evaluating Unified Memory Asynchronous Transfers on SoC",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2445",
"title": "Orin DMA and Inference Pipelining",
"bloom": "create"
}
],
"rationale": "Explores pipeline scheduling on unified edge memory, starting with basic CUDA stream scheduling, evaluating unified memory async transfer limits, and designing a complex DMA/inference overlapping schedule.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-016-09",
"track": "edge",
"topic": "3d-parallelism",
"competency_area": "parallelism",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "edge-0456",
"title": "The Autonomous Perception Latency Puzzle",
"bloom": "apply"
},
{
"level": "L4",
"id": "edge-0553",
"title": "The Automotive Parallelism Dilemma",
"bloom": "analyze"
}
],
"rationale": "Explores parallelism strategies for edge devices, transitioning from a conceptual choice between TP and PP to applying the dilemma directly to a 20B ViT across two Orins.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-016-10",
"track": "edge",
"topic": "model-adaptation-systems",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-2240",
"title": "Adapter Rollback and Version Management at Edge",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-2256",
"title": "LoRA Adapter Update Over-the-Air for Edge Devices",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-2258",
"title": "Federated LoRA Aggregation for Edge Fleet Personalization",
"bloom": "apply"
}
],
"rationale": "Focuses on the lifecycle of edge LoRA adapters, moving from rollback diagnosis, to OTA update design, to constructing a federated aggregation pipeline for fleet-wide personalization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-016-11",
"track": "edge",
"topic": "software-portability",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-2166",
"title": "ONNX Runtime Mobile vs TensorRT on Jetson Orin for Edge Inference",
"bloom": "understand"
},
{
"level": "L4",
"id": "edge-2170",
"title": "Handling INT8 Quantization Discrepancies Across Edge Runtimes",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2168",
"title": "Portable Model Optimization Pipeline for Multi-Edge Deployment",
"bloom": "create"
}
],
"rationale": "Teaches cross-platform edge deployment: identifying ONNX vs TensorRT overheads, debugging quantization accuracy mismatches, and designing a unified pipeline for heterogeneous targets.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-016-12",
"track": "edge",
"topic": "tail-latency",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-2182",
"title": "Latency Jitter from DVFS on Edge Devices",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-2181",
"title": "Tail Latency on Edge with Thermal Throttling",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2250",
"title": "Stochastic Input Variance and P99 Latency on Jetson Orin Vision Pipelines",
"bloom": "evaluate"
}
],
"rationale": "Builds intuition on edge latency variability, moving from DVFS jitter to thermal throttling tails, and finally analyzing stochastic variance from complex inputs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-01",
"track": "edge",
"topic": "chiplet-architecture",
"competency_area": "compute",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "edge-2254",
"title": "POP vs 2.5D Interposer for Edge AI Module Cost",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "edge-2234",
"title": "Interposer vs Package-on-Package for Edge Chiplets",
"bloom": "evaluate"
}
],
"rationale": "Explores packaging technology choices (PoP vs Interposer) from general volume production tradeoffs to extreme low-power wearable constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-02",
"track": "edge",
"topic": "chiplet-architecture",
"competency_area": "compute",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "edge-2252",
"title": "Die-to-Die Bandwidth Adequacy for Edge Vision Pipeline",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-2233",
"title": "Die-to-Die Power Gating at Edge",
"bloom": "apply"
}
],
"rationale": "Analyzes an ISP/ML edge chiplet pipeline, progressing from evaluating die-to-die bandwidth bottlenecks for 4K60 video to designing strict power-gating strategies for the same architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-03",
"track": "edge",
"topic": "chiplet-architecture",
"competency_area": "compute",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-2232",
"title": "NUMA-Aware Memory Allocation on Jetson Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2255",
"title": "NUMA-Aware Runtime Scheduling on Embedded Chiplet",
"bloom": "apply"
}
],
"rationale": "Investigates NUMA effects on heterogeneous embedded systems, moving from identifying memory bandwidth bottlenecks on Orin to resolving OS thread migration penalties across chiplet dies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-04",
"track": "edge",
"topic": "interconnect-topology",
"competency_area": "networking",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "edge-2360",
"title": "Orin NVMe Interconnect Standard",
"bloom": "remember"
},
{
"level": "L4",
"id": "edge-2299",
"title": "GMSL Camera PCIe Bottleneck on Orin",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2528",
"title": "CSI-2 Direct vs PCIe Switched Latency",
"bloom": "evaluate"
}
],
"rationale": "Addresses high-speed I/O on the Jetson Orin, progressing from identifying standard NVMe protocols to analyzing PCIe bottlenecks for multiple cameras, and evaluating switched FPGA latency penalties.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-05",
"track": "edge",
"topic": "interconnect-topology",
"competency_area": "networking",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "edge-2285",
"title": "Orin Mesh Bisection Bandwidth",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2529",
"title": "Jetson Cluster Ring vs Star Broadcast",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2531",
"title": "PCIe Fabric Ring Sched",
"bloom": "create"
}
],
"rationale": "Progresses from evaluating static mesh bisection bandwidth, to comparing Ring vs Star broadcast topologies, to finally designing a custom collective routing schedule over a switched fabric.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-21",
"track": "edge",
"topic": "pipeline-parallelism",
"competency_area": "parallelism",
"levels": [
"L1",
"L2",
"L3"
],
"questions": [
{
"level": "L1",
"id": "edge-2367",
"title": "Hailo-8 Dual Model Pipeline",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-2282",
"title": "Edge 4-Stage Synchronous Pipeline",
"bloom": "understand"
},
{
"level": "L3",
"id": "edge-0976",
"title": "Asymmetric Split-Computing Pipeline",
"bloom": "apply"
}
],
"rationale": "Covers foundational pipeline parallelism concepts on edge hardware, from defining overlapping stages to calculating latency and throughput in progressively more complex, asymmetric hardware splits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-22",
"track": "edge",
"topic": "pipeline-parallelism",
"competency_area": "parallelism",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "edge-2504",
"title": "10GbE Pipeline Microbatch Sizing",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2508",
"title": "BLE Connection Sync Buffering",
"bloom": "create"
}
],
"rationale": "Addresses network-induced synchronization challenges in distributed edge pipelines, moving from calculating microbatch sizes over Ethernet to designing schedules that mask strict BLE connection intervals.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-40",
"track": "edge",
"topic": "differential-privacy",
"competency_area": "cross-cutting",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "edge-2194",
"title": "Privacy Budget for Continuous Edge Learning",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-2193",
"title": "Differential Privacy for Federated Edge Learning",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-0787",
"title": "The Privacy-Preserving Drift Correction",
"bloom": "create"
}
],
"rationale": "Focuses on differential privacy in distributed edge learning, advancing from computing long-term privacy budgets to tuning aggregation noise, and ultimately designing a compliant system to correct model drift without exposing PII.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-41",
"track": "edge",
"topic": "differential-privacy",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "edge-2242",
"title": "DP Noise Calibration for Sensor Fusion on Edge",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-2241",
"title": "On-Device DP Inference for Medical Wearables",
"bloom": "apply"
},
{
"level": "L6+",
"id": "edge-0785",
"title": "The Privacy Guardian",
"bloom": "create"
}
],
"rationale": "Investigates Local Differential Privacy (LDP) constraints, progressing from diagnosing accuracy collapses caused by naive feature-level noise to evaluating the architectural tradeoffs of adding LDP to outputs versus keeping all compute strictly on-device.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-42",
"track": "edge",
"topic": "speculative-decoding",
"competency_area": "latency",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "edge-1207",
"title": "N-Gram Draft for Zero-Overhead Edge Speculation",
"bloom": "apply"
},
{
"level": "L5",
"id": "edge-1206",
"title": "Speculative Decoding on Jetson Orin",
"bloom": "create"
},
{
"level": "L6+",
"id": "edge-0616",
"title": "The In-Car Assistant Latency Crisis",
"bloom": "create"
}
],
"rationale": "Explores draft generation on constrained edge hardware, moving from implementing n-gram lookup tables to evaluating bandwidth-constrained draft strategies, and finally architecting a real-time conversational agent for a vehicle.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-43",
"track": "edge",
"topic": "tensor-arena-planning",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L4"
],
"questions": [
{
"level": "L1",
"id": "edge-0141",
"title": "Defining the Tensor Arena",
"bloom": "remember"
},
{
"level": "L2",
"id": "edge-0142",
"title": "The SRAM Tensor Arena Squeeze",
"bloom": "understand"
},
{
"level": "L4",
"id": "edge-2195",
"title": "Tensor Arena Planning for Jetson Orin Multi-Model Serving",
"bloom": "apply"
}
],
"rationale": "Guides learners from the fundamental definition of a tensor arena in microcontrollers to calculating peak memory with reuse, and scales up to managing memory fragmentation for multi-model serving on powerful edge devices.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-53",
"track": "edge",
"topic": "sustainability-carbon-accounting",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "edge-2247",
"title": "Idle Power Governance for Always-On Edge Inference",
"bloom": "analyze"
},
{
"level": "L4",
"id": "edge-2246",
"title": "Embodied Carbon Dominance in TinyML Edge Deployment",
"bloom": "analyze"
},
{
"level": "L5",
"id": "edge-0931",
"title": "Smart Camera Carbon Payback Period",
"bloom": "analyze"
}
],
"rationale": "Focuses on the unique carbon economics of edge AI, progressing from managing idle power waste to calculating the break-even point where the embodied carbon of edge devices is justified by cloud transmission savings.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "edge-chain-auto-secondary-017-56",
"track": "edge",
"topic": "network-bandwidth-bottlenecks",
"competency_area": "networking",
"levels": [
"L2",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "edge-2364",
"title": "Hailo-8 Camera Fan-in Link",
"bloom": "understand"
},
{
"level": "L5",
"id": "edge-2349",
"title": "IP Camera Gigabit Bottleneck",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "edge-2345",
"title": "Camera PCIe Fanout Topology",
"bloom": "create"
}
],
"rationale": "Analyzes the complete network path for edge video ingestion, progressing from simple bandwidth aggregation to diagnosing IP camera packet drops, and architecting PCIe topologies that eliminate bus contention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-006-29",
"track": "global",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L1",
"L2",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "global-0027",
"title": "Batch Size and Compute Intensity",
"bloom": "remember"
},
{
"level": "L2",
"id": "global-0028",
"title": "LLM Generation Phase Bottleneck",
"bloom": "understand"
},
{
"level": "L3",
"id": "global-0207",
"title": "Self-Attention Arithmetic Intensity During Decode",
"bloom": "apply"
},
{
"level": "L5",
"id": "global-0138",
"title": "Memory Bandwidth Bound Decode",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "global-0110",
"title": "LLM Decode Batch Size Limits",
"bloom": "evaluate"
}
],
"rationale": "Traces arithmetic intensity from basic definitions through autoregressive LLM decoding, scaling up to memory bandwidth limitations and batched decode bottlenecks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-006-30",
"track": "global",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L2",
"L3"
],
"questions": [
{
"level": "L2",
"id": "global-0026",
"title": "Optimizing Arithmetic Intensity",
"bloom": "understand"
},
{
"level": "L3",
"id": "global-0212",
"title": "Roofline Classification of Elementwise Operations",
"bloom": "apply"
}
],
"rationale": "Examines the low arithmetic intensity of element-wise operations and calculates their peak throughput on the roofline model.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-010-06",
"track": "global",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L1",
"id": "global-0251",
"title": "Arithmetic Intensity Definition",
"bloom": "remember"
},
{
"level": "L2",
"id": "global-0261",
"title": "Memory-Bound vs Compute-Bound Intuition",
"bloom": "understand"
},
{
"level": "L3",
"id": "global-0300",
"title": "Global New 0001",
"bloom": "apply"
},
{
"level": "L5",
"id": "global-0278",
"title": "Batch Size to Reach Compute-Bound Regime",
"bloom": "evaluate"
}
],
"rationale": "Teaches roofline model fundamentals, starting from arithmetic intensity definition and intuition to calculating operational intensity bounds and scaling batch sizes to reach compute-bound regimes.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-010-07",
"track": "global",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "global-0243",
"title": "Optimizing Time-to-First-Token for Interactive Chat",
"bloom": "analyze"
},
{
"level": "L5",
"id": "global-0272",
"title": "Prefill Chunking vs Monolithic Prefill",
"bloom": "evaluate"
}
],
"rationale": "Analyzes the latency dynamics of prefill chunking, starting with TTFT optimization and evaluating monolithic versus chunked prefill tradeoffs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-011-31",
"track": "global",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L1",
"L3",
"L5"
],
"questions": [
{
"level": "L1",
"id": "global-0179",
"title": "The Anycast Routing Pattern",
"bloom": "remember"
},
{
"level": "L3",
"id": "global-0189",
"title": "The Time-to-First-Token Across Oceans",
"bloom": "apply"
},
{
"level": "L5",
"id": "global-0195",
"title": "The Inference Placement Decision",
"bloom": "evaluate"
}
],
"rationale": "Progresses from the basics of anycast network routing to calculating trans-oceanic time-to-first-token latencies, and finally making high-level architecture decisions on centralized vs regional GPU clusters.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-011-32",
"track": "global",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "global-0224",
"title": "Little's Law for GPU Inference Server Sizing",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0245",
"title": "Right-Sizing an Inference Fleet for Variable Traffic",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "global-0440",
"title": "Multi-Tenant LoRA Serving Architecture on MI300X",
"bloom": "create"
}
],
"rationale": "Evolves from applying Little's Law to calculate static GPU requirements, to designing a cost-optimal fleet for variable traffic, and ultimately architecting a multi-tenant serving system on modern MI300X hardware.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-011-33",
"track": "global",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L4"
],
"questions": [
{
"level": "L2",
"id": "global-0014",
"title": "The KV-Cache Bottleneck",
"bloom": "understand"
},
{
"level": "L4",
"id": "global-0089",
"title": "Safe Deployment for Latency-Sensitive Generation Models",
"bloom": "evaluate"
}
],
"rationale": "Progresses from identifying the KV-cache as the cause of autoregressive slowdowns to managing a latency spike during a V2 canary rollout (speculative decoding) and making safe deployment decisions.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-014-01",
"track": "global",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L1",
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L1",
"id": "global-0013",
"title": "The Parameter Memory Footprint",
"bloom": "remember"
},
{
"level": "L2",
"id": "global-0260",
"title": "Training Memory Budget Breakdown",
"bloom": "understand"
},
{
"level": "L3",
"id": "global-0217",
"title": "Mixed-Precision Training Memory Budget",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0227",
"title": "OOM at Step 500 but Not Step 1",
"bloom": "analyze"
}
],
"rationale": "Progresses from basic FP16 weight sizing to mixed-precision budget breakdowns and finally diagnosing gradual memory climbs during training.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-014-02",
"track": "global",
"topic": "memory-hierarchy-design",
"competency_area": "memory",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "global-0062",
"title": "Mitigating KV Cache Fragmentation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "global-0060",
"title": "Tiered KV-Cache Page Size Tradeoffs",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "global-0118",
"title": "Long-Running Inference OOM Death",
"bloom": "analyze"
}
],
"rationale": "Progresses from mitigating static KV-cache fragmentation to weighing tiered KV-cache page-size tradeoffs, and ultimately redesigning memory management for dynamic batching under long-running OOM failures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-014-03",
"track": "global",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "global-0176",
"title": "The GPU Cluster Power Wall",
"bloom": "remember"
},
{
"level": "L3",
"id": "global-0203",
"title": "Energy Cost of a Training Run",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0239",
"title": "Optimal Power Cap for Training Cost Minimization",
"bloom": "analyze"
},
{
"level": "L5",
"id": "global-0197",
"title": "The Renewable Intermittency Trap",
"bloom": "evaluate"
}
],
"rationale": "Moves from defining cluster power limits and basic energy costs to optimizing training power caps and finally scheduling across renewable intermittency.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-014-04",
"track": "global",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "global-0220",
"title": "Voltage Scaling and Dynamic Power",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0163",
"title": "The Overclocking Energy Trap",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "global-0129",
"title": "The Power-Clock Anomaly",
"bloom": "analyze"
}
],
"rationale": "Explores the non-linear relationship between voltage/frequency scaling and power, moving from simple calculations to diagnosing overclocking and underclocking anomalies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-014-14",
"track": "global",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L1",
"L3",
"L4"
],
"questions": [
{
"level": "L1",
"id": "global-0015",
"title": "Quantization Basics",
"bloom": "remember"
},
{
"level": "L3",
"id": "global-0302",
"title": "Global New 0003",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0106",
"title": "Systematic Activation Outliers",
"bloom": "analyze"
}
],
"rationale": "Introduces INT8 quantization concepts, progressing to memory-savings math and diagnosing systematic activation-outlier accuracy drops.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-014-15",
"track": "global",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L4",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "global-0064",
"title": "SRAM Layer Fusion for Edge CNNs",
"bloom": "create"
},
{
"level": "L6+",
"id": "global-0125",
"title": "The Operator Fusion Trap",
"bloom": "evaluate"
}
],
"rationale": "Explores operator fusion dynamics, from proposing SRAM layer fusions to debugging severe performance regressions in custom fused kernels.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-014-16",
"track": "global",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L2",
"id": "global-0049",
"title": "The Critical Batch Size",
"bloom": "understand"
},
{
"level": "L3",
"id": "global-0315",
"title": "Global New 0016",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0061",
"title": "LLM Serving Arithmetic Intensity",
"bloom": "analyze"
}
],
"rationale": "Focuses on arithmetic intensity and batching, moving from critical batch size theory to calculating memory-bound times and scaling without OOMs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-016-07",
"track": "global",
"topic": "fault-tolerance-checkpointing",
"competency_area": "reliability",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "global-0216",
"title": "Optimal Checkpointing Interval",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0069",
"title": "Optimizing Checkpoint Frequency via Young-Daly",
"bloom": "analyze"
}
],
"rationale": "Focuses on optimizing checkpoint frequency, transitioning from basic interval calculations to applying the Young-Daly formula for maximizing training goodput.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-016-13",
"track": "global",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "global-0213",
"title": "4-Bit Quantization for Consumer GPU Deployment",
"bloom": "apply"
},
{
"level": "L6+",
"id": "global-0131",
"title": "The INT4 Quantization Plateau",
"bloom": "analyze"
}
],
"rationale": "Explores the limits of weight-only quantization, shifting from assessing memory footprints to diagnosing the latency plateau when moving from INT8 to INT4.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-016-14",
"track": "global",
"topic": "network-bandwidth-bottlenecks",
"competency_area": "networking",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "global-0218",
"title": "Cross-Rack AllReduce Latency Impact",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0079",
"title": "GPUDirect RDMA NUMA Crossing Bottleneck",
"bloom": "analyze"
}
],
"rationale": "Focuses on RDMA bottlenecks, scaling from analyzing cross-rack AllReduce latency to debugging complex GPUDirect NUMA crossing issues.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-06",
"track": "global",
"topic": "data-parallelism",
"competency_area": "parallelism",
"levels": [
"L1",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "global-0255",
"title": "Three Axes of Parallelism",
"bloom": "remember"
},
{
"level": "L3",
"id": "global-0318",
"title": "Global New 0019",
"bloom": "apply"
},
{
"level": "L5",
"id": "global-0266",
"title": "Tensor vs Pipeline Parallelism for 70B",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "global-0282",
"title": "3D Parallelism Configuration for 175B",
"bloom": "create"
}
],
"rationale": "Takes the learner from defining parallelism types to calculating memory requirements, comparing specific 2D strategies, and ultimately designing a full 3D parallelism configuration for a massive model.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-07",
"track": "global",
"topic": "data-parallelism",
"competency_area": "parallelism",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "global-0299",
"title": "Replication vs Erasure Coding for Checkpoint Storage",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "global-0280",
"title": "Resilient 1024-GPU Training System",
"bloom": "create"
}
],
"rationale": "Focuses on training resilience, progressing from evaluating checkpoint storage overhead to architecting a complete fault-tolerant training system for frequent node failures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-13",
"track": "global",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L1",
"L3",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "global-0004",
"title": "The FP16 vs INT8 Precision Choice",
"bloom": "remember"
},
{
"level": "L3",
"id": "global-0202",
"title": "INT8 Quantization Serving Throughput Gain",
"bloom": "apply"
},
{
"level": "L6+",
"id": "global-0111",
"title": "The Quantization Roofline Paradox",
"bloom": "analyze"
}
],
"rationale": "Builds understanding of quantization performance, from the basic motivation to calculating expected throughput gains, and finally analyzing complex roofline bottlenecks where expected gains fail to materialize.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-23",
"track": "global",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L2",
"L3"
],
"questions": [
{
"level": "L2",
"id": "global-0010",
"title": "The PyTorch DataLoader Deadlock",
"bloom": "understand"
},
{
"level": "L3",
"id": "global-0043",
"title": "The Experiment Reproducibility Crisis",
"bloom": "apply"
}
],
"rationale": "Explores the basic pitfalls of ML data pipelines, progressing from identifying single-node dataloader bottlenecks to debugging subtle non-deterministic behaviors in data shuffling and reproducibility.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-25",
"track": "global",
"topic": "interconnect-topology",
"competency_area": "networking",
"levels": [
"L1",
"L2",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "global-0254",
"title": "NVLink vs PCIe Bandwidth Gap",
"bloom": "remember"
},
{
"level": "L2",
"id": "global-0264",
"title": "InfiniBand vs Ethernet for Training",
"bloom": "understand"
},
{
"level": "L5",
"id": "global-0335",
"title": "Topology-Aware Placement Rule",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "global-0286",
"title": "Network Topology for 2048-GPU Training Cluster",
"bloom": "create"
}
],
"rationale": "Progresses from understanding fundamental hardware interconnect bandwidths to comparing network protocols, making topology-aware placement decisions, and architecting a massive-scale cluster topology.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-26",
"track": "global",
"topic": "interconnect-topology",
"competency_area": "parallelism",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "global-0295",
"title": "Sudden Training Throughput Drop",
"bloom": "analyze"
},
{
"level": "L5",
"id": "global-0271",
"title": "Tensor Parallelism Within vs Across Nodes",
"bloom": "evaluate"
}
],
"rationale": "Evaluates distributed training communication bottlenecks, moving from diagnosing a sudden throughput collapse during an active run to quantifying the exact communication overhead of cross-node tensor parallelism.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-27",
"track": "global",
"topic": "sustainability-carbon-accounting",
"competency_area": "power",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "global-0174",
"title": "The Grid Emissions Gap",
"bloom": "remember"
},
{
"level": "L3",
"id": "global-0181",
"title": "The CO2 Per Training Run",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0249",
"title": "Maximizing Tokens-per-Watt for Sustainable Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "global-0191",
"title": "The Latency-Carbon-Cost Triangle",
"bloom": "evaluate"
}
],
"rationale": "Teaches the core mechanics of carbon accounting, from recognizing regional emission differences to optimizing inference throughput to meet strict carbon budgets and deployment SLAs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-28",
"track": "global",
"topic": "sustainability-carbon-accounting",
"competency_area": "power",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "global-0167",
"title": "The Carbon-Aware Checkpoint Penalty",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "global-0194",
"title": "The Green AI Paradox",
"bloom": "evaluate"
}
],
"rationale": "Explores the hidden pitfalls of carbon-aware scheduling, moving from discovering how pausing jobs can increase total emissions to analyzing the carbon overhead of moving massive datasets to greener grids.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-30",
"track": "global",
"topic": "transformer-systems-cost",
"competency_area": "architecture",
"levels": [
"L4",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "global-0100",
"title": "Long-Context KV Cache RoPE Failure",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "global-0117",
"title": "Massive Context KV Cache Paging",
"bloom": "create"
}
],
"rationale": "Examines long-context KV cache bottlenecks, moving from debugging accuracy loss due to KV cache quantization to architecting a multi-tier memory paging system for million-token contexts.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-44",
"track": "global",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L4",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "global-0109",
"title": "Mixed-Precision Fleet Integration",
"bloom": "create"
},
{
"level": "L6+",
"id": "global-0281",
"title": "Multi-Model Serving Gateway Design",
"bloom": "create"
}
],
"rationale": "Focuses on heterogeneous model deployment at scale, progressing from routing traffic based on hardware precision capabilities to designing a full serving gateway that manages cold-starts and utilization across multiple LLMs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-45",
"track": "global",
"topic": "tco-cost-modeling",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L3",
"L5"
],
"questions": [
{
"level": "L1",
"id": "global-0257",
"title": "GPU-Hour Cost Decomposition",
"bloom": "remember"
},
{
"level": "L3",
"id": "global-0206",
"title": "Buy vs Rent GPU Break-Even",
"bloom": "apply"
},
{
"level": "L5",
"id": "global-0275",
"title": "On-Prem vs Cloud GPU Cluster Economics",
"bloom": "evaluate"
}
],
"rationale": "Develops procurement modeling skills, starting from understanding the hidden costs embedded in cloud pricing to calculating simple break-even points, and finally executing a full 64-GPU cluster TCO analysis.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-46",
"track": "global",
"topic": "tco-cost-modeling",
"competency_area": "cross-cutting",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "global-0232",
"title": "Spot Instance Strategy for Fault-Tolerant Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "global-0267",
"title": "Spot vs On-Demand for Long Training Runs",
"bloom": "evaluate"
}
],
"rationale": "Analyzes the probabilistic economics of spot computing, progressing from evaluating the per-step cost of preemption to determining the mathematical break-even point for a multi-week distributed training run.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-54",
"track": "global",
"topic": "compound-ai-systems",
"competency_area": "architecture",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "global-0137",
"title": "Prompt Caching PCIe Bottleneck",
"bloom": "create"
},
{
"level": "L6+",
"id": "global-0041",
"title": "The Agentic Memory Architecture",
"bloom": "create"
}
],
"rationale": "Explores the massive memory requirements of agentic LLM systems, moving from resolving PCIe bottlenecks in static prompt caching to architecting complex, multi-tiered retrieval systems for infinite-context coding agents.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-55",
"track": "global",
"topic": "kernel-fusion",
"competency_area": "latency",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "global-0259",
"title": "Why Kernel Fusion Matters",
"bloom": "remember"
},
{
"level": "L3",
"id": "global-0210",
"title": "Kernel Fusion Memory Bandwidth Savings",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0240",
"title": "Optimizing a Memory-Bound Training Bottleneck",
"bloom": "analyze"
},
{
"level": "L5",
"id": "global-0273",
"title": "Eager vs Compiled Execution for Inference",
"bloom": "evaluate"
}
],
"rationale": "A complete guide to kernel fusion, starting with foundational definitions, moving to exact memory bandwidth calculations, optimizing real training bottlenecks, and finally evaluating the business tradeoffs of graph compilation in production.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-57",
"track": "global",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "global-0168",
"title": "The PUE Illusion",
"bloom": "analyze"
},
{
"level": "L5",
"id": "global-0269",
"title": "A100 vs H100 Performance per Watt",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "global-0289",
"title": "Power and Cooling for a 1000-GPU Cluster",
"bloom": "create"
}
],
"rationale": "Investigates massive-scale AI power efficiency, moving from understanding how workload types distort PUE metrics to maximizing TFLOPS within a fixed megawatt budget, and finally designing the full physical power/cooling architecture for a data center.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-58",
"track": "global",
"topic": "3d-parallelism",
"competency_area": "parallelism",
"levels": [
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L2",
"id": "global-0018",
"title": "Data Parallelism vs Model Parallelism",
"bloom": "understand"
},
{
"level": "L3",
"id": "global-0215",
"title": "Tensor Parallelism Communication Volume",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0242",
"title": "Choosing 3D Parallelism Configuration",
"bloom": "analyze"
}
],
"rationale": "A steady progression through distributed training design, from choosing basic splitting strategies to calculating specific tensor parallelism overheads, and culminating in architecting a full 3D configuration across NVLink and InfiniBand.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-59",
"track": "global",
"topic": "activation-memory",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "global-0208",
"title": "Activation Checkpointing Memory Savings",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0296",
"title": "OOM During Evaluation but Not Training",
"bloom": "analyze"
},
{
"level": "L5",
"id": "global-0270",
"title": "CPU Offloading vs Activation Recomputation",
"bloom": "evaluate"
}
],
"rationale": "Teaches the mechanics and tradeoffs of activation memory, moving from calculating baseline checkpointing savings to debugging eval-specific OOMs, and comparing recomputation against PCIe offloading.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-61",
"track": "global",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L2",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "global-0047",
"title": "The 3x Rule of Backpropagation",
"bloom": "understand"
},
{
"level": "L5",
"id": "global-0375",
"title": "Cross-Regime Compute Cost Spec",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "global-0378",
"title": "Incomplete Compute Cost Procurement Decision",
"bloom": "create"
}
],
"rationale": "Develops intuition for the economics of ML compute, progressing from fundamental FLOP ratios in backprop to specifying cross-regime cost models and making bounded procurement decisions with incomplete data.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-62",
"track": "global",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "global-0223",
"title": "torch.compile Warm-Up vs Steady-State",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0237",
"title": "torch.compile Recompilation Storm",
"bloom": "analyze"
}
],
"rationale": "Explores the runtime dynamics of graph compilation, progressing from basic break-even calculations for warm-up costs to diagnosing and resolving catastrophic recompilation storms in production endpoints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-64",
"track": "global",
"topic": "pipeline-parallelism",
"competency_area": "parallelism",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "global-0311",
"title": "Global New 0012",
"bloom": "apply"
},
{
"level": "L4",
"id": "global-0229",
"title": "Pipeline Parallelism Bubble Overhead Analysis",
"bloom": "analyze"
}
],
"rationale": "Examines the latencies associated with pipeline parallelism, moving from raw stage-to-stage NVLink transfer delays to analytically comparing the compute bubble overheads of different micro-batching depths.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-65",
"track": "global",
"topic": "queueing-theory",
"competency_area": "latency",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "global-0432",
"title": "Evaluating Continuous vs Fixed Batching Queues",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "global-0435",
"title": "Heterogeneous Routing Algorithm for LLM Multi-Region Deployment",
"bloom": "create"
}
],
"rationale": "Applies queueing theory to LLM inference, progressing from evaluating batching stability under varying arrival rates to designing complex, multi-region load-shedding algorithms for unpredictable surges.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "global-chain-auto-secondary-017-66",
"track": "global",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "global-0196",
"title": "The GPAI Threshold Debate",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "global-0091",
"title": "EU AI Act Compliance Pipeline Storage Design",
"bloom": "create"
}
],
"rationale": "Analyzes the intersection of ML systems and regulatory law, moving from calculating compute thresholds that trigger regulation to architecting immutable data pipelines that satisfy conflicting legal requirements.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-002-01",
"track": "mobile",
"topic": "duty-cycling",
"competency_area": "power",
"levels": [
"L1",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-2071",
"title": "Pedometer Power Calculation",
"bloom": "apply"
},
{
"level": "L3",
"id": "mobile-1925",
"title": "Hardware FIFO Sensor Batching",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1918",
"title": "Snapdragon NPU Duty-Cycling",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1915",
"title": "Tiered Sensor Hub Data Batching",
"bloom": "create"
}
],
"rationale": "Progresses from a basic duty-cycle power calculation to designing a tiered sensor-hub batching strategy on Snapdragon hardware.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-002-02",
"track": "mobile",
"topic": "duty-cycling",
"competency_area": "power",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1895",
"title": "A17 Pro NPU Keyword Spotting",
"bloom": "understand"
},
{
"level": "L4",
"id": "mobile-2034",
"title": "Analyzing Thermal and Power Impacts of Mobile Batching",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1929",
"title": "A17 NPU Wake Break-Even for Activity Recognition",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-2030",
"title": "Designing Hierarchical Wake-Up Pipelines for Mobile NPU",
"bloom": "create"
}
],
"rationale": "Explores the power dynamics of Apple's Neural Engine from basic wake-word duty cycles to designing hierarchical wake-up pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-002-13",
"track": "mobile",
"topic": "cnn-efficient-design",
"competency_area": "architecture",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1469",
"title": "MobileNet Depthwise Separable Convolution Recall",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-0856",
"title": "Fluency: describe inverted residual block execution on Qualcomm Hexagon NPU",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0861",
"title": "Optimize MobileNetV2 expansion ratio for Snapdragon 8 Gen 3 Hexagon HTP memory",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-1431",
"title": "Sizing Inverted Residuals for Hexagon NPU",
"bloom": "evaluate"
}
],
"rationale": "Focuses on optimizing inverted residual blocks specifically for the memory bandwidth and execution characteristics of the Qualcomm Hexagon NPU.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-002-14",
"track": "mobile",
"topic": "cnn-efficient-design",
"competency_area": "architecture",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "mobile-1602",
"title": "A17 Pro Neural Engine and Depthwise Separable Convolutions",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-1338",
"title": "MobileNetV2 FLOPs vs Apple A17 Pro TOPS",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1604",
"title": "MobileNetV3 Latency Anomaly on Apple A17 Pro Neural Engine",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0860",
"title": "Mastery: MobileNetV3 SE block impact on A17 Pro ANE vs Hexagon NPU",
"bloom": "create"
}
],
"rationale": "Explores the nuances of CNN operations on the Apple A17 Pro Neural Engine, from basic utilization calculations to diagnosing severe latency anomalies related to specific block architectures like SE modules.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-003-01",
"track": "mobile",
"topic": "compound-ai-systems",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "mobile-0893",
"title": "Recalling Compound AI Pipeline Components and Their Roles on Mobile",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-0879",
"title": "Explaining Compound AI Pipeline Stages to a Mobile App Developer on A17 Pro",
"bloom": "understand"
},
{
"level": "L5",
"id": "mobile-0871",
"title": "Designing an On-Device RAG Architecture for A17 Pro with 8GB Memory",
"bloom": "create"
}
],
"rationale": "A progression exploring the foundational components of an on-device RAG system, why retrieval is necessary, and finally designing the system's memory architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-003-06",
"track": "mobile",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1350",
"title": "Calculate Max FPS After Operator Fusion on ANE",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1639",
"title": "A17 Pro ML Compiler Design for Real-time Vision",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1641",
"title": "Optimizing Vision Transformer for Apple A17 Pro Neural Engine",
"bloom": "analyze"
}
],
"rationale": "Explores compiler-driven graph optimizations on Apple's ANE, progressing from calculating max FPS after fusion to designing full AOT compilation strategies for dynamic transformers.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-003-08",
"track": "mobile",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "mobile-1640",
"title": "Optimizing LLM Deployment on Snapdragon Hexagon NPU via Graph Compilation",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1636",
"title": "Hexagon NPU Graph Compilation Analysis: Latency & Memory for Large Models",
"bloom": "analyze"
}
],
"rationale": "A focused sequence on the Snapdragon Hexagon NPU examining operator lowering mechanisms and diagnosing failures in constant folding.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-004-06",
"track": "mobile",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L1",
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1470",
"title": "A17 Pro Neural Engine Specification Recall",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-1528",
"title": "A17 Pro Neural Engine Inference Throughput",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-2074",
"title": "Estimating Maximum Frame Rate on A17 Pro Neural Engine",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1432",
"title": "Estimating Real-Time Video Segmentation Compute on A17 Pro",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner from recalling A17 Pro specs to estimating max throughput, applying utilization factors for frame rates, and evaluating real-time thermal/compute budgets for video segmentation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-004-07",
"track": "mobile",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1306",
"title": "Tensor G3 Gemini Nano Prefill Latency",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1226",
"title": "Architecting On-Device LLM Cost Strategy for Tensor G3",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1372",
"title": "On-Device LLM Sizing for Google Tensor G3",
"bloom": "create"
}
],
"rationale": "Progresses from basic prefill latency calculation for Tensor G3 to architecting a thermal/latency cost strategy, and finally sizing an on-device LLM to meet strict multi-stage latency targets.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-004-08",
"track": "mobile",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1082",
"title": "Mobile NPU Inference Cost Analysis on Snapdragon 8 Gen 3",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1405",
"title": "Diagnosing NPU Compute Bottlenecks on Snapdragon 8 Gen 3",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1531",
"title": "Snapdragon 8 Gen 3 NPU Inference Cost Optimization",
"bloom": "analyze"
}
],
"rationale": "Progresses from analyzing Snapdragon 8 Gen 3 inference cost to diagnosing precision and shared-memory compute bottlenecks, and finally estimating FLOPs and energy costs for LLM deployment.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-004-09",
"track": "mobile",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1339",
"title": "Estimating Inference FPS on Exynos 2400 NPU",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1280",
"title": "Evaluate Vision Transformer Deployment Cost on Exynos 2400 NPU",
"bloom": "evaluate"
}
],
"rationale": "Moves from estimating Exynos 2400 inference FPS at a fixed utilization to evaluating and justifying compute cost tradeoffs between different vision architectures on the same hardware.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-005-13",
"track": "mobile",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1972",
"title": "A17 Pro Ring AllReduce Bounds",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-2130",
"title": "AirDrop Half-Duplex Sync",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1888",
"title": "FedAvg Wi-Fi Star vs Ring",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-2136",
"title": "BLE Gossip Quantization",
"bloom": "create"
}
],
"rationale": "Starts with calculating theoretical Ring AllReduce bounds for mobile devices, introduces half-duplex Wi-Fi sync constraints, evaluates Star vs Ring topologies for federated averaging, and designs an asynchronous quantized gossip protocol over BLE.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-005-14",
"track": "mobile",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L4",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "mobile-0761",
"title": "The Cellular Model Delivery Problem",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-0772",
"title": "The Model Distillation Sync Budget",
"bloom": "create"
}
],
"rationale": "Progresses from the basic challenge of delivering a large quantized model over cellular networks to designing a continuous distillation sync protocol strictly bound by a daily data budget.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-18",
"track": "mobile",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L2",
"id": "mobile-1651",
"title": "MLOps Artifacts for On-Device AI with Google Tensor G3",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-1204",
"title": "CI/CD to Tensor G3 Deployment Discrepancy",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1417",
"title": "CI/CD Hardware Fallback on Tensor G3",
"bloom": "analyze"
}
],
"rationale": "Moves from versioning artifacts for the Tensor G3 to diagnosing production/CI discrepancies and mitigating hardware fallback latency on the TPU.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-19",
"track": "mobile",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1323",
"title": "CI/CD Compute Gating for Exynos NPU",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1655",
"title": "Optimizing Edge ML Deployment on Samsung Exynos NPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1138",
"title": "Architecting On-Device CI/CD for Exynos 2400 NPU",
"bloom": "evaluate"
}
],
"rationale": "Explores CI/CD compute gating for the Exynos NPU, diagnoses bottlenecks missing the FPS target, and architects a full pipeline to validate shared-memory limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-20",
"track": "mobile",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1355",
"title": "CI/CD Latency Gating for A17 Pro Neural Engine",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1301",
"title": "CI/CD for On-Device Models on Apple A17 Pro Neural Engine",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1656",
"title": "Scalable MLOps for On-Device AR on Apple A17 Pro",
"bloom": "analyze"
}
],
"rationale": "Progresses from calculating latency gates for the A17 Pro Neural Engine to evaluating CI/CD proposals and scaling the full MLOps pipeline for on-device AR.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-21",
"track": "mobile",
"topic": "mlops-lifecycle",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1652",
"title": "MLOps Lifecycle for On-Device Deployment on Snapdragon Hexagon NPU",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1165",
"title": "CI/CD Deployment Fallback Regression",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1450",
"title": "CI/CD Release Gates for Hexagon NPU",
"bloom": "evaluate"
}
],
"rationale": "Starts with establishing MLOps metrics for the Hexagon NPU, diagnoses canary latency regressions, and designs CI/CD release gates focused on shared memory.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-22",
"track": "mobile",
"topic": "monitoring-observability",
"competency_area": "reliability",
"levels": [
"L3",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1207",
"title": "Tensor G3 LLM Straggler Latency Analysis",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1392",
"title": "Edge Telemetry Architecture for On-Device LLMs",
"bloom": "create"
}
],
"rationale": "Diagnoses latency stragglers during LLM token generation on Tensor G3, culminating in designing a low-overhead edge telemetry architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-23",
"track": "mobile",
"topic": "monitoring-observability",
"competency_area": "reliability",
"levels": [
"L2",
"L4"
],
"questions": [
{
"level": "L2",
"id": "mobile-1700",
"title": "Monitoring On-Device ML Health on Snapdragon 8 Gen 3 NPU",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1168",
"title": "Diagnosing Shared Memory Contention on Hexagon NPU",
"bloom": "analyze"
}
],
"rationale": "Transitions from identifying key telemetry metrics on the Snapdragon 8 Gen 3 NPU to diagnosing severe latency stragglers caused by shared memory contention.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-24",
"track": "mobile",
"topic": "monitoring-observability",
"competency_area": "reliability",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1701",
"title": "Quantifying Straggler Impact on A17 Pro ML Inference Latency",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1453",
"title": "On-Device Telemetry Budgeting for A17 Pro Neural Engine",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1704",
"title": "Designing Reliable On-Device ML Monitoring for Apple A17 Pro",
"bloom": "analyze"
}
],
"rationale": "Quantifies the impact of stragglers on the A17 Pro, budgets telemetry to avoid draining battery, and designs a comprehensive monitoring architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-25",
"track": "mobile",
"topic": "monitoring-observability",
"competency_area": "reliability",
"levels": [
"L1",
"L2",
"L4"
],
"questions": [
{
"level": "L1",
"id": "mobile-0143",
"title": "The Privacy Wall",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0153",
"title": "Federated Averaging's Blind Spot",
"bloom": "understand"
},
{
"level": "L4",
"id": "mobile-0024",
"title": "The Silent Accuracy Degradation",
"bloom": "analyze"
}
],
"rationale": "Applies privacy-preserving techniques to edge telemetry, evaluates the blind spots of federated aggregation, and detects silent accuracy degradation without ground truth.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-31",
"track": "mobile",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1110",
"title": "Unstructured Sparsity Inefficiency on Apple A17 Pro",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1423",
"title": "Structured Pruning for Apple A17 Pro Neural Engine",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1629",
"title": "Optimizing a Large Language Model for On-Device Deployment on Apple A17 Pro with Structured Pruning",
"bloom": "analyze"
}
],
"rationale": "Explores why unstructured sparsity fails on the A17 Pro Neural Engine, moves to selecting structured replacements, and architects the deployment of a 7B LLM leveraging these optimizations.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-32",
"track": "mobile",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1628",
"title": "Optimizing LLM Deployment on Snapdragon Hexagon NPU with Structured Pruning",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1630",
"title": "Optimizing LLM Deployment on Snapdragon Hexagon NPU via Structured Pruning",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1145",
"title": "Architecting a Pruning Strategy for Hexagon NPU",
"bloom": "evaluate"
}
],
"rationale": "Starts by applying structured pruning to the Snapdragon Hexagon NPU, evaluates competing sparsity patterns, and designs an end-to-end architecture pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-33",
"track": "mobile",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1626",
"title": "Differentiating Pruning Techniques for Mobile ML Acceleration on Tensor G3",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-1360",
"title": "Calculate 2:4 Structured Sparsity Speedup on Tensor G3",
"bloom": "apply"
},
{
"level": "L6+",
"id": "mobile-1631",
"title": "Optimizing Large Language Model Inference with Structured Sparsity on Google Tensor G3",
"bloom": "analyze"
}
],
"rationale": "Differentiates pruning techniques for Tensor G3, calculates expected structured sparsity gains, and optimizes a large LLM focusing on latency and power tradeoffs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-006-34",
"track": "mobile",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L1",
"L4"
],
"questions": [
{
"level": "L1",
"id": "mobile-1486",
"title": "Unstructured vs Structured Pruning on Exynos 2400 NPU",
"bloom": "remember"
},
{
"level": "L4",
"id": "mobile-1627",
"title": "Pruning for On-Device LLM Inference on Samsung Exynos 2400 NPU",
"bloom": "analyze"
}
],
"rationale": "Distinguishes between unstructured and structured pruning fundamentals on the Exynos 2400 and scales to applying these alignments to LLM deployment.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-007-10",
"track": "mobile",
"topic": "accelerator-comparison",
"competency_area": "compute",
"levels": [
"L1",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "mobile-1467",
"title": "Hexagon NPU INT8 Peak Performance Recall",
"bloom": "remember"
},
{
"level": "L4",
"id": "mobile-1526",
"title": "Optimizing Mobile LLMs: Snapdragon 8 Gen 3 NPU Capabilities",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1428",
"title": "Hexagon NPU Sizing for On-Device LLM",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1527",
"title": "Mobile AI Inference Sizing on Snapdragon NPU",
"bloom": "analyze"
}
],
"rationale": "Progresses from recalling Hexagon NPU peak capability to assessing its architectural tradeoffs for an LLM, sizing it for strict token SLAs, and empirically sizing inference beyond just peak TOPS.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-007-12",
"track": "mobile",
"topic": "accelerator-comparison",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1336",
"title": "A17 Pro NPU Latency and Energy Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0722",
"title": "The ANE Delegation Disaster",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1299",
"title": "A17 Pro Neural Engine vs GPU for Real-Time Video",
"bloom": "evaluate"
}
],
"rationale": "Progresses from calculating theoretical NPU latency/energy benefits to diagnosing massive performance regressions across Apple generations, to architecting a heterogeneous NPU/GPU pipeline for custom ops.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-007-13",
"track": "mobile",
"topic": "kernel-fusion",
"competency_area": "optimization",
"levels": [
"L1",
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L1",
"id": "mobile-0142",
"title": "The Fusion Tax",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0152",
"title": "The Operator Fusion Tax",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0416",
"title": "The Mobile GPU's Memory-Go-Round",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0831",
"title": "Mobile New 0033",
"bloom": "analyze"
}
],
"rationale": "Progresses from basic definition of operator fusion cost to calculating the latency saved by fusing Conv and ReLU, diagnosing why an unfused sequence tanks mobile performance, and explaining why it executes as distinct kernels.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-007-14",
"track": "mobile",
"topic": "kernel-fusion",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-0925",
"title": "Kernel Fusion: Recall \u2014 What is Operator Fusion on Neural Engines?",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-0929",
"title": "Kernel Fusion: Fluency \u2014 Estimate Neural Engine Throughput for Fused MLP",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-0928",
"title": "Kernel Fusion: Evaluate Quantized Attention Fusion on A17 Pro",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-0932",
"title": "Kernel Fusion: Mastery \u2014 Optimize 3B LLM Decode Throughput on A17 Pro",
"bloom": "create"
}
],
"rationale": "Progresses from recalling A17 Pro fusion basics to estimating throughput for fused MLP blocks, evaluating INT8 vs FP16 attention fusion, and ultimately optimizing full decode throughput using advanced fusion strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-008-03",
"track": "mobile",
"topic": "differential-privacy",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L2",
"L3"
],
"questions": [
{
"level": "L1",
"id": "mobile-0329",
"title": "The Energy Cost of Privacy",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0247",
"title": "The Privacy vs. Battery-Life Tax",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0570",
"title": "The Drowsiness Detection TCO Dilemma",
"bloom": "apply"
}
],
"rationale": "Evaluates the fundamental energy and TCO tradeoffs of on-device federated learning versus centralized cloud data collection.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-008-05",
"track": "mobile",
"topic": "differential-privacy",
"competency_area": "cross-cutting",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1744",
"title": "DP-SGD and Privacy Budgeting on Snapdragon NPU",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1749",
"title": "Optimizing DP-SGD on Snapdragon 8 Gen 3 Hexagon NPU for Mobile Privacy",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1747",
"title": "On-Device DP-SGD for Federated Learning on Snapdragon 8 Gen 3 Hexagon NPU",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1377",
"title": "On-Device DP-SGD on Hexagon NPU",
"bloom": "create"
}
],
"rationale": "A hardware-specific progression covering DP-SGD fundamentals, optimization, noise calibration, and full architecture on Hexagon NPUs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-009-01",
"track": "mobile",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1767",
"title": "Token-Level Safety Classifier Bottleneck",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1152",
"title": "Diagnosing High Latency in On-Device Gemini Nano Sanitization",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1429",
"title": "On-device LLM Guardrail Sizing for Tensor G3",
"bloom": "evaluate"
}
],
"rationale": "Explores the performance impact, latency diagnosis, and architectural tradeoffs of deploying a safety classifier alongside Gemini Nano on the Tensor G3 to defend against prompt injections.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-009-02",
"track": "mobile",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L2",
"L4"
],
"questions": [
{
"level": "L2",
"id": "mobile-1691",
"title": "A17 Pro NPU Adversarial Attack Detection",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1694",
"title": "Diagnosing Adversarial Impact on On-Device ML Reliability with Apple A17 Pro",
"bloom": "analyze"
}
],
"rationale": "Progresses from identifying an adversarial attack on an A17 Pro fraud detection model to diagnosing a specific side-channel vulnerability under high system load.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-009-03",
"track": "mobile",
"topic": "profiling-bottleneck-analysis",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "mobile-1561",
"title": "Diagnosing ML Model Latency on Google Tensor G3",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-1359",
"title": "Calculate Prefill Latency Bound on Tensor G3",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1294",
"title": "On-Device LLM Profiling Alternatives for Tensor G3",
"bloom": "evaluate"
}
],
"rationale": "Guides the learner from basic bottleneck diagnosis on Tensor G3, through calculating theoretical prefill compute bounds, to evaluating compute versus memory bandwidth limits during autoregressive generation for Gemini Nano.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-009-04",
"track": "mobile",
"topic": "profiling-bottleneck-analysis",
"competency_area": "latency",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1327",
"title": "Hexagon NPU Profiling Analysis",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1144",
"title": "Architecting Heterogeneous Profiling for NPU Latency",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1396",
"title": "Heterogeneous Pipeline Profiling on Snapdragon 8 Gen 3",
"bloom": "create"
}
],
"rationale": "Progresses from analyzing theoretical latency of a single model on the Hexagon NPU to architecting a heterogeneous profiling system, and finally resolving complex cross-subsystem bottlenecks in a real-time AR pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-009-05",
"track": "mobile",
"topic": "profiling-bottleneck-analysis",
"competency_area": "latency",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1485",
"title": "Exynos 2400 NPU Shared Memory Architecture",
"bloom": "remember"
},
{
"level": "L4",
"id": "mobile-1170",
"title": "Diagnosing Latency Spikes on Exynos 2400",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1565",
"title": "Real-time Semantic Segmentation on Exynos 2400: Latency Bottleneck Design",
"bloom": "analyze"
}
],
"rationale": "Starts with identifying architectural monitoring metrics on the Exynos 2400, moves to diagnosing latency spikes in a vision model, and concludes with designing a preemptive bottleneck profiling strategy for a semantic segmentation pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-010-09",
"track": "mobile",
"topic": "attention-scaling",
"competency_area": "architecture",
"levels": [
"L1",
"L2",
"L3"
],
"questions": [
{
"level": "L1",
"id": "mobile-0258",
"title": "The Depthwise Separable Efficiency Gain",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0170",
"title": "The Illusion of Symmetric Scaling",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-0425",
"title": "The NPU Architecture Dilemma: CNN vs. ViT",
"bloom": "apply"
}
],
"rationale": "Explores efficiency optimizations in mobile vision models, advancing from depthwise separable convolution basics to scaling illusions and resolving NPU architecture dilemmas between CNNs and ViTs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-010-10",
"track": "mobile",
"topic": "attention-scaling",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1080",
"title": "Analyzing GQA vs MHA Memory Bandwidth on Hexagon NPU",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1493",
"title": "Designing Mobile Attention for Hexagon NPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1605",
"title": "Optimizing Large Context Attention for Mobile NPU",
"bloom": "analyze"
}
],
"rationale": "Investigates the impact of attention mechanisms on mobile NPUs, moving from analyzing MHA versus GQA memory bandwidth bottlenecks to designing and evaluating large-context attention architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-010-12",
"track": "mobile",
"topic": "neural-architecture-search",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-0481",
"title": "The Mobile Jank Detective",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1615",
"title": "A17 Pro Hardware-Aware NAS Performance Analysis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1302",
"title": "Hardware-Aware NAS for A17 Pro Neural Engine",
"bloom": "evaluate"
}
],
"rationale": "Analyzes UI jank caused by vision models on mobile, progressing from diagnosing latency spikes with ViTs to analyzing hardware-aware NAS performance and comparing search space proposals for A17 Pro.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-010-13",
"track": "mobile",
"topic": "neural-architecture-search",
"competency_area": "architecture",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1357",
"title": "Hardware-Aware NAS Latency on Hexagon NPU",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1454",
"title": "Hardware-Aware NAS for Hexagon NPU Realization",
"bloom": "evaluate"
}
],
"rationale": "Explores hardware-aware NAS for the Hexagon NPU, moving from estimating compute latency to designing a NAS search space that avoids memory bandwidth limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-011-10",
"track": "mobile",
"topic": "ab-rollout-strategies",
"competency_area": "deployment",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1663",
"title": "Tensor G3 Model Rollout: Choosing a Strategy",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1256",
"title": "Diagnosing OOM in On-Device LLM Shadow Rollout",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1427",
"title": "Progressive Rollout of Gemini Nano A/B Experiment",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1668",
"title": "On-Device ML Model Canary Rollout on Google Tensor G3",
"bloom": "analyze"
}
],
"rationale": "Guides the learner through safely deploying a model on Tensor G3, starting with strategy selection, moving to diagnosing shadow deployment OOMs, designing the rollout, and managing full canary metrics.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-011-11",
"track": "mobile",
"topic": "ab-rollout-strategies",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1188",
"title": "Analyzing Shadow Deployment OOM on Exynos 2400",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1402",
"title": "A/B Rollout Memory Bottleneck on Exynos 2400",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1666",
"title": "Designing a Phased Rollout for Edge ML on Samsung Exynos 2400 NPU",
"bloom": "analyze"
}
],
"rationale": "Progresses through memory bottleneck analysis on Exynos 2400, from identifying the OOM to diagnosing the A/B test failure, and culminating in designing a safe phased rollout.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-011-12",
"track": "mobile",
"topic": "ab-rollout-strategies",
"competency_area": "deployment",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1303",
"title": "A17 Pro Canary Rollout Performance Budget",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1222",
"title": "Progressive Rollout Design for A17 Pro Neural Engine",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1670",
"title": "ML Model Rollout on Apple A17 Pro with Progressive Deployment",
"bloom": "analyze"
}
],
"rationale": "Explores progressive rollout strategies on the Apple A17 Pro, from basic performance budgeting to managing thermal-heavy models and defining a comprehensive progressive deployment plan.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-011-13",
"track": "mobile",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1760",
"title": "On-Device Responsible AI Guardrail Overhead on Snapdragon NPU",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1274",
"title": "On-Device PII Guardrail Starvation on Hexagon NPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1459",
"title": "Sizing On-Device Guardrails on Hexagon NPU",
"bloom": "evaluate"
}
],
"rationale": "Explores the resource constraints of running guardrails on Hexagon NPU, moving from basic overhead calculation to diagnosing starvation and properly sizing the concurrent architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-011-14",
"track": "mobile",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1329",
"title": "On-Device Guardrail Latency and Memory",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1765",
"title": "Optimizing Responsible AI Guardrails on Samsung Exynos NPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1146",
"title": "Architecting On-Device Guardrails for Generative Text",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1397",
"title": "On-Device Guardrail Architecture for Exynos",
"bloom": "create"
}
],
"rationale": "Progresses from sizing a guardrail on Exynos 2400 to optimizing its latency, architecting concurrent generation and safety, and finally designing the holistic ISP-aware safety architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-011-15",
"track": "mobile",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1113",
"title": "Guardrail Latency on Tensor G3 TPU",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1761",
"title": "Bias Detection in On-Device LLM on Tensor G3: Diagnosing a Fairness Regression",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1764",
"title": "Realizing Responsible AI on Google Tensor G3: On-Device Content Moderation",
"bloom": "analyze"
}
],
"rationale": "Covers the lifecycle of a Tensor G3 LLM guardrail, from debugging latency misses to diagnosing fairness regressions, culminating in designing a complete on-device moderation system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-01",
"track": "mobile",
"topic": "data-quality-validation",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1309",
"title": "On-Device Data Quality Gate Compute Budget",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1711",
"title": "Data Corruption in Edge ML Model on Samsung Exynos 2400 NPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1125",
"title": "Real-time Video Quality Gating on Exynos 2400",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1375",
"title": "On-Device Data Validation for Continuous Learning",
"bloom": "create"
}
],
"rationale": "Progresses from calculating compute budgets for Exynos 2400 camera streams to diagnosing corruption, designing the gating architecture, and finally architecting a multi-stage continuous learning validation pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-02",
"track": "mobile",
"topic": "data-quality-validation",
"competency_area": "data",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1472",
"title": "Hexagon NPU Peak Throughput Data Type",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-1771",
"title": "Data Validation Gate Memory Bottleneck",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1155",
"title": "NPU Fallback from Data Contract Violations",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1434",
"title": "On-Device Data Validation Pipeline on Snapdragon 8 Gen 3",
"bloom": "evaluate"
}
],
"rationale": "Explores data type contracts on Hexagon NPU, moving from basic data types to diagnosing memory/latency bottlenecks of contract violations, culminating in an architectural CPU/NPU split design.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-03",
"track": "mobile",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1477",
"title": "A17 Pro Unified Memory Graceful Degradation",
"bloom": "remember"
},
{
"level": "L4",
"id": "mobile-1160",
"title": "Neural Engine OOM During Model Degradation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1443",
"title": "Thermal Degradation for Video Segmentation on A17 Pro",
"bloom": "evaluate"
}
],
"rationale": "Explores A17 Pro degradation, starting with unified memory capacity, diagnosing out-of-memory crashes during thermal fallbacks, and designing a robust video segmentation degradation ladder.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-04",
"track": "mobile",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1679",
"title": "Adaptive NLU on Tensor G3: Resource-Aware Graceful Degradation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1132",
"title": "On-Device LLM Degradation Ladder for Tensor G3",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1384",
"title": "Tensor G3 On-Device LLM Degradation Strategy",
"bloom": "create"
}
],
"rationale": "Builds a complete degradation arc for Tensor G3 LLMs, starting with NLU resource constraints, progressing to fail-operational translation designs, and culminating in a comprehensive thermal/RAM fallback architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-05",
"track": "mobile",
"topic": "mixed-precision-training",
"competency_area": "precision",
"levels": [
"L1",
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1482",
"title": "Google Tensor G3 TPU 16-bit Format Recall",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-1575",
"title": "Google Tensor G3: Mixed-Precision Strategy for On-Device LLM Inference",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1164",
"title": "Diagnosing NaN Outputs in On-Device LLM",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1449",
"title": "On-Device LLM Precision Strategy for Tensor G3",
"bloom": "evaluate"
}
],
"rationale": "Covers mixed-precision on Tensor G3, starting with fundamental hardware formats, exploring strategy trade-offs, diagnosing numerical instability (NaNs), and designing a holistic deployment strategy under strict memory limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-06",
"track": "mobile",
"topic": "mixed-precision-training",
"competency_area": "precision",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1322",
"title": "FP16 Memory and Compute on A17 Pro",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1183",
"title": "A17 Pro Mixed-Precision Bottleneck",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1576",
"title": "Optimizing Large Language Model Inference with Mixed-Precision on Apple A17 Pro",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1390",
"title": "On-Device FP8 Inference Design for LLM on A17 Pro",
"bloom": "create"
}
],
"rationale": "Explores LLM precision on Apple A17 Pro, starting with FP16 memory math, diagnosing speed bottlenecks, optimizing the mixed-precision plan, and ultimately designing an advanced FP8 inference architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-07",
"track": "mobile",
"topic": "operator-scheduling",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L2",
"id": "mobile-1642",
"title": "Operator Scheduling: Layer Fusion on Samsung Exynos 2400 NPU",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-1107",
"title": "Dual-Core NPU Scheduling Bottleneck Analysis",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1421",
"title": "Dual-Core NPU Operator Scheduling for Memory Contention Mitigation",
"bloom": "analyze"
}
],
"rationale": "Focuses on Exynos 2400 dual-core scheduling, moving from basic layer fusion to analyzing counter-intuitive parallel bottlenecks, and finally mitigating memory contention across cores.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-08",
"track": "mobile",
"topic": "operator-scheduling",
"competency_area": "optimization",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1648",
"title": "LLM Operator Scheduling on Hexagon NPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1293",
"title": "Heterogeneous Pipelining on Snapdragon 8 Gen 3",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1650",
"title": "Hexagon NPU Transformer Scheduling for LLMs",
"bloom": "analyze"
}
],
"rationale": "Addresses scheduling on Hexagon NPU, progressing from basic LLM block scheduling to heterogeneous pipelining, and concluding with advanced transformer scheduling to minimize memory traffic.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-09",
"track": "mobile",
"topic": "safety-certification",
"competency_area": "reliability",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1488",
"title": "Hardware Safety Mechanism Recall",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-1689",
"title": "Functional Safety for Autonomous Driving on Exynos 2400",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1275",
"title": "Diagnosing Exynos NPU Latency Spikes",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1461",
"title": "Deterministic Driver Monitoring on Exynos 2400",
"bloom": "evaluate"
}
],
"rationale": "Follows safety mechanisms on Exynos 2400, starting from identifying watchdogs, calculating ASIL D requirements, diagnosing latency spikes triggering watchdogs, and architecting deterministic driver monitoring.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-10",
"track": "mobile",
"topic": "safety-certification",
"competency_area": "reliability",
"levels": [
"L2",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1684",
"title": "NPU Functional Safety for ISO 26262 ASIL B",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-1330",
"title": "Watchdog Timer Sizing for Hexagon NPU",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1147",
"title": "Architecting a Fail-Safe ADAS Monitor on Hexagon NPU",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1690",
"title": "Functional Safety Design for ADAS on Snapdragon 8 Gen 3 Hexagon NPU",
"bloom": "analyze"
}
],
"rationale": "Explores functional safety on Hexagon NPU, progressing from basic ASIL B co-design and watchdog sizing to architecting fail-safe ADAS monitors and complete ASIL-B deterministic software architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-11",
"track": "mobile",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L2",
"id": "mobile-1726",
"title": "Real-time Sensor Stream Processing on Apple A17 Pro",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-1218",
"title": "A17 Pro Shared Memory Streaming Bottleneck",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1426",
"title": "Optimizing High-Frequency Sensor Ingestion on A17 Pro",
"bloom": "analyze"
}
],
"rationale": "Examines streaming ingestion on A17 Pro, starting with low-power high-frequency processing, analyzing CPU-copying bottlenecks, and optimizing IMU ingestion to unblock the GPU.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-12",
"track": "mobile",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1732",
"title": "On-Device Sensor Stream Processing with Google Tensor G3",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1728",
"title": "Real-time Physiological Anomaly Detection on Google Tensor G3",
"bloom": "create"
},
{
"level": "L6+",
"id": "mobile-1734",
"title": "Real-time Gesture Recognition on Google Tensor G3",
"bloom": "create"
}
],
"rationale": "Focuses on Tensor G3 stream processing, from basic pipeline design for 1000Hz data to balancing power efficiency for anomalies, and finally architecting a sub-50ms gesture recognition system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-13",
"track": "mobile",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1537",
"title": "VRAM Budgeting for 7B LLM Inference on Apple A17 Pro",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1520",
"title": "LLM Memory Budget Specification for A17 Pro",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1534",
"title": "Optimizing LLM Memory Footprint on Apple A17 Pro for On-Device Inference",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1539",
"title": "On-Device LLM Deployment: Apple A17 Pro VRAM Budgeting for Inference",
"bloom": "analyze"
}
],
"rationale": "Systematically builds VRAM budgeting skills for Apple A17 Pro, starting with basic 7B LLM budgeting, explicitly specifying breakdowns, optimizing footprint under constraints, and making strategic memory-performance trade-offs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-012-14",
"track": "mobile",
"topic": "vram-budgeting",
"competency_area": "memory",
"levels": [
"L2",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1533",
"title": "Google Tensor G3 VRAM Budget Components",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1535",
"title": "Diagnosing OOM on Tensor G3: VRAM Budgeting for Large Models",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1538",
"title": "VRAM Budgeting for On-Device LLM Inference on Google Tensor G3",
"bloom": "analyze"
}
],
"rationale": "Covers VRAM budgeting on Tensor G3, starting with component identification, diagnosing 4096-token OOM errors, and estimating precise budgets for bfloat16 deployments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-06",
"track": "mobile",
"topic": "distribution-drift-detection",
"competency_area": "reliability",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1474",
"title": "Hexagon NPU Capacity for Drift Detection",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-1671",
"title": "On-device PSI Calculation for Predicted Class Drift on Snapdragon NPU",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1261",
"title": "Diagnosing On-Device Drift Detection OOM on Hexagon NPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1674",
"title": "Drift Detection Strategies for Mobile NPU Deployments",
"bloom": "analyze"
}
],
"rationale": "Follows the implementation of PSI-based drift detection on the Snapdragon Hexagon NPU, starting from capacity constraints, implementing the calculation, diagnosing OOM crashes during background execution, and evaluating hybrid strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-07",
"track": "mobile",
"topic": "distribution-drift-detection",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1311",
"title": "On-Device KL Divergence Calculation",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1675",
"title": "Exynos NPU Drift: Optimizing On-Device Reliability",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1127",
"title": "On-Device Drift Detection for Exynos 2400",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1378",
"title": "On-Device ISP Concept Drift Detection",
"bloom": "create"
}
],
"rationale": "Explores drift detection on the Exynos 2400 NPU camera pipeline, moving from calculating KL divergence, diagnosing false negatives, designing a system that preserves ISP bandwidth, and architecting ISP concept drift detection at 4K 60 FPS.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-08",
"track": "mobile",
"topic": "encoder-decoder-tradeoffs",
"competency_area": "architecture",
"levels": [
"L2",
"L3",
"L4"
],
"questions": [
{
"level": "L2",
"id": "mobile-1619",
"title": "Snapdragon 8 Gen 3: Encoder-Decoder Architecture Tradeoffs for On-Device AI",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-1191",
"title": "Encoder-Decoder Memory Bandwidth Advantage",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1410",
"title": "Optimizing Encoder-Decoder on Snapdragon NPU",
"bloom": "analyze"
}
],
"rationale": "Builds understanding of encoder-decoder tradeoffs on Snapdragon processors, starting from high-level tradeoffs, analyzing latency differences empirically, and optimizing the decoding phase for memory limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-09",
"track": "mobile",
"topic": "encoder-decoder-tradeoffs",
"competency_area": "architecture",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1475",
"title": "Encoder vs Decoder Bottlenecks on A17 Pro",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-1621",
"title": "On-Device Real-time Translation with Apple A17 Pro: Architecture Tradeoffs",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1158",
"title": "Low Compute Utilization During On-Device Decoding",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1438",
"title": "Sizing On-Device Translation Architectures for A17 Pro",
"bloom": "evaluate"
}
],
"rationale": "Focuses on the Apple A17 Pro unified memory architecture, moving from defining compute vs memory-bound phases, to analyzing translation tradeoffs, diagnosing low compute utilization, and sizing the architecture to fit memory-bandwidth limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-10",
"track": "mobile",
"topic": "encoder-decoder-tradeoffs",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1312",
"title": "Encoder vs Decoder Prefill Compute on Tensor G3",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1625",
"title": "Optimizing Encoder-Decoder Architectures for Google Tensor G3 On-Device NLU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1232",
"title": "Architecting Real-Time Translation on Tensor G3",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1379",
"title": "On-Device Real-Time Translation Architecture for Tensor G3",
"bloom": "create"
}
],
"rationale": "Progresses through designing an NLU/translation pipeline on the Tensor G3, from calculating prefill compute latency, to diagnosing bottlenecks, evaluating offline design, and architecting an end-to-end continuous translation pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-11",
"track": "mobile",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1313",
"title": "Energy Cost of Memory vs Compute on Exynos 2400",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1593",
"title": "Energy Optimization on Samsung Exynos 2400 NPU: Memory vs. Compute Costs",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1233",
"title": "Architecting an Energy-Efficient Real-time Translation Pipeline",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1380",
"title": "Energy-Aware Wake Vision Architecture on Exynos 2400",
"bloom": "create"
}
],
"rationale": "Progresses from calculating the exact per-inference energy cost of memory vs compute on Exynos 2400, to optimizing NPU energy, designing an energy-efficient translation pipeline, and architecting a strict 50 mW continuous vision system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-12",
"track": "mobile",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1346",
"title": "Calculate NPU Compute and Memory Energy Per Inference",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1285",
"title": "Evaluating DRAM vs Compute Energy for A17 Pro",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1597",
"title": "Optimizing On-Device ML Energy Consumption on Apple A17 Pro",
"bloom": "analyze"
}
],
"rationale": "Starts with calculating energy cost of NPU compute versus unified-memory fetches on the A17 Pro, evaluates architecture choices based on DRAM spilling, and culminates in optimizing an entire real-time vision model using Horowitz principles.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-13",
"track": "mobile",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "mobile-1192",
"title": "Energy Cost of Memory vs Compute on Tensor G3",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1503",
"title": "Energy-Aware Memory Access Design on Tensor G3",
"bloom": "analyze"
}
],
"rationale": "Explores the energy tradeoff between compute precision and memory access for on-device LLMs, starting with analyzing why an INT8 SRAM model saves battery over an FP16 DRAM model, and moving to designing an energy-aware inference specification.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-14",
"track": "mobile",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1193",
"title": "3-bit vs 4-bit Unpacking Overhead on Tensor G3",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1411",
"title": "Optimizing Sub-4-bit LLM Deployment on Google Tensor G3",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1582",
"title": "Sub-4-bit LLM Deployment on Google Tensor G3",
"bloom": "analyze"
}
],
"rationale": "Explores the tradeoffs of sub-4-bit quantization on Tensor G3, starting from analyzing the unpacking overhead of 3-bit vs 4-bit, optimizing memory-bound decoding, and finally compressing a 14GB model below 4 bits without losing accuracy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-15",
"track": "mobile",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1314",
"title": "3-bit AWQ Footprint on Exynos 2400",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1130",
"title": "Architecting a Sub-4-bit LLM Pipeline for Samsung Exynos 2400",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1381",
"title": "Sub-3-bit LLM Deployment on Exynos NPU",
"bloom": "create"
}
],
"rationale": "Traces extreme quantization on the Exynos 2400, beginning with calculating the footprint for 3-bit AWQ, making architectural choices between GPTQ and AWQ, and deploying an 8B assistant within a highly constrained 3.5GB budget.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-16",
"track": "mobile",
"topic": "fairness-evaluation",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1476",
"title": "Definition of Equalized Odds for On-Device Models",
"bloom": "remember"
},
{
"level": "L4",
"id": "mobile-1159",
"title": "Diagnosing LLM Latency Bias on Tensor G3",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1441",
"title": "On-Device LLM Intersectional Fairness Sizing",
"bloom": "evaluate"
}
],
"rationale": "Begins with defining equalized odds, progresses to diagnosing a real-world hardware-induced latency bias on Tensor G3, and concludes with sizing an overnight intersectional fairness evaluation pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-17",
"track": "mobile",
"topic": "fairness-evaluation",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1751",
"title": "On-Device Fairness Evaluation for Image Classification on Apple A17 Pro",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1235",
"title": "On-Device Fairness Evaluation Architecture",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1382",
"title": "On-Device Intersectional Fairness",
"bloom": "create"
}
],
"rationale": "Focuses on evaluating demographic parity directly on the Apple A17 Pro, starting with resource considerations, moving to designing an architecture within strict constraints, and finally architecting a 16-subgroup intersectional pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-18",
"track": "mobile",
"topic": "fairness-evaluation",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1194",
"title": "Quantization Bias Under Memory Contention",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1752",
"title": "Diagnosing Bias in NPU-Accelerated Facial Verification for Mobile",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1755",
"title": "On-Device Fairness Evaluation Architecture for Facial Recognition on Snapdragon Hexagon NPU",
"bloom": "analyze"
}
],
"rationale": "Explores how hardware constraints introduce bias, starting with the impact of static quantization fallback, moving to diagnosing false rejection rates, and designing an architecture to continuously monitor demographic parity on a Snapdragon NPU.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-19",
"track": "mobile",
"topic": "tco-cost-modeling",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-0394",
"title": "The TCO Blindspot: Training vs. Inference",
"bloom": "remember"
},
{
"level": "L2",
"id": "mobile-0289",
"title": "The Battery Drain Tax",
"bloom": "understand"
},
{
"level": "L3",
"id": "mobile-1042",
"title": "Mobile TCO Fluency: Quick Battery Cost Estimation for Mobile ML",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1043",
"title": "Mobile TCO Implement: Calculate Cost Per Inference for Mobile App",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1046",
"title": "Mobile TCO Optimization: Reduce Battery Drain for Intensive Mobile ML",
"bloom": "evaluate"
}
],
"rationale": "Progresses from the core concept of inference vs training energy costs, to comparing battery drain models, estimating annual energy costs, calculating exact cost per inference, and quantifying battery optimization impact for 1M active users.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-20",
"track": "mobile",
"topic": "tco-cost-modeling",
"competency_area": "deployment",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1049",
"title": "Mobile TCO Recall: Mobile ML Hardware Cost Tiers",
"bloom": "remember"
},
{
"level": "L4",
"id": "mobile-1037",
"title": "Mobile TCO Analyze: A17 Pro vs Cloud for On-Device Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1039",
"title": "Mobile TCO Design: On-Device vs Hybrid Inference Cost for Consumer App",
"bloom": "create"
},
{
"level": "L6+",
"id": "mobile-1044",
"title": "Mobile TCO Mastery: Enterprise Mobile AI Strategy Full Cost Model",
"bloom": "create"
}
],
"rationale": "Starts with recalling hardware cost tiers and capacities, progresses to comparing battery/cloud economics for an individual user, optimizes a hybrid cloud cutoff, and culminates in a comprehensive 3-year enterprise cost model.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-21",
"track": "mobile",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1490",
"title": "Google Tensor G3 TPU Peak Performance Recall",
"bloom": "remember"
},
{
"level": "L4",
"id": "mobile-1277",
"title": "Diagnosing TPU Throttling During Sustained Gemini Nano Generation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1463",
"title": "Sustained Thermal Budgeting for Continuous LLMs",
"bloom": "evaluate"
}
],
"rationale": "Follows the thermal realities of deploying Gemini Nano on Tensor G3, starting from peak TOPS specifications, to diagnosing a 60% speed drop caused by TPU throttling, and designing a sustained scheduling strategy for live translation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-22",
"track": "mobile",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1589",
"title": "Thermal Management and Sustained Performance of Samsung Exynos 2400 NPU",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-1219",
"title": "Analyzing Sustained vs Burst Performance on Exynos 2400",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1591",
"title": "Diagnosing Sustained Performance Degradation on Exynos 2400 NPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1117",
"title": "Thermal Throttling in Shared NPU/ISP Pipelines",
"bloom": "apply"
},
{
"level": "L6+",
"id": "mobile-1592",
"title": "Samsung Exynos 2400 NPU Thermal Constraints for Sustained ML Inference",
"bloom": "analyze"
}
],
"rationale": "Explores the thermal impact of sustained vision processing on Exynos 2400, analyzing sudden framerate drops, diagnosing root causes, architecting a shared NPU/ISP pipeline to fit thermal headroom, and handling continuous constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-013-23",
"track": "mobile",
"topic": "thermal-management",
"competency_area": "power",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1331",
"title": "Calculate sustained FPS under A17 Pro thermal throttling",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1788",
"title": "Thermal Aware Inference Scheduling on iPhone During Photo Processing",
"bloom": "understand"
},
{
"level": "L5",
"id": "mobile-1149",
"title": "Architecting Thermal-Aware Sustained Video Processing on A17 Pro",
"bloom": "evaluate"
}
],
"rationale": "Traces the management of the A17 Pro Neural Engine's thermal budget, starting with calculating sustained FPS under limits, scheduling ML work during concurrent 4K30 video capture, and architecting a sustained super-resolution pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-05",
"track": "mobile",
"topic": "memory-mapped-inference",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-2145",
"title": "Mobile mmap: Loading Models from Flash Storage",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1415",
"title": "Zero-Copy Memory Mapping for Hexagon NPU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1300",
"title": "Mmap Strategies for Shared Memory",
"bloom": "evaluate"
}
],
"rationale": "Progresses from observing flash-to-RAM mmap loading delays to optimizing away initialization spikes and selecting advanced pinning strategies for shared memory.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-06",
"track": "mobile",
"topic": "memory-mapped-inference",
"competency_area": "memory",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1806",
"title": "Implementing Shared mmap for Edge LLMs",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1805",
"title": "Shared Mmap vs Heap Allocation for Multi-Process Edge LLM",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-1241",
"title": "Zero-Copy Memory Mapping for Gemini Nano on Tensor G3",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1388",
"title": "Zero-Copy LLM Architecture on Google Tensor G3",
"bloom": "create"
}
],
"rationale": "Guides the learner through zero-copy mmap implementation, evaluating heap vs mmap tradeoffs, and architecting robust system-level weight sharing on mobile SoCs.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-17",
"track": "mobile",
"topic": "data-efficiency-selection",
"competency_area": "data",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1307",
"title": "On-Device Coreset Sizing for A17 Pro",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1227",
"title": "On-Device Coreset Selection Architecture for Apple A17 Pro",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1738",
"title": "Optimizing On-Device ML with Coreset Selection on Apple A17 Pro",
"bloom": "analyze"
}
],
"rationale": "Advances from sizing coresets for the A17 Pro Neural Engine to architecting full on-device data selection pipelines under shared memory constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-18",
"track": "mobile",
"topic": "data-efficiency-selection",
"competency_area": "data",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1471",
"title": "Define Coresets for On-Device Fine-Tuning",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-1769",
"title": "On-Device Coreset Selection for Gemini Nano",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1154",
"title": "On-Device Fine-Tuning Battery Drain and Model Collapse",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1433",
"title": "On-Device LLM Coreset Sizing for Tensor G3",
"bloom": "evaluate"
}
],
"rationale": "Progresses from defining coresets for Tensor G3 to evaluating gradient-based vs random pruning, diagnosing model collapse, and sizing nightly fine-tuning budgets.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-19",
"track": "mobile",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1634",
"title": "Optimizing Large Language Models for Apple A17 Pro with Knowledge Distillation",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1161",
"title": "Diagnosing High Latency in Feature-Distilled Models",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1239",
"title": "A17 Pro NPU Distillation Pipeline Design",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1385",
"title": "A17 Pro Asymmetric Distillation for ASR",
"bloom": "create"
}
],
"rationale": "Guides the deployment of distilled models on Apple A17 Pro, from initial memory limits to diagnosing feature distillation latency and designing asymmetric ASR pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-20",
"track": "mobile",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1478",
"title": "Recall Knowledge Distillation Basics for Tensor G3",
"bloom": "remember"
},
{
"level": "L4",
"id": "mobile-1265",
"title": "Diagnosing Slow Distilled LLM Generation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1445",
"title": "Distilling LLMs for Tensor G3 TPU Deployment",
"bloom": "evaluate"
}
],
"rationale": "Covers Tensor G3 distillation, starting with basic definitions, debugging slow autoregressive generation, and structuring hardware-friendly student architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-21",
"track": "mobile",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "mobile-1096",
"title": "Distilled vs Pruned Memory Bandwidth",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1506",
"title": "Exynos 2400 NPU Distillation Specification",
"bloom": "analyze"
}
],
"rationale": "Focuses on Exynos 2400 NPU, analyzing distilled vs pruned memory bandwidth before specifying full distillation pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-22",
"track": "mobile",
"topic": "memory-pressure-management",
"competency_area": "memory",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "mobile-1481",
"title": "Exynos 2400 Unified Memory Capacity Recall",
"bloom": "remember"
},
{
"level": "L3",
"id": "mobile-1551",
"title": "Memory Optimization for 7B LLM Inference on Samsung Exynos 2400 NPU",
"bloom": "analyze"
},
{
"level": "L4",
"id": "mobile-1163",
"title": "Camera App Transition OOM on Exynos 2400",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1552",
"title": "Optimizing Large Generative AI Model Deployment on Samsung Exynos 2400 NPU under Memory Constraints",
"bloom": "analyze"
}
],
"rationale": "Follows Exynos 2400 shared memory deployment, from recalling capacity to calculating minimum bits, diagnosing app-switching OOMs, and evaluating host offloading.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-24",
"track": "mobile",
"topic": "memory-pressure-management",
"competency_area": "memory",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1353",
"title": "Calculate Maximum Context Length for Gemini Nano",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1550",
"title": "Designing for Memory Pressure on Google Tensor G3",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-1553",
"title": "Optimizing Large Language Models on Google Tensor G3 for Mobile",
"bloom": "analyze"
}
],
"rationale": "Addresses Tensor G3 memory constraints, starting with context length calculations to optimizing LLMs for dynamic OS-level memory pressure.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-25",
"track": "mobile",
"topic": "queueing-theory",
"competency_area": "latency",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "mobile-1882",
"title": "A17 NPU Deterministic Queue",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-2032",
"title": "Analyzing Mobile Thermal Throttling using D/D/1 Queues",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-2099",
"title": "AR Throttling Queue Control",
"bloom": "create"
}
],
"rationale": "Progresses from modeling deterministic NPU queues for AR frames to analyzing thermal throttling bursts and ultimately designing dynamic queue control algorithms.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-26",
"track": "mobile",
"topic": "queueing-theory",
"competency_area": "latency",
"levels": [
"L2",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "mobile-1935",
"title": "Explain backpressure effects on queue arrival rates during mobile bursts",
"bloom": "understand"
},
{
"level": "L5",
"id": "mobile-1967",
"title": "Voice Assistant Bursty Queue",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1954",
"title": "Voice Translation Queue Spike",
"bloom": "create"
}
],
"rationale": "Explores bursty mobile arrivals, starting with backpressure effects, evaluating tail latency spikes under bursty vs uniform arrivals, and diagnosing massive latency lags.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-014-27",
"track": "mobile",
"topic": "queueing-theory",
"competency_area": "latency",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "mobile-1904",
"title": "TTS Chunk Queue Wait",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1893",
"title": "Mobile AR Frame Queuing",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-1870",
"title": "A17 Pro NPU Video Frame Drops",
"bloom": "create"
}
],
"rationale": "Applies M/M/1 queuing models to mobile NPUs, moving from expected wait times for TTS to validating AR delay SLAs and defining finite-queue blocking probabilities.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-016-15",
"track": "mobile",
"topic": "autograd-computational-graphs",
"competency_area": "optimization",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "mobile-1834",
"title": "Minimizing computational graph size for on-device fine-tuning on Snapdragon 8 Gen 3",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "mobile-1835",
"title": "Implementing memory-efficient backprop for on-device RL on A17 Pro",
"bloom": "create"
},
{
"level": "L6+",
"id": "mobile-1842",
"title": "Designing a two-phase autograd pipeline for on-device neural style transfer on A17 Pro",
"bloom": "create"
}
],
"rationale": "Examines advanced on-device autograd limits, progressing from shrinking graph sizes for LoRA, implementing memory-efficient backprop for PPO, to designing complex two-phase autograd structures for NST.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-016-16",
"track": "mobile",
"topic": "chiplet-architecture",
"competency_area": "compute",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "mobile-1843",
"title": "Chiplet Compute-to-Memory Ratio for Mobile SoC Design",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1858",
"title": "Unified Memory Coherency for Mobile Chiplet Camera Pipeline",
"bloom": "analyze"
}
],
"rationale": "Investigates mobile chiplet integration, beginning with understanding compute-to-memory ratios and then diagnosing unified-memory coherency stalls.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-016-17",
"track": "mobile",
"topic": "communication-computation-overlap",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "mobile-2070",
"title": "Neural Engine and GPU Overlap",
"bloom": "analyze"
},
{
"level": "L3",
"id": "mobile-2077",
"title": "Unified Memory Contention Between NPU and GPU Execution",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1933",
"title": "Evaluate shared memory bandwidth contention during mobile task overlap",
"bloom": "evaluate"
}
],
"rationale": "Explores mobile SoC pipeline overlapping, starting with calculating theoretical overlaps, analyzing basic memory contention, and evaluating extreme shared bandwidth contention during AR execution.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-016-18",
"track": "mobile",
"topic": "model-adaptation-systems",
"competency_area": "architecture",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "mobile-1863",
"title": "Per-User LoRA Adapter Privacy on Mobile",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1850",
"title": "On-Device Personalization with Differential Privacy on Mobile",
"bloom": "evaluate"
}
],
"rationale": "Addresses mobile privacy for on-device adaptation, moving from assessing model inversion risks of per-user LoRA adapters to designing a robust differential privacy mechanism.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-016-19",
"track": "mobile",
"topic": "software-portability",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-1807",
"title": "CoreML vs ONNX Runtime on Apple A17 Pro Neural Engine",
"bloom": "understand"
},
{
"level": "L4",
"id": "mobile-1808",
"title": "Cross-Platform Mobile Inference: TFLite GPU Delegate on Android vs CoreML on iOS",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-1811",
"title": "Portable On-Device LLM Inference Across Mobile Platforms",
"bloom": "create"
}
],
"rationale": "Teaches portable ML deployment on mobile, moving from tracing framework overheads (ONNX vs CoreML), to building cross-platform real-time pipelines, to serving large models across distinct architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-017-08",
"track": "mobile",
"topic": "interconnect-topology",
"competency_area": "networking",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "mobile-0453",
"title": "The Mystery of the Slow Avatar",
"bloom": "apply"
},
{
"level": "L5",
"id": "mobile-0632",
"title": "The Multi-Node Latency Catastrophe",
"bloom": "evaluate"
}
],
"rationale": "Investigates backend latency bottlenecks for mobile apps, moving from diagnosing intra-node tensor parallelism misconfigurations to analyzing multi-node scaling failures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-017-09",
"track": "mobile",
"topic": "interconnect-topology",
"competency_area": "networking",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "mobile-1819",
"title": "Chiplet Interconnect Bandwidth for Multi-Model Mobile",
"bloom": "apply"
},
{
"level": "L4",
"id": "mobile-1818",
"title": "SoC Die-to-Die Interconnect for Heterogeneous Inference",
"bloom": "apply"
}
],
"rationale": "Explores mobile SoC interconnect constraints, progressing from calculating concurrent bandwidth demands on the NoC to actively designing model partitioning to avoid NoC bottlenecks.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-017-32",
"track": "mobile",
"topic": "speculative-decoding",
"competency_area": "optimization",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "mobile-0602",
"title": "The Speculative Speedup",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-0628",
"title": "The Speculative Decoding Memory Trap",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "mobile-0645",
"title": "The On-Device LLM Keyboard Power Drain",
"bloom": "create"
}
],
"rationale": "Guides the learner from the theoretical speedup of speculative decoding to its catastrophic memory limits on mobile SoC, culminating in designing a low-power, latency-strict keyboard system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-017-33",
"track": "mobile",
"topic": "speculative-decoding",
"competency_area": "latency",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "mobile-0787",
"title": "Speculative Decoding Performance Regression",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-1070",
"title": "Speculative Decoding Feasibility on Mobile NPU",
"bloom": "analyze"
}
],
"rationale": "Investigates effective throughput regressions in speculative decoding, moving from diagnosing CPU-draft overheads to analyzing the hard synchronization penalties of context-swapping on a single mobile NPU.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-017-47",
"track": "mobile",
"topic": "pipeline-parallelism",
"competency_area": "parallelism",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "mobile-2121",
"title": "Hexagon-Adreno Pipeline",
"bloom": "analyze"
},
{
"level": "L5",
"id": "mobile-2126",
"title": "Video SR Memory Barrier",
"bloom": "evaluate"
}
],
"rationale": "Explores pipelining within a single mobile SoC, moving from calculating pipeline bubbles across heterogeneous cores to analyzing multi-frame synchronization barriers in video super-resolution.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "mobile-chain-auto-secondary-017-48",
"track": "mobile",
"topic": "pipeline-parallelism",
"competency_area": "parallelism",
"levels": [
"L4",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "mobile-2123",
"title": "Wi-Fi Direct MAC Overhead",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "mobile-2127",
"title": "Multi-Device AR Pipeline",
"bloom": "create"
}
],
"rationale": "Investigates distributed inference across wireless mobile devices, progressing from calculating static MAC contention overhead to designing complex schedules that align with strict asynchronous radio intervals.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-003-09",
"track": "tinyml",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L2",
"L3",
"L5"
],
"questions": [
{
"level": "L2",
"id": "tinyml-1341",
"title": "Latency Decomposition for Keyword Spotting on ARM Cortex-M4",
"bloom": "analyze"
},
{
"level": "L3",
"id": "tinyml-1168",
"title": "Calculate End-to-End Wake Word Pipeline Latency",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1345",
"title": "TinyML Keyword Spotting Latency Decomposition on ARM Cortex-M4",
"bloom": "analyze"
}
],
"rationale": "Deconstructs keyword spotting latency on the Cortex-M4, starting from a high-level decomposition approach, calculating theoretical latency, and specifying a full sub-150ms pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-003-12",
"track": "tinyml",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L1",
"L2",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1281",
"title": "Identifying Latency Components on ESP32-S3",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-0865",
"title": "Latency Decomposition: Size End-to-End Latency for Environmental Sensor TinyML Node",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1245",
"title": "ESP32-S3 Speech-to-Intent Latency Breakdown",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1348",
"title": "Latency Decomposition and Optimization for Real-time TinyML on ESP32-S3",
"bloom": "analyze"
}
],
"rationale": "Guides the learner through ESP32-S3 latency analysis, from identifying the preprocessing component to sizing complete environmental and speech-to-intent pipelines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-003-13",
"track": "tinyml",
"topic": "latency-decomposition",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0190",
"title": "The Millisecond Machine Stop",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-0088",
"title": "The Sensor Pipeline Without Drops",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "tinyml-0123",
"title": "The Sub-Millisecond Fault Detector",
"bloom": "create"
}
],
"rationale": "Focuses on extreme low-latency (sub-millisecond) industrial anomaly detection pipelines, establishing the budget, managing sensor windows, and designing the final Cortex-M4 solution.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-003-14",
"track": "tinyml",
"topic": "model-size-estimation",
"competency_area": "memory",
"levels": [
"L2",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-0883",
"title": "Model Size Estimation: Fluency \u2014 Size MCU Model Memory in Under 30 Seconds",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1298",
"title": "KWS Memory Architecture on Cortex-M4",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "tinyml-1387",
"title": "TinyML Model Deployment on ARM Cortex-M4: Memory Constraints",
"bloom": "analyze"
}
],
"rationale": "A progression for Cortex-M4 sizing, starting with a rapid parameter-to-flash fluency check, moving to SRAM partitioning, and culminating in a full weight/activation estimation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-003-16",
"track": "tinyml",
"topic": "model-size-estimation",
"competency_area": "architecture",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1046",
"title": "ESP32-S3 INT8 Keyword Spotting Memory Footprint",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-0993",
"title": "Architecting a KWS Memory Pipeline for ESP32-S3",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1195",
"title": "ESP32-S3 Memory Hierarchy Design for Always-On Audio",
"bloom": "create"
}
],
"rationale": "Explores the ESP32-S3 memory hierarchy, analyzing whether an INT8 CNN fits in fast SRAM, designing the pipeline, and keeping critical paths out of slower PSRAM.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-01",
"track": "tinyml",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-1322",
"title": "TinyML Compute Estimation for Nordic nRF5340",
"bloom": "analyze"
},
{
"level": "L3",
"id": "tinyml-1153",
"title": "Estimating Inference Latency and Energy on nRF5340",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1323",
"title": "Diagnosing High Inference Cost on nRF5340 for Keyword Spotting",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "tinyml-1324",
"title": "TinyML Resource Estimation for Keyword Spotting on nRF5340",
"bloom": "analyze"
}
],
"rationale": "Progresses from identifying required compute metrics to calculating latency/energy, diagnosing high inference costs, and finally estimating complete resource needs for keyword spotting on the nRF5340.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-02",
"track": "tinyml",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0954",
"title": "NPU Utilization and Cycle Cost Analysis",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1213",
"title": "Ethos-U55 Compute Utilization and Bottleneck Analysis",
"bloom": "analyze"
}
],
"rationale": "Advances from basic NPU cycle cost analysis to in-depth bottleneck quantification for audio processing CNNs on the Corstone-300 subsystem.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-03",
"track": "tinyml",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1162",
"title": "Estimating Inference Latency for INT8 Convolution on Cortex-M4",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1670",
"title": "Depthwise Convolutions on MCU",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1678",
"title": "Depthwise Separable Convolution Execution Cycles on Microcontrollers",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1602",
"title": "MCU Convolution Cycle Cost",
"bloom": "create"
}
],
"rationale": "Builds understanding of INT8 convolution latency, then depthwise MAC calculations, leading into execution cycle evaluation, and culminating in an analytical cycle-accurate cost model for custom kernels.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-04",
"track": "tinyml",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1725",
"title": "Cortex-M4 Inference Throughput",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1613",
"title": "Calculate energy efficiency between slow and fast microcontroller clock modes",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0984",
"title": "Architecting Energy-Constrained Audio Inference",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0818",
"title": "Acoustic Monitor Power Budgeting",
"bloom": "create"
}
],
"rationale": "Teaches throughput estimation, energy efficiency across clock modes, architecture for battery life constraints, and strict CPU power budgeting for acoustic models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-05",
"track": "tinyml",
"topic": "compute-cost-estimation",
"competency_area": "compute",
"levels": [
"L1",
"L4",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1265",
"title": "ESP32-S3 Memory Hierarchy for Model Deployment",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-1103",
"title": "ESP32-S3 PSRAM Bandwidth Compute Bottleneck",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "tinyml-1325",
"title": "Edge AI Power Budget: Optimizing Gesture Recognition on ESP32-S3",
"bloom": "analyze"
}
],
"rationale": "Starts with basic ESP32 memory hierarchy recall, diagnoses real-world PSRAM bandwidth bottlenecks, and culminates in estimating cloud/edge energy costs for gesture recognition.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-10",
"track": "tinyml",
"topic": "ota-firmware-updates",
"competency_area": "deployment",
"levels": [
"L1",
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0021",
"title": "The OTA Flash Memory Tax",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-0414",
"title": "Bootloader A/B Partition Sizing",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-0062",
"title": "Bootloader A/B Firmware Partitioning",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1209",
"title": "Asymmetric OTA Architecture for Dual-Core ML",
"bloom": "create"
}
],
"rationale": "Explores the constraints of OTA flash memory taxes, sizing A/B partitions with delta updates, designing layout architectures, and finally architecting complex asymmetric OTA schemes for dual-core systems.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-11",
"track": "tinyml",
"topic": "ota-firmware-updates",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0056",
"title": "BLE Throughput for Model Update",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0585",
"title": "Updating a 500 KB Model Over BLE 5.0",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0063",
"title": "Fleet-Wide Model Update Strategy",
"bloom": "evaluate"
}
],
"rationale": "Starts with calculating BLE throughput costs, analyzes bandwidth tradeoffs based on quantization format, and scales up to architecting a fleet-wide multi-connectivity update strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-12",
"track": "tinyml",
"topic": "ota-firmware-updates",
"competency_area": "deployment",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0971",
"title": "OTA Rollback due to Shared SRAM Exhaustion",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1224",
"title": "Delta OTA Optimization for Ethos-U55 NPU Models",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0068",
"title": "The OTA Update Brickening",
"bloom": "create"
}
],
"rationale": "Investigates how SRAM contention causes OTA failures, optimizes updates via delta compression to fit memory, and diagnoses large-scale fleet bricking caused by memory-related tensor arena changes.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-13",
"track": "tinyml",
"topic": "ota-firmware-updates",
"competency_area": "deployment",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1424",
"title": "ESP32-S3 A/B OTA Flash Partition Sizing for ML Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1251",
"title": "ESP32-S3 Flash Partitioning for Model OTA",
"bloom": "evaluate"
}
],
"rationale": "Focuses on the ESP32-S3 platform, moving from calculating basic A/B partition sizes for ML images to proposing a comprehensive deployment strategy that ensures safe updates within the 8 MB flash limit.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-14",
"track": "tinyml",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L1",
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1284",
"title": "Asymmetric Quantization Equation for nRF5340",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-1726",
"title": "Static Quantization Parameters",
"bloom": "understand"
},
{
"level": "L4",
"id": "tinyml-1013",
"title": "Diagnosing Asymmetric Quantization Overhead",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1255",
"title": "Evaluating Quantization Granularity on nRF5340",
"bloom": "evaluate"
}
],
"rationale": "Progresses from recalling the asymmetric quantization equation to computing static parameters, diagnosing the massive overhead of asymmetric operations on the nRF5340, and finally evaluating granularity tradeoffs under a strict latency budget.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-15",
"track": "tinyml",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1722",
"title": "Missing Calibration in Post-Training Quantization",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-0453",
"title": "The Cafeteria False Wake",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1361",
"title": "Quantization Drift on Nordic nRF5340 for Keyword Spotting",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1694",
"title": "Calibration Bias in PTQ",
"bloom": "evaluate"
}
],
"rationale": "Traces the impact of calibration from identifying missing steps, to observing environmental failures, diagnosing on-device quantization drift, and evaluating the underlying calibration bias in PTQ.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-16",
"track": "tinyml",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "tinyml-1094",
"title": "ESP32-S3 Audio Wake-Word Quantization Pipeline",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1201",
"title": "Mixed-Precision Audio Keyword Spotting on ESP32-S3",
"bloom": "create"
}
],
"rationale": "Advances from designing an optimal memory-constrained quantization pipeline for the ESP32-S3 to architecting a full mixed-precision solution with hardware vector extension utilization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-17",
"track": "tinyml",
"topic": "quantization-fundamentals",
"competency_area": "precision",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "tinyml-1117",
"title": "INT8 Per-Tensor Degradation in Depthwise Convolutions",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0738",
"title": "The Per-Channel Trade-off",
"bloom": "evaluate"
}
],
"rationale": "Analyzes why per-tensor INT8 quantization catastrophically degrades depthwise convolution accuracy, and evaluates the system-level tradeoffs of switching to per-channel quantization to recover it.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-19",
"track": "tinyml",
"topic": "transformer-systems-cost",
"competency_area": "compute",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-0935",
"title": "Recall Arithmetic Intensity Threshold for Attention on MCU",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-0937",
"title": "Implement Tiled Matrix Multiply for Attention on Cortex-M7",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-0933",
"title": "Evaluate Pruning Strategies for Transformer Attention on ESP32-S3",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0938",
"title": "Achieve Mastery in Transformer Inference Optimization on MCU",
"bloom": "create"
}
],
"rationale": "Starts by recalling arithmetic intensity thresholds for attention, applies this to implement tiled matrix multiplications, evaluates pruning strategies to meet latency targets, and culminates in a master roadmap for end-to-end transformer optimization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-20",
"track": "tinyml",
"topic": "transformer-systems-cost",
"competency_area": "compute",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "tinyml-0924",
"title": "Analyze Chinchilla Scaling for MCU-Deployable Transformers",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0942",
"title": "Optimize Weight Sharing Across Transformer Layers on Flash-Limited MCU",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0939",
"title": "Master Flash-Aware Transformer Scheduling on ESP32-S3",
"bloom": "create"
}
],
"rationale": "Explores the architectural compromises of deploying scaled transformers to MCUs, optimizing flash usage via cross-layer weight sharing, and mastering dynamic SPI flash-aware scheduling for large models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-21",
"track": "tinyml",
"topic": "transformer-systems-cost",
"competency_area": "compute",
"levels": [
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "tinyml-0936",
"title": "Recall Memory-Bandwidth-Bound Decode on Embedded Hardware",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-0947",
"title": "Realize Streaming Transformer Inference with Circular KV Buffer",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-0925",
"title": "Design Speculative Decoding for Cortex-M4 Transformer",
"bloom": "create"
}
],
"rationale": "Begins with identifying memory-bandwidth limits in decode, implements streaming inference with circular buffers, and designs a speculative decoding system to maximize tokens per second.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-31",
"track": "tinyml",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1059",
"title": "AOT Compiler Memory-Latency Tradeoff Analysis",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1085",
"title": "AOT Compiler Memory Architecture for STM32F4",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1413",
"title": "TinyML Graph Optimization for FPU-less ARM Cortex-M4",
"bloom": "analyze"
}
],
"rationale": "Evaluates fundamental memory-latency tradeoffs introduced by AOT compilers, moves to architecting memory passes for constrained STM32F4 SRAM, and masters aggressive graph optimizations for FPU-less Cortex-M4 systems.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-32",
"track": "tinyml",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0962",
"title": "Operator Fusion Memory Tradeoff on nRF5340",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0752",
"title": "The Operator Fusion on MCU",
"bloom": "evaluate"
}
],
"rationale": "Connects the observation of operator fusion causing unexpected OOM errors by increasing peak SRAM usage to the strategic evaluation of SRAM and latency savings when fusing Conv2D, BatchNorm, and ReLU on Cortex-M7.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-004-33",
"track": "tinyml",
"topic": "graph-compilation",
"competency_area": "optimization",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "tinyml-1294",
"title": "Ethos-U55 Operator Offloading and Fallback Analysis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1410",
"title": "Optimizing a TinyML Model for Cortex-M7 with Ethos-U55: Graph Compilation Strategy",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "tinyml-1208",
"title": "Ethos-U55 Compiler Tiling for Operator Fallback",
"bloom": "create"
}
],
"rationale": "Starts with analyzing system-level implications of unsupported activation fallbacks, designs a partitioning strategy for the M7+U55, and implements advanced compiler tiling to avoid spilling large activations to external memory during fallback.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-005-15",
"track": "tinyml",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1283",
"title": "ESP32-S3 Structured Pruning Impact",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-1303",
"title": "Designing a Pruning Strategy for ESP32-S3 SRAM Constraints",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1400",
"title": "Optimizing Keyword Spotting CNN for ESP32-S3 with Pruning & Sparsity",
"bloom": "analyze"
}
],
"rationale": "Progresses from the basic knowledge of vector instruction requirements, to designing a pruning strategy under strict SRAM limits, and combining pruning and sparsity for optimal performance.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-005-16",
"track": "tinyml",
"topic": "pruning-sparsity",
"competency_area": "optimization",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "tinyml-0175",
"title": "The Flash Budget Crunch",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-0755",
"title": "The Model Compression for Flash",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1399",
"title": "Pruning Strategies for Resource-Constrained Microcontrollers (ARM Cortex-M4)",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1254",
"title": "Structured vs Unstructured Pruning on Cortex-M4",
"bloom": "evaluate"
}
],
"rationale": "Begins with calculating unstructured sparsity to meet a flash budget, evaluates broad compression options, explains why structured beats unstructured on an M4 without an FPU, and formally evaluates these approaches against each other.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-008-06",
"track": "tinyml",
"topic": "safety-certification",
"competency_area": "reliability",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0084",
"title": "Watchdog Timers and Hard Real-Time Guarantees",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1229",
"title": "Optimizing Inference for Watchdog Deadlines",
"bloom": "analyze"
}
],
"rationale": "Focuses on the strict requirement of Worst-Case Execution Time (WCET) for watchdogs and optimizing inference to fit safely within those deadlines.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-008-08",
"track": "tinyml",
"topic": "safety-certification",
"competency_area": "reliability",
"levels": [
"L3",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1074",
"title": "Shared SRAM Contention WDT Resets",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1258",
"title": "Architecting ML Safety on Dual-Core nRF5340",
"bloom": "evaluate"
}
],
"rationale": "Analyzes the impact of shared SRAM contention between dual cores on watchdog resets, progressing from failure analysis to robust architectural design on the nRF5340.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-008-09",
"track": "tinyml",
"topic": "tco-cost-modeling",
"competency_area": "cross-cutting",
"levels": [
"L1",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-0284",
"title": "The Energy Tax of the Cloud",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-0357",
"title": "The TCO of Transmission",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-0904",
"title": "TinyML TCO Evaluation: Cloud-in-the-Loop vs Fully On-Device TinyML",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0912",
"title": "TinyML TCO Mastery: Make On-Device vs Cloud Decision for Industrial TinyML",
"bloom": "create"
}
],
"rationale": "Explores the shifting economics from cloud transmission dependencies to fully on-device TinyML inference across various deployment scales.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-008-10",
"track": "tinyml",
"topic": "tco-cost-modeling",
"competency_area": "deployment",
"levels": [
"L2",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "tinyml-0893",
"title": "TinyML TCO Recall: Cortex-M4 vs ESP32-S3 Cost Profile",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-0902",
"title": "TinyML TCO Evaluation: Cortex-M4 vs ESP32-S3 for Production Deployment",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "tinyml-0898",
"title": "TinyML TCO Design: Fleet TCO for Agricultural Sensor Network",
"bloom": "create"
}
],
"rationale": "Systematically compares the TCO of low-power MCU hardware (Cortex-M4) versus integrated-wireless platforms (ESP32-S3) in fleet deployments.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-009-06",
"track": "tinyml",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1280",
"title": "Ethos-U55 Supported Quantization Precisions",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-1368",
"title": "Sub-4-bit Quantization on Cortex-M7 + Ethos-U55 for TinyML",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1107",
"title": "Ethos-U55 W4A8 Fallback Stalls",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1241",
"title": "Sub-4-bit Quantization Tradeoffs for Keyword Spotting on Corstone-300",
"bloom": "evaluate"
}
],
"rationale": "Explores hardware limitations of the Ethos-U55 for sub-4-bit quantization, calculating SRAM limits, diagnosing fallback stalls to the M7 CPU, and evaluating the memory/latency tradeoffs of extreme quantization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-009-07",
"track": "tinyml",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0959",
"title": "4-Bit Quantization Latency Regression on nRF5340",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1292",
"title": "Sub-4-bit Quantization Specification for nRF5340 Audio Model",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1369",
"title": "Extreme Quantization on nRF5340: Architecture Evaluation",
"bloom": "analyze"
}
],
"rationale": "Moves from observing latency regressions of 4-bit weights on the nRF5340, to specifying the system for a sub-4-bit audio model, to making high-level architectural tradeoffs between 8-bit and 2-bit models.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-009-08",
"track": "tinyml",
"topic": "extreme-quantization",
"competency_area": "precision",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1155",
"title": "2-Bit Weight Unpacking and Execution on Cortex-M4",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1083",
"title": "Architecting Sub-4-bit Keyword Spotting",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1188",
"title": "Ternary Weight Transformer on Cortex-M4",
"bloom": "create"
}
],
"rationale": "Starts with calculating the footprint of 2-bit weights on a Cortex-M4, progresses to architecting an inference system without native sub-byte instructions, and culminates in designing a ternary-weight transformer.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-009-09",
"track": "tinyml",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1274",
"title": "CMOS Dynamic Power Equation Recall",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-1500",
"title": "MCU Sleep Mode Strategy for Always-On Wake Word Detection",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1012",
"title": "Diagnosing Battery Drain in ESP32-S3 Wake-Word Engine",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1252",
"title": "Evaluating Race-to-Sleep vs DVFS on ESP32-S3",
"bloom": "evaluate"
}
],
"rationale": "Takes the learner from recalling dynamic power equations to estimating cascaded sleep mode power, diagnosing an actual battery drain fault, and finally evaluating race-to-sleep against DVFS strategies on the ESP32-S3.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-009-10",
"track": "tinyml",
"topic": "power-budgeting",
"competency_area": "power",
"levels": [
"L2",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L2",
"id": "tinyml-1372",
"title": "Nordic nRF5340 Power Mode Analysis",
"bloom": "analyze"
},
{
"level": "L3",
"id": "tinyml-1145",
"title": "Estimating Energy Per Inference on Nordic nRF5340",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1376",
"title": "Optimizing ML Inference Power on Nordic nRF5340",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0995",
"title": "Dual-Core Power Partitioning on nRF5340",
"bloom": "evaluate"
}
],
"rationale": "Builds competency on the nRF5340 by starting with power mode analysis, calculating specific energy-per-inference, optimizing DVFS to meet a micro-watt budget, and architecting a dual-core power partition.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-010-15",
"track": "tinyml",
"topic": "data-pipeline-engineering",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0433",
"title": "The Sensor Fusion Skew",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-0007",
"title": "The Ghost Drift",
"bloom": "evaluate"
},
{
"level": "L5",
"id": "tinyml-0610",
"title": "Fusing Accelerometer + Microphone + Temperature on One MCU",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0125",
"title": "Always-On Multi-Modal Sensor Fusion System",
"bloom": "create"
}
],
"rationale": "Covers the complexities of multi-modal sensor fusion, advancing from diagnosing hardware-induced skew and clock drift to designing SRAM layouts and complete always-on architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-010-17",
"track": "tinyml",
"topic": "dma-data-movement",
"competency_area": "memory",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1285",
"title": "Ethos-U55 Shared SRAM Architecture",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-1004",
"title": "NPU Inference Latency and CPU Bottleneck",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1239",
"title": "Zero-Copy DMA Pipeline for Ethos-U55",
"bloom": "evaluate"
}
],
"rationale": "Teaches memory and DMA management for vision pipelines on Ethos-U55, starting with shared SRAM architecture, diagnosing CPU bottlenecks, and architecting zero-copy DMA pipelines for 60 FPS inference.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-010-19",
"track": "tinyml",
"topic": "operator-scheduling",
"competency_area": "optimization",
"levels": [
"L2",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-1414",
"title": "Ethos-U55 Scheduling for Memory and Parallelism",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1417",
"title": "Diagnosing Inefficient Operator Scheduling on Cortex-M7/Ethos-U55 for TinyML",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1028",
"title": "Ethos-U55 Depth-First Scheduling for Memory Reuse",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1420",
"title": "Ethos-U55 Operator Scheduling for Memory and Performance",
"bloom": "analyze"
}
],
"rationale": "Explores operator scheduling on Ethos-U55 NPUs, from basic memory and parallelism concepts to diagnosing low utilization, selecting depth-first strategies, and maximizing CPU/NPU parallel execution.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-010-20",
"track": "tinyml",
"topic": "operator-scheduling",
"competency_area": "optimization",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1273",
"title": "Nordic nRF5340 Dual-Core Architecture for Operator Scheduling",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-1114",
"title": "SRAM Exhaustion in Multi-Branch CNN Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1250",
"title": "Dual-Core Operator Scheduling on Nordic nRF5340",
"bloom": "evaluate"
}
],
"rationale": "Focuses on dual-core operator scheduling for the nRF5340, progressing from architectural basics to diagnosing SRAM exhaustion in branched networks and designing optimal multi-core schedules.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-011-16",
"track": "tinyml",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L1",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1264",
"title": "Identifying Side-Channel Attacks on Edge NPUs",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-1001",
"title": "NPU Shared SRAM Bus Contention Side-Channel",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1451",
"title": "Adversarial Robustness Evaluation on Cortex-M7 + Ethos-U55 for TinyML",
"bloom": "analyze"
}
],
"rationale": "Explores side-channel vulnerabilities on the Ethos-U55, from recognizing the physical threat to diagnosing SRAM contention extraction, and finally evaluating holistic adversarial robustness.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-011-17",
"track": "tinyml",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0952",
"title": "Adversarial Denial of Sleep Attack Analysis",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1452",
"title": "Optimizing Adversarial Defense on Resource-Constrained TinyML",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0983",
"title": "Secure TinyML Keyword Spotting Architecture",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1182",
"title": "Adversarial Defense for Wake-Word on ARM Cortex-M4",
"bloom": "create"
}
],
"rationale": "Focuses on defending always-on Cortex-M4 models against adversarial noise, progressing from analyzing denial-of-sleep power drain to optimizing defense latencies, and designing the full secure architecture.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-011-18",
"track": "tinyml",
"topic": "adversarial-robustness",
"competency_area": "reliability",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1160",
"title": "Randomized Smoothing Latency on ESP32-S3",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1017",
"title": "Evaluating Adversarial Defenses on ESP32-S3 Smart Locks",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1453",
"title": "Adversarial Robustness in ESP32-S3 Anomaly Detection",
"bloom": "analyze"
}
],
"rationale": "Moves from computing the strict latency overhead of randomized smoothing on ESP32-S3 to evaluating defense alternatives for smart locks, and culminating in a comprehensive robust anomaly detection system design.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-011-19",
"track": "tinyml",
"topic": "neural-architecture-search",
"competency_area": "architecture",
"levels": [
"L1",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1272",
"title": "Identifying Peak NPU Throughput for Corstone-300 Architecture Search",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-1394",
"title": "Hardware-Aware NAS on Ethos-U55: Memory and Latency Constraints",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1113",
"title": "Diagnosing NAS SRAM Constraints on Corstone-300",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1249",
"title": "SRAM-Constrained NAS on Ethos-U55",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1397",
"title": "Hardware-aware NAS for Edge Deployment on Cortex-M7/Ethos-U55",
"bloom": "analyze"
}
],
"rationale": "Takes the learner from fundamental throughput calculations on Ethos-U55 to evaluating candidate constraints, diagnosing hidden memory overheads during NAS, and formulating the final hardware-aware NAS pipeline.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-011-20",
"track": "tinyml",
"topic": "neural-architecture-search",
"competency_area": "architecture",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0969",
"title": "NAS Memory Constraint Analysis on nRF5340",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1299",
"title": "Hardware-Aware NAS for Keyword Spotting on nRF5340",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1395",
"title": "Hardware-Aware NAS on Nordic nRF5340: Memory and Latency Evaluation",
"bloom": "analyze"
}
],
"rationale": "Explores the memory and concurrency constraints of the nRF5340, progressing from diagnosing a specific rejected block to setting search space limits, and comparing resulting candidate architectures.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-011-21",
"track": "tinyml",
"topic": "neural-architecture-search",
"competency_area": "architecture",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "tinyml-1390",
"title": "Hardware-Aware NAS on ESP32-S3: Performance & Memory Trade-offs",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1134",
"title": "Evaluating Hardware-Aware NAS for ESP32-S3",
"bloom": "evaluate"
}
],
"rationale": "Progresses from analyzing why an ESP32-S3 NAS converged on a specific latency/memory tradeoff to evaluating the system-level benefits of different candidate architectures on battery life.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-012-15",
"track": "tinyml",
"topic": "accelerator-comparison",
"competency_area": "compute",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1151",
"title": "ESP32-S3 Vector Extension Speedup",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-0982",
"title": "ESP32-S3 vs External NPU for Edge AI",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1320",
"title": "Accelerator Selection for Edge ML on ESP32-S3",
"bloom": "analyze"
}
],
"rationale": "Evaluates ESP32-S3 acceleration, starting with the theoretical latency of vector extensions versus scalar, comparing native execution to an external NPU, and selecting the optimal architecture under power limits.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-012-16",
"track": "tinyml",
"topic": "accelerator-comparison",
"competency_area": "compute",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-0951",
"title": "Cortex-M4 vs NPU Latency Analysis",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1211",
"title": "SIMD vs NPU for STM32F4 Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1317",
"title": "TinyML Anomaly Detection: Cortex-M4 CPU vs. Custom ASIC for 8-bit CNN",
"bloom": "analyze"
}
],
"rationale": "Explores the Cortex-M4 versus NPU decision, starting with latency math, evaluating SIMD capabilities against deadlines, and concluding with industrial anomaly detection offloading strategies.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-012-17",
"track": "tinyml",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L1",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1267",
"title": "SRAM vs Compute Energy Cost",
"bloom": "remember"
},
{
"level": "L4",
"id": "tinyml-1106",
"title": "Diagnosing Power Drain from Memory Access",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1240",
"title": "Arithmetic Intensity and SRAM Energy Tradeoffs",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1379",
"title": "Optimizing CNN Energy on ARM Cortex-M4 STM32F4",
"bloom": "analyze"
}
],
"rationale": "Applies Horowitz energy principles to Cortex-M4, starting from basic SRAM vs compute costs, diagnosing power drains from fully connected layers, guiding layer choices by arithmetic intensity, and cutting overall CNN energy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-012-18",
"track": "tinyml",
"topic": "energy-per-operation",
"competency_area": "power",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1039",
"title": "Energy Cost of Memory vs Compute on Ethos-U55",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-0986",
"title": "Energy-Aware Model Architecture for Corstone-300",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1187",
"title": "Energy-Optimal Architecture for Corstone-300",
"bloom": "create"
}
],
"rationale": "Explores Corstone-300 energy optimization, calculating NPU memory vs compute costs, architecting vision models to minimize SRAM access, and trading computational complexity for memory locality.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-012-19",
"track": "tinyml",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1040",
"title": "ESP32-S3 Low-Battery Model Fallback",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1084",
"title": "Architecting Graceful Degradation for ESP32-S3 Voice Commands",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1190",
"title": "Asymmetric Dual-Model Degradation on ESP32-S3",
"bloom": "create"
}
],
"rationale": "Designs graceful degradation for ESP32-S3 audio models, starting with low-battery fallback latencies, staging SRAM/PSRAM models for WiFi loss, and culminating in asymmetric dual-model architectures for complex faults.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-012-20",
"track": "tinyml",
"topic": "graceful-degradation",
"competency_area": "reliability",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1166",
"title": "Calculate Fallback Model MAC Budget on Cortex-M7",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1437",
"title": "TinyML Graceful Degradation for Predictive Maintenance on Cortex-M7/Ethos-U55",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1129",
"title": "Degradation Strategy for Ethos-U55 Anomaly Detection",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1440",
"title": "Graceful Degradation for Real-time TinyML Anomaly Detection",
"bloom": "analyze"
}
],
"rationale": "Covers thermal throttling and QoS shedding on Corstone-300, from calculating CPU-only fallback budgets, diagnosing performance drops, choosing degradation strategies during NPU shutdown, and implementing full QoS shedding.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-012-21",
"track": "tinyml",
"topic": "profiling-bottleneck-analysis",
"competency_area": "latency",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1146",
"title": "Corstone-300 NPU Latency Profiling Math",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1352",
"title": "Optimizing ML Inference Latency on Cortex-M7/Ethos-U55 for Real-time Edge Applications",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0996",
"title": "Profiling CPU-NPU Memory Contention on Corstone-300",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1199",
"title": "NPU-CPU Bus Contention and Trace Profiling in Corstone-300",
"bloom": "create"
}
],
"rationale": "Explores Corstone-300 latency profiling, starting with theoretical latency math, identifying primary bottlenecks, employing non-intrusive trace strategies to isolate memory contention, and redesigning architectures based on pipeline bubbles.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-012-22",
"track": "tinyml",
"topic": "profiling-bottleneck-analysis",
"competency_area": "latency",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1275",
"title": "Cycle Count Profiling on ARM Cortex-M4",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-1353",
"title": "Optimizing TinyML Inference Latency on STM32F4",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1116",
"title": "Diagnosing CMSIS-NN SIMD Underutilization",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1253",
"title": "Optimizing SIMD Utilization and Memory Stalls on Cortex-M4",
"bloom": "evaluate"
}
],
"rationale": "Builds profiling skills for Cortex-M4, starting from basic cycle count registers, choosing specific profiling tools for FPU-less systems, diagnosing SIMD underutilization, and making architectural changes to fix memory stalls.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-013-01",
"track": "tinyml",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L2",
"L3",
"L4",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-1476",
"title": "Essential Responsible AI Documentation for ESP32-S3 TinyML Deployment",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-1072",
"title": "Fairness Guardrail PSRAM Latency Bottleneck on ESP32",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1227",
"title": "Optimizing OOD Safety Guardrails on ESP32-S3",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "tinyml-1481",
"title": "Responsible AI for Edge Safety: ESP32-S3 Anomaly Detection",
"bloom": "analyze"
}
],
"rationale": "Follows the lifecycle of an ESP32-S3 TinyML model from essential documentation, to diagnosing PSRAM latency of a guardrail, optimizing the OOD guardrail, and finally synthesizing a complete safety-critical governance strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-013-02",
"track": "tinyml",
"topic": "responsible-ai",
"competency_area": "cross-cutting",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1147",
"title": "Guardrail Latency Budget on Ethos-U55",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1477",
"title": "Responsible AI on Constrained Edge Devices: Model Card & Guardrail Implementation",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1096",
"title": "On-Device Guardrails for Predictive Maintenance",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1203",
"title": "On-Device PII Redaction Guardrail",
"bloom": "create"
}
],
"rationale": "Traces guardrail deployment on the Cortex-M7/Ethos-U55 platform, starting with latency math, progressing to implementation and architectural evaluation for predictive maintenance, and culminating in a complex PII redaction design.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-013-24",
"track": "tinyml",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1073",
"title": "ESP32-S3 SRAM vs PSRAM Roofline Shift",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1228",
"title": "ESP32-S3 Roofline Analysis for Wake-Word",
"bloom": "analyze"
}
],
"rationale": "Progresses from analyzing how moving weights from SRAM to PSRAM shifts the roofline model, to diagnosing the exact wake-word compute bottleneck and quantifying the speedup of pinning weights back to SRAM.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-013-25",
"track": "tinyml",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1148",
"title": "Roofline Ridge Point Calculation on Ethos-U55",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1314",
"title": "Roofline Analysis for TinyML: Cortex-M7 + Ethos-U55 Performance Evaluation",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "tinyml-1204",
"title": "Corstone-300 Roofline Optimization for Micro-Transformers",
"bloom": "create"
}
],
"rationale": "Follows the roofline analysis of the Corstone-300 platform, calculating the SRAM roofline ridge point, predicting if a CNN is memory/compute bound, and architecting a micro-transformer fusion strategy to improve utilization.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-013-26",
"track": "tinyml",
"topic": "roofline-analysis",
"competency_area": "compute",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1277",
"title": "Peak Compute Derivation for ARM Cortex-M4 Roofline",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-1489",
"title": "Roofline Feasibility Check for MobileNetV2 on Cortex-M4",
"bloom": "evaluate"
},
{
"level": "L4",
"id": "tinyml-1315",
"title": "Roofline Analysis for TinyML on ARM Cortex-M4 STM32F4",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1257",
"title": "Roofline Analysis of Depthwise Convolutions",
"bloom": "evaluate"
}
],
"rationale": "Explores the compute limitations of the FPU-less Cortex-M4, starting from deriving peak INT8 MACs, checking feasibility of MobileNetV2, diagnosing a slow 1D CNN, and determining the theoretical maximum throughput of depthwise layers.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-013-27",
"track": "tinyml",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L3",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1149",
"title": "Audio Ingestion Memory and Cycle Budgeting",
"bloom": "apply"
},
{
"level": "L5",
"id": "tinyml-1469",
"title": "Real-Time Anomaly Detection on TinyML: Cortex-M7 + Ethos-U55 Architecture Evaluation",
"bloom": "analyze"
},
{
"level": "L6+",
"id": "tinyml-1206",
"title": "Real-Time Vibration Ingestion on Corstone-300",
"bloom": "create"
}
],
"rationale": "Progresses from budgeting memory/cycles for audio DMA on Corstone-300, to evaluating architecture trade-offs between feature extraction and raw NPU inference, and designing a zero-copy DMA system for 16 kHz vibration frames.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-013-28",
"track": "tinyml",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L3",
"L4"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1075",
"title": "PSRAM Latency in Real-Time Audio Streaming",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1230",
"title": "DMA Ping-Pong Buffering for Continuous Sensor Ingestion",
"bloom": "analyze"
}
],
"rationale": "Focuses on streaming bottlenecks on the ESP32-S3, moving from understanding why PSRAM causes dropped audio frames, to diagnosing a complex SPI polling bottleneck during INT8 neural network inference.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-013-29",
"track": "tinyml",
"topic": "streaming-ingestion",
"competency_area": "data",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1177",
"title": "Vibration Ingestion Buffer Sizing",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1119",
"title": "Diagnosing Sensor Buffer Overrun During Inference",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1259",
"title": "Audio Streaming Ping-Pong Sizing on Cortex-M4",
"bloom": "evaluate"
}
],
"rationale": "Explores DMA ping-pong buffering on resource-constrained MCUs, starting with calculating sleep durations for buffer fills, diagnosing window corruption during a 35ms inference, and properly sizing an audio ping-pong buffer.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-014-07",
"track": "tinyml",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1167",
"title": "Distillation vs Pruning Latency on Ethos-U55",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1218",
"title": "Distillation vs Pruning for INT8 SIMD",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1130",
"title": "Evaluating Distillation vs Pruning for Ethos-U55 Deployment",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1408",
"title": "Optimizing Knowledge Distillation for Edge Deployment",
"bloom": "analyze"
}
],
"rationale": "Compares unstructured pruning against distillation for microcontrollers, advancing from latency calculations to full edge deployment strategy.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-014-08",
"track": "tinyml",
"topic": "knowledge-distillation",
"competency_area": "optimization",
"levels": [
"L1",
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1269",
"title": "Knowledge Distillation Soft Targets on nRF5340",
"bloom": "remember"
},
{
"level": "L3",
"id": "tinyml-1406",
"title": "TinyML Knowledge Distillation for nRF5340 Anomaly Detection",
"bloom": "analyze"
},
{
"level": "L4",
"id": "tinyml-1007",
"title": "Distilled Student SRAM Exhaustion",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1244",
"title": "Sizing a Distilled Keyword Spotting Model for nRF5340",
"bloom": "evaluate"
}
],
"rationale": "Focuses on the nRF5340 platform, starting from soft target definitions and advancing to diagnosing SRAM exhaustion and sizing models for tight constraints.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-016-05",
"track": "tinyml",
"topic": "model-serving-infrastructure",
"competency_area": "deployment",
"levels": [
"L2",
"L5",
"L6+"
],
"questions": [
{
"level": "L2",
"id": "tinyml-1608",
"title": "Identify TinyML memory regions for static weights and dynamic activations",
"bloom": "understand"
},
{
"level": "L5",
"id": "tinyml-1780",
"title": "Evaluating OTA Payload Limits for Firmware",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1673",
"title": "Differential LoRaWAN Updates",
"bloom": "create"
}
],
"rationale": "Teaches TinyML memory management specifically for OTA updates, from identifying regions, evaluating A/B firmware payload limits, to designing a differential update protocol for constrained links.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-016-20",
"track": "tinyml",
"topic": "collective-communication",
"competency_area": "networking",
"levels": [
"L3",
"L4",
"L5"
],
"questions": [
{
"level": "L3",
"id": "tinyml-1652",
"title": "BLE Mesh Federated Embedding Reduce",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1817",
"title": "Half-Duplex UART Ring AllReduce Diagnosis",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1808",
"title": "Logical Ring on Physical Daisy-Chain",
"bloom": "evaluate"
}
],
"rationale": "Explores federated aggregation over highly constrained links, starting with BLE Ring AllReduce analysis, diagnosing UART latency bounds, and evaluating logical rings mapped to physical daisy-chains.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-016-21",
"track": "tinyml",
"topic": "communication-computation-overlap",
"competency_area": "optimization",
"levels": [
"L1",
"L2",
"L3",
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L1",
"id": "tinyml-1628",
"title": "Peripheral DMA Overlap",
"bloom": "remember"
},
{
"level": "L2",
"id": "tinyml-1732",
"title": "DMA Audio Pipelining",
"bloom": "understand"
},
{
"level": "L3",
"id": "tinyml-1622",
"title": "Pipelining Compute and SPI Transfer Time",
"bloom": "apply"
},
{
"level": "L4",
"id": "tinyml-1719",
"title": "DMA Overlap to Meet Hard Real-Time Deadlines",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1716",
"title": "Throughput Impact of I/O Overlap on Microcontrollers",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1666",
"title": "SPI/DMA Pipeline Overlap",
"bloom": "create"
}
],
"rationale": "A perfect end-to-end progression on DMA optimization: recalling the peripheral, applying ping-pong buffering, calculating compute/SPI pipelines, evaluating hard real-time impacts, analyzing bus contention, and designing a zero-drop system.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-017-34",
"track": "tinyml",
"topic": "pipeline-parallelism",
"competency_area": "parallelism",
"levels": [
"L4",
"L5"
],
"questions": [
{
"level": "L4",
"id": "tinyml-1809",
"title": "RP2040 Core FIFO Spinlock",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1813",
"title": "Dual-Core RP2040 Pipeline Bubble Fraction",
"bloom": "evaluate"
}
],
"rationale": "Analyzes inter-core pipeline parallelism on microcontrollers, progressing from calculating raw spinlock synchronization latency to evaluating steady-state compute bubbles across asymmetric stages.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-017-35",
"track": "tinyml",
"topic": "pipeline-parallelism",
"competency_area": "parallelism",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "tinyml-1812",
"title": "SPI Daisy Chain Pipeline",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1816",
"title": "SPI Double-Buffering Pipeline Barrier Computation",
"bloom": "analyze"
}
],
"rationale": "Focuses on off-chip interconnect pipelines for TinyML, advancing from analyzing multi-hop SPI barriers to mathematically optimizing DMA double-buffering sizes to maximize throughput.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-017-50",
"track": "tinyml",
"topic": "network-bandwidth-bottlenecks",
"competency_area": "networking",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "tinyml-1560",
"title": "Raw ECG BLE Stream Viability",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-1643",
"title": "UART WiFi Bottleneck Analysis",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1541",
"title": "BLE Bandwidth Embedding Constraint",
"bloom": "create"
}
],
"rationale": "Addresses ultra-low-bandwidth wireless constraints, progressing from determining if raw streaming is viable to analyzing intermediate UART bottlenecks, and finally designing the exact embedding compression needed for a BLE sensor constellation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-017-51",
"track": "tinyml",
"topic": "network-bandwidth-bottlenecks",
"competency_area": "networking",
"levels": [
"L5",
"L6+"
],
"questions": [
{
"level": "L5",
"id": "tinyml-1562",
"title": "Hailo-8 PCIe Gen3 Saturated",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-1545",
"title": "Hailo-8 PCIe Frame Ingestion",
"bloom": "create"
}
],
"rationale": "Investigates host-to-accelerator bottlenecks for high-throughput edge vision, moving from calculating PCIe bandwidth saturation to actively optimizing the ingestion pipeline to prevent starvation.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
},
{
"chain_id": "tinyml-chain-auto-secondary-017-67",
"track": "tinyml",
"topic": "monitoring-observability",
"competency_area": "reliability",
"levels": [
"L4",
"L5",
"L6+"
],
"questions": [
{
"level": "L4",
"id": "tinyml-0058",
"title": "The Offline Drift Detector",
"bloom": "analyze"
},
{
"level": "L5",
"id": "tinyml-0628",
"title": "The Watchdog and the Unseen Workload",
"bloom": "evaluate"
},
{
"level": "L6+",
"id": "tinyml-0670",
"title": "The Ghost in the Dashboard",
"bloom": "create"
}
],
"rationale": "Explores the extreme challenges of monitoring ML without internet, progressing from basic offline drift detection to debugging watchdog reboots, and finally designing complex environmental fallback mechanisms.",
"_origin": "gemini-3.1-pro-preview",
"tier": "secondary"
}
]