diff --git a/labs/vol2/lab_17_ml_conclusion.py b/labs/vol2/lab_17_ml_conclusion.py index 04ddb79d8..e8501d693 100644 --- a/labs/vol2/lab_17_ml_conclusion.py +++ b/labs/vol2/lab_17_ml_conclusion.py @@ -7,38 +7,28 @@ app = marimo.App(width="full") # LAB V2-17: THE CONSTRAINTS NEVER LIE # # Volume II, Chapter 17 — Conclusion (Capstone) -# Core Invariant: Every invariant from both volumes reduces to one meta-principle: -# CONSTRAINTS DRIVE ARCHITECTURE. -# The interconnect wall forces parallelism strategies. -# The memory wall forces compression. -# Amdahl's Law bounds scaling. -# Little's Law bounds serving. -# Young-Daly bounds reliability. -# Chouldechova bounds fairness. -# None of these can be wished away — only navigated. +# +# Core Invariant: Synthesis of ALL Vol1 + Vol2 invariants. +# The physics does not change — the bottleneck moves with scale. +# No single architectural decision satisfies all constraints independently. +# The skilled ML architect does not escape the invariants; they navigate them. # # 2-Act Structure (35-40 minutes): -# Act I — The Complete Systems Map (12-15 min) -# Read ALL prior ledger entries (Vol1 ch1-16, Vol2 v2_01-v2_16). -# Build a "Systems Intuition Report" showing prediction accuracy by domain. -# Prediction: which category of invariants did you find most counterintuitive? -# Radar chart across 8 dimensions. +# Act I — Design Ledger Archaeology (12-15 min) +# Read ALL prior ledger entries. Surface constraint frequency. Radar of +# prediction accuracy. Commit to which invariant category you violated most. # -# Act II — Planet-Scale Architecture Challenge (20-25 min) -# Scenario: Chief ML Architect for a planetary-scale AI system. -# 5B users, 1T parameter model, $10B budget, 2027 carbon-neutral, -# 99.99% availability, GDPR + CCAA DP requirements, 193 jurisdictions fairness. -# Make 5 architectural decisions. 5 failure states. 
-# Failure states: OOM (cluster too small), P99 SLO violation, DP ε > GDPR limit, -# fairness gap > EU AI Act threshold, carbon over green grid capacity. +# Act II — The Final Architecture Challenge (20-25 min) +# Scenario: Chief Architect for real-time medical image classification. +# 1,000 hospitals · 100k inferences/day each · ≥95% accuracy · P99 < 200ms +# DP ε ≤ 1 (HIPAA) · >40% carbon reduction · 99.9% uptime +# Adversarial robustness ≥ 50% (PGD) · Budget: 10,000 H100s +# 6 simultaneous constraint scorecards. All must be green to deploy. # -# Deployment Context: Full Fleet (all 4 tiers active) +# Deployment Context: Full Fleet (cloud + medical grade) # # Design Ledger: saves chapter="v2_17" -# Keys: context, cluster_gpus, parallelism_strategy, checkpoint_interval_min, -# dp_epsilon, fairness_criterion, carbon_compliant, p99_slo_met, -# total_system_cost_b, act1_prediction, act1_correct, act2_result, -# act2_decision, constraint_hit, system_valid, invariants_connected +# Keys match the capstone schema in the assignment spec. 
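The Act II constraints quoted in the new header invite a quick feasibility check before any architecture is chosen. The sketch below is a back-of-envelope companion to the diff, not part of the lab code itself: it applies Little's Law (L = λW) to the stated scenario numbers (1,000 hospitals, 100k inferences/day each, P99 < 200 ms, 10,000 H100s). It assumes arrivals are uniform across the day; real hospital traffic is bursty, so peak load and the P99 tail will be worse than these averages.

```python
# Back-of-envelope check of the Act II serving constraints.
# Assumption: uniform arrivals over the day (real traffic is bursty,
# so treat these as optimistic lower bounds).

HOSPITAL_COUNT = 1_000   # hospitals in deployment scope; assignment spec
INF_PER_DAY = 100_000    # inferences per hospital per day; spec
P99_SLO_S = 0.200        # 200 ms P99 latency SLO; spec
BUDGET_GPUS = 10_000     # H100 GPU budget; spec

SECONDS_PER_DAY = 86_400

# Aggregate arrival rate lambda (requests/second, fleet-wide average).
lam = HOSPITAL_COUNT * INF_PER_DAY / SECONDS_PER_DAY

# Little's Law: L = lambda * W. Using the P99 SLO as a pessimistic W
# bounds how many requests are in flight at any instant.
in_flight = lam * P99_SLO_S

print(f"average arrival rate: {lam:,.0f} req/s")
print(f"in-flight requests (W = P99 SLO): {in_flight:,.0f}")
print(f"average load per GPU: {lam / BUDGET_GPUS:.3f} req/s/GPU")
```

At roughly 1,157 req/s fleet-wide, the average concurrency is only a few hundred requests, so the 10,000-GPU budget is bound by training, robustness evaluation, and redundancy requirements rather than by raw serving throughput.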
# ───────────────────────────────────────────────────────────────────────────── @@ -61,57 +51,49 @@ def _(): ledger = DesignLedger() - # ── Hardware constants — all tiers (sources documented inline) ──────────── - H100_BW_GBS = 3350 # GB/s HBM3e; NVIDIA H100 SXM5 spec - H100_TFLOPS_FP16 = 1979 # TFLOPS tensor-core FP16; NVIDIA spec - H100_RAM_GB = 80 # GB HBM3e; NVIDIA spec - H100_TDP_W = 700 # Watts TDP; NVIDIA spec - H100_COST_USD = 40_000 # $ purchase price; industry market rate 2024 - H100_CLOUD_HR = 3.50 # $/GPU-hour cloud on-demand; AWS p4de rate - H100_MTBF_HOURS = 200 # hours per-GPU MTBF; @sec-fault-tolerance + # ── Cloud fleet hardware constants ──────────────────────────────────────── + H100_BW_GBS = 3350 # GB/s HBM3e; NVIDIA H100 SXM5 spec + H100_TFLOPS_FP16 = 1979 # TFLOPS FP16 tensor-core; NVIDIA spec + H100_RAM_GB = 80 # GB HBM3e; NVIDIA spec + H100_TDP_W = 700 # Watts TDP; NVIDIA spec - ORIN_BW_GBS = 102 # GB/s; Jetson Orin NX 16GB spec - ORIN_RAM_GB = 16 # GB LPDDR5; Jetson Orin NX spec - ORIN_TDP_W = 25 # Watts TDP; Jetson Orin NX spec + # ── Fleet scale constants ────────────────────────────────────────────── + FLEET_SIZE_NODES = 1000 # nodes in synthesis scenario; assignment spec + GPUS_PER_NODE = 8 # H100 SXM5 per node; NVIDIA DGX H100 config + CHECKPOINT_COST_S = 120 # seconds per checkpoint (1 TB model, NVMe); spec + MTBF_GPU_HOURS = 2000 # mean time between GPU failures (hours); spec - MOBILE_BW_GBS = 68 # GB/s; Apple A17 class NPU spec - MOBILE_RAM_GB = 8 # GB; typical flagship smartphone - MOBILE_TDP_W = 5 # Watts sustained; mobile thermal envelope + # ── NVLink / InfiniBand ──────────────────────────────────────────────── + NVLINK_BW_GBS = 900 # GB/s NVLink4 bidirectional per GPU; NVIDIA spec + IB_BW_GBPS = 400 # Gb/s InfiniBand NDR per port; Mellanox NDR spec - MCU_SRAM_KB = 256 # KB; ARM Cortex-M7 on-chip SRAM ceiling - MCU_BW_GBS = 0.05 # GB/s; SRAM bandwidth on Cortex-M7 + # ── Carbon constants 
─────────────────────────────────────────────────── + COAL_CI_G_KWH = 820 # g CO2/kWh coal-heavy grid; IEA 2024 + RENEW_CI_G_KWH = 40 # g CO2/kWh renewable PPA; hyperscaler estimate + # Mixed global fleet baseline: @sec-sustainable-ai + BASELINE_CI_G_KWH = 386 # g CO2/kWh global fleet avg (mixed grid); spec - IB_HDR200_BW_GBS = 400 # GB/s InfiniBand HDR200; Mellanox spec - NVLINK4_BW_GBS = 900 # GB/s NVLink4 bidirectional; NVIDIA spec - - USERS_SCALE = 5_000_000_000 # 5 billion users; planetary-scale target - - # ── Training physics constants ───────────────────────────────────────────── - # 1T parameter model: FP16 weights + gradients + Adam optimizer = 20 bytes/param - # Source: @sec-training-memory-anatomy - BYTES_PER_PARAM_FULL = 20 # bytes; FP16 mixed-precision full training state - BYTES_PER_PARAM_BF16 = 2 # bytes; BF16 inference weights only - - # ── Carbon constants ─────────────────────────────────────────────────────── - # EU average grid carbon intensity 2024; IEA grid data - EU_GRID_CARBON_G_KWH = 255 # g CO2/kWh; EU average 2024 (IEA) - # Renewable PPA target (wind/solar) used by hyperscalers - RENEW_CARBON_G_KWH = 20 # g CO2/kWh; green PPA estimate - # Carbon-neutral threshold as proxy: ≤ 50 g CO2/kWh effective average - CARBON_THRESHOLD_G_KWH = 50 # g CO2/kWh; budget for "carbon-neutral by 2027" + # ── Medical classification scenario ─────────────────────────────────── + HOSPITAL_COUNT = 1000 # hospitals in deployment scope; spec + INF_PER_DAY = 100_000 # inferences per hospital per day; spec + P99_SLO_MS = 200 # P99 latency SLO, milliseconds; spec + ACCURACY_TARGET = 0.95 # ≥95% accuracy; HIPAA-grade clinical requirement + DP_EPS_LIMIT = 1.0 # ε ≤ 1 for HIPAA differential privacy; spec + ADV_ROBUSTNESS_TARGET = 0.50 # ≥50% accuracy under PGD attack; spec + CARBON_REDUCTION_TARGET = 0.40 # >40% reduction vs baseline; spec + UPTIME_TARGET = 0.999 # 99.9% uptime; spec + BUDGET_GPUS = 10_000 # H100 GPU budget; spec return ( mo, ledger, COLORS, LAB_CSS, 
apply_plotly_theme, go, np, math, H100_BW_GBS, H100_TFLOPS_FP16, H100_RAM_GB, H100_TDP_W, - H100_COST_USD, H100_CLOUD_HR, H100_MTBF_HOURS, - ORIN_BW_GBS, ORIN_RAM_GB, ORIN_TDP_W, - MOBILE_BW_GBS, MOBILE_RAM_GB, MOBILE_TDP_W, - MCU_SRAM_KB, MCU_BW_GBS, - IB_HDR200_BW_GBS, NVLINK4_BW_GBS, - USERS_SCALE, - BYTES_PER_PARAM_FULL, BYTES_PER_PARAM_BF16, - EU_GRID_CARBON_G_KWH, RENEW_CARBON_G_KWH, CARBON_THRESHOLD_G_KWH, + FLEET_SIZE_NODES, GPUS_PER_NODE, CHECKPOINT_COST_S, MTBF_GPU_HOURS, + NVLINK_BW_GBS, IB_BW_GBPS, + COAL_CI_G_KWH, RENEW_CI_G_KWH, BASELINE_CI_G_KWH, + HOSPITAL_COUNT, INF_PER_DAY, P99_SLO_MS, ACCURACY_TARGET, + DP_EPS_LIMIT, ADV_ROBUSTNESS_TARGET, CARBON_REDUCTION_TARGET, + UPTIME_TARGET, BUDGET_GPUS, ) @@ -129,31 +111,32 @@ def _(mo, LAB_CSS, COLORS): border: 1px solid rgba(99,102,241,0.2);">
- Machine Learning Systems · Volume II · Lab 17 · Capstone + Machine Learning Systems · Volume II · Lab 17 · Capstone

The Constraints Never Lie

- You have traversed 33 labs across two volumes. Every insight reduces - to the same principle: constraints drive - architecture. The memory wall, the interconnect wall, Amdahl's Law, - Young-Daly, Little's Law, Chouldechova — none can be wished away. - Your final task: architect a planet-scale AI system that must satisfy - all of them simultaneously. + max-width: 700px; line-height: 1.7;"> + You have traversed two volumes of ML systems. Every invariant you encountered + reduces to one meta-principle: constraints + drive architecture. The memory wall, Amdahl’s Law, Young-Daly, + Little’s Law, Chouldechova, the DP ε-δ tradeoff, + adversarial robustness — none can be wished away. Only navigated. + Your final task: audit your own journey, then architect a production system + that must satisfy all of them simultaneously.

- Act I: Complete Systems Map · Act II: Planet-Scale Architecture + Act I: Design Ledger Archaeology · Act II: Medical Fleet Architecture - 35–40 min + 35–40 min - 5 Active Failure States + 6 Active Constraint Scorecards
- Cloud -
H100 · 80 GB · 3.35 TB/s
+ Memory +
D·A·M Triad · Roofline
- Edge -
Orin NX · 16 GB · 102 GB/s
+ Scale +
Amdahl · Parallelism Paradox
- Mobile -
NPU · 8 GB · 68 GB/s
+ Reliability +
Young-Daly · Little’s Law
- TinyML -
Cortex-M7 · 256 KB · 0.05 GB/s
+ Ethics +
Chouldechova · DP ε-δ
@@ -205,8 +188,8 @@ def _(mo): all two-volume invariants; why physical laws cannot be abstracted away - **@sec-conclusion-vol1-synthesis** — Summary of the 8 invariant families from Volume I and how they compose at scale - - **@sec-conclusion-vol2-synthesis** — Summary of distributed systems invariants; - the emergent constraints that only appear at fleet scale + - **@sec-conclusion-vol2-synthesis** — Distributed systems invariants; the emergent + constraints that only appear at fleet scale - **@sec-conclusion-planet-scale** — Case study of hyperscaler architectural decisions viewed through the lens of competing constraints """), kind="info") @@ -214,7 +197,7 @@ def _(mo): # ═══════════════════════════════════════════════════════════════════════════════ -# ACT I — THE COMPLETE SYSTEMS MAP +# ACT I — DESIGN LEDGER ARCHAEOLOGY # ═══════════════════════════════════════════════════════════════════════════════ @@ -223,7 +206,7 @@ def _(mo): def _(mo): mo.md(""" --- - ## Act I — The Complete Systems Map + ## Act I — Design Ledger Archaeology *Calibration · 12-15 minutes* """) return @@ -239,14 +222,13 @@ def _(COLORS, mo): border-radius: 0 10px 10px 0; padding: 16px 22px; margin: 12px 0;">
- Incoming Message · Chief Systems Architect + Incoming Message · Chief Architect · AI Infrastructure
- "You have just completed 33 interactive labs spanning two volumes of ML systems - content. Before you architect the planet-scale system in Act II, you need to - understand your own intuition. Where did your mental models hold? - Where did the physics surprise you? Your Design Ledger tells the story. - Read it before you pick up the pen." + "You have spent two volumes designing ML systems. Before I promote you to + Principal Engineer, I need you to audit the constraints YOU violated during + your own training. Pull your Design Ledger and tell me: which constraints appeared most + frequently, and what architectural pattern would have prevented the most failures?"
""") @@ -257,23 +239,22 @@ def _(COLORS, mo): @app.cell(hide_code=True) def _(mo): mo.md(""" - The two-volume curriculum introduced eight families of physical invariants. - Each family corresponds to a wall or ceiling that your architecture must navigate: + The two-volume curriculum introduced invariant families spanning every layer of the ML + systems stack. The table below maps each domain to the labs where it was tested: - | Domain | Core Invariant | Labs | - |---|---|---| - | **Memory** | Memory wall: bandwidth ≪ compute peak | V1: 05, 08, 10 | - | **Compute** | Roofline / MFU ceiling | V1: 11, 12 | - | **Serving** | Little's Law: N = λW; P99 ≠ average | V1: 13, V2: 10 | - | **Networking** | AllReduce bandwidth wall; bisection BW | V2: 02, 03, 06 | - | **Reliability** | Young-Daly: T\\* = sqrt(2C / λ) | V2: 07 | - | **Scale** | Amdahl's ceiling; parallelism paradox | V2: 01, 05 | - | **Ethics** | Chouldechova impossibility; DP ε-accuracy | V1: 15, 16; V2: 16 | - | **Economics** | Jevons paradox; utilization vs. latency | V2: 08, 09, 15 | + | Domain | Core Invariant | Vol 1 Labs | Vol 2 Labs | + |---|---|---|---| + | **Memory** | Memory Wall: bandwidth ≪ compute peak | 05, 08, 10 | — | + | **Compute** | Roofline / MFU ceiling | 11, 12 | 09 | + | **Serving** | Little’s Law: N = λW; P99 ≠ avg | 13 | 10 | + | **Scale** | Amdahl; Parallelism Paradox | — | 01, 05 | + | **Networking** | AllReduce BW; Bisection BW | — | 02, 03, 06 | + | **Reliability** | Young-Daly: T* = √(2C/λ) | — | 07 | + | **Privacy & Ethics** | Chouldechova impossibility; DP ε-accuracy | 15, 16 | 13, 16 | + | **Economics** | Jevons Paradox; utilization vs. queue latency | — | 08, 09, 15 | - The radar chart below is your *Systems Intuition Report*. It reflects the eight - domains where the curriculum tested your predictions. Before seeing the chart, - commit to a hypothesis about your own blind spots. 
+ The bar chart below is your *constraint frequency report* — how often your design choices + triggered a failure state in each domain. Before seeing it, commit to a hypothesis. """) return @@ -289,12 +270,12 @@ def _(mo): def _(mo): act1_pred = mo.ui.radio( options={ - "A) Memory hierarchy effects — I underestimated how bandwidth-bound systems are": "A", - "B) Communication overhead at scale — AllReduce and network costs surprised me most": "B", - "C) Tail effects — P99 vs. average and cascade failures were the hardest to internalize": "C", - "D) Fundamental impossibility theorems — Chouldechova, Amdahl ceilings felt unreachable": "D", + "A) Memory bandwidth was the most common constraint across my labs": "A", + "B) Parallelism communication overhead was the most common constraint": "B", + "C) Power and thermal constraints dominated at fleet scale": "C", + "D) The constraint varied — no single constraint dominates; it depends on scale": "D", }, - label="Which category of invariants did you find MOST counterintuitive across both volumes?", + label="Which constraint category appeared most frequently in your Design Ledger?", ) act1_pred return (act1_pred,) @@ -305,7 +286,7 @@ def _(act1_pred, mo): mo.stop( act1_pred.value is None, mo.callout( - mo.md("Select your prediction above to unlock the Act I Systems Map."), + mo.md("Select your prediction above to unlock the Design Ledger Archaeology."), kind="warn", ), ) @@ -316,7 +297,7 @@ def _(act1_pred, mo): # ─── ACT I: LEDGER ARCHAEOLOGY ──────────────────────────────────────────────── @app.cell(hide_code=True) def _(mo): - mo.md("### Systems Intuition Report — Design Ledger Archaeology") + mo.md("### Design Ledger Archaeology") return @@ -325,104 +306,147 @@ def _(COLORS, go, ledger, mo, np, apply_plotly_theme): # ── Read all prior ledger entries ───────────────────────────────────────── _history = ledger._state.history if hasattr(ledger._state, "history") else [] - # ── Build a chapter→design map for all 33 labs 
──────────────────────────── + # Build chapter → design map _ledger_map = {} for _entry in _history: _ch = str(_entry.get("chapter", "")) _design = _entry.get("design", {}) _ledger_map[_ch] = _design - # ── Chapter membership per domain ───────────────────────────────────────── - # Maps domain → list of chapter keys to look up in ledger + # ── Domain → chapter membership ────────────────────────────────────────── + # Maps domain → chapter keys (Vol1 are plain integers as strings, Vol2 are "v2_NN") _domain_chapters = { "Memory": ["5", "8", "10"], - "Compute": ["11", "12"], + "Compute": ["11", "12", "v2_09"], "Serving": ["13", "v2_10"], + "Scale": ["v2_01", "v2_05"], "Networking": ["v2_02", "v2_03", "v2_06"], "Reliability": ["v2_07"], - "Scale": ["v2_01", "v2_05"], - "Ethics": ["15", "16", "v2_16"], + "Privacy/Ethics": ["15", "16", "v2_13", "v2_16"], "Economics": ["v2_08", "v2_09", "v2_15"], } - # ── Compute per-domain accuracy from ledger ──────────────────────────────── - # Each lab stores act1_correct: bool. Average across available labs in domain. 
- _domain_accuracy = {} - _domain_labs_done = {} - _domain_constraints_hit = {} + # ── Compute per-domain stats ─────────────────────────────────────────────── + _domain_constraint_hits = {} + _domain_accuracy = {} + _domain_labs_done = {} for _domain, _chapters in _domain_chapters.items(): + _hit_list = [] _correct_list = [] - _constraint_list = [] for _ch in _chapters: if _ch in _ledger_map: _d = _ledger_map[_ch] + if "constraint_hit" in _d: + _hit_list.append(1.0 if _d["constraint_hit"] else 0.0) if "act1_correct" in _d: _correct_list.append(1.0 if _d["act1_correct"] else 0.0) - if "constraint_hit" in _d: - _constraint_list.append(1.0 if _d["constraint_hit"] else 0.0) + _domain_constraint_hits[_domain] = ( + sum(_hit_list) / len(_hit_list) if _hit_list else 0.0 + ) _domain_accuracy[_domain] = ( sum(_correct_list) / len(_correct_list) if _correct_list else 0.5 ) _domain_labs_done[_domain] = len([c for c in _chapters if c in _ledger_map]) - _domain_constraints_hit[_domain] = ( - sum(_constraint_list) / len(_constraint_list) if _constraint_list else 0.0 - ) # ── Summary statistics ──────────────────────────────────────────────────── - _total_labs = len(_history) - _vol1_correct = [ - 1.0 if _ledger_map.get(str(c), {}).get("act1_correct", False) else 0.0 - for c in range(1, 17) if str(c) in _ledger_map - ] - _vol2_correct = [ - 1.0 if _ledger_map.get(f"v2_{c:02d}", {}).get("act1_correct", False) else 0.0 - for c in range(1, 17) if f"v2_{c:02d}" in _ledger_map - ] - _vol1_acc = sum(_vol1_correct) / max(len(_vol1_correct), 1) * 100 - _vol2_acc = sum(_vol2_correct) / max(len(_vol2_correct), 1) * 100 - _all_correct = _vol1_correct + _vol2_correct - _overall_acc = sum(_all_correct) / max(len(_all_correct), 1) * 100 - - # Weakest domain: lowest accuracy score - _weakest_domain = min(_domain_accuracy, key=lambda d: _domain_accuracy[d]) - _strongest_domain = max(_domain_accuracy, key=lambda d: _domain_accuracy[d]) - - # Most-triggered failure state domain - 
_most_failures_domain = max( - _domain_constraints_hit, key=lambda d: _domain_constraints_hit[d] + _total_labs = len(_history) + _total_hits = sum( + 1 for e in _history if e.get("design", {}).get("constraint_hit", False) ) - # ── Radar chart ─────────────────────────────────────────────────────────── - _domains = list(_domain_accuracy.keys()) - _scores = [_domain_accuracy[d] * 100 for d in _domains] - _scores_closed = _scores + [_scores[0]] # close the polygon - _theta_closed = _domains + [_domains[0]] + _vol1_chs = [str(c) for c in range(1, 17)] + _vol2_chs = [f"v2_{c:02d}" for c in range(1, 17)] - _fig = go.Figure() + _vol1_correct = [ + 1.0 if _ledger_map.get(c, {}).get("act1_correct", False) else 0.0 + for c in _vol1_chs if c in _ledger_map + ] + _vol2_correct = [ + 1.0 if _ledger_map.get(c, {}).get("act1_correct", False) else 0.0 + for c in _vol2_chs if c in _ledger_map + ] + _vol1_acc = sum(_vol1_correct) / max(len(_vol1_correct), 1) * 100 + _vol2_acc = sum(_vol2_correct) / max(len(_vol2_correct), 1) * 100 + _overall_acc = (sum(_vol1_correct) + sum(_vol2_correct)) / max( + len(_vol1_correct) + len(_vol2_correct), 1 + ) * 100 - # Ideal reference (100%) - _fig.add_trace(go.Scatterpolar( - r=[100] * (len(_domains) + 1), + _weakest_domain = min(_domain_accuracy, key=lambda d: _domain_accuracy[d]) + _most_hit_domain = max( + _domain_constraint_hits, key=lambda d: _domain_constraint_hits[d] + ) + + # Top 3 most-violated domains (by constraint_hit rate) + _sorted_domains = sorted( + _domain_constraint_hits.items(), key=lambda x: x[1], reverse=True + ) + _top3 = _sorted_domains[:3] + + # ── Horizontal bar chart: constraint hit frequency ──────────────────────── + _domains_sorted = [d for d, _ in sorted( + _domain_constraint_hits.items(), key=lambda x: x[1] + )] + _hits_sorted = [_domain_constraint_hits[d] * 100 for d in _domains_sorted] + _bar_colors = [ + COLORS["RedLine"] if h >= 60 else ( + COLORS["OrangeLine"] if h >= 30 else COLORS["GreenLine"] + ) + for h in 
_hits_sorted + ] + + _fig_bar = go.Figure(go.Bar( + x=_hits_sorted, + y=_domains_sorted, + orientation="h", + marker_color=_bar_colors, + text=[f"{h:.0f}%" for h in _hits_sorted], + textposition="outside", + textfont=dict(size=10, color=COLORS["TextSec"]), + )) + _fig_bar.update_layout( + height=320, + xaxis=dict( + title="Constraint hit rate (%)", + range=[0, 115], + gridcolor="#f1f5f9", + tickfont=dict(size=10), + ), + yaxis=dict(tickfont=dict(size=11, color=COLORS["TextSec"])), + margin=dict(t=50, b=40, l=130, r=60), + title=dict( + text="Constraint Hit Frequency by Domain (from your Design Ledger)", + font=dict(size=13, color=COLORS["Text"]), + x=0.5, + ), + ) + apply_plotly_theme(_fig_bar) + + # ── Radar chart: prediction accuracy by domain ──────────────────────────── + _radar_domains = list(_domain_accuracy.keys()) + _radar_scores = [_domain_accuracy[d] * 100 for d in _radar_domains] + _radar_closed = _radar_scores + [_radar_scores[0]] + _theta_closed = _radar_domains + [_radar_domains[0]] + + _fig_radar = go.Figure() + _fig_radar.add_trace(go.Scatterpolar( + r=[100] * (len(_radar_domains) + 1), theta=_theta_closed, fill="toself", fillcolor="rgba(99,102,241,0.06)", line=dict(color=COLORS["Cloud"], width=1, dash="dot"), name="Perfect (100%)", )) - - # Student scores - _fig.add_trace(go.Scatterpolar( - r=_scores_closed, + _fig_radar.add_trace(go.Scatterpolar( + r=_radar_closed, theta=_theta_closed, fill="toself", fillcolor="rgba(0,143,69,0.12)", line=dict(color=COLORS["GreenLine"], width=2.5), - name="Your accuracy", + name="Your prediction accuracy", marker=dict(size=8, color=COLORS["GreenLine"]), )) - - _fig.update_layout( + _fig_radar.update_layout( polar=dict( radialaxis=dict( visible=True, @@ -433,35 +457,38 @@ def _(COLORS, go, ledger, mo, np, apply_plotly_theme): tickfont=dict(size=9, color=COLORS["TextMuted"]), ), angularaxis=dict( - tickfont=dict(size=11, color=COLORS["TextSec"]), + tickfont=dict(size=10, color=COLORS["TextSec"]), 
gridcolor=COLORS["Border"], ), bgcolor="rgba(248,250,252,0.6)", ), showlegend=True, - legend=dict(orientation="h", yanchor="bottom", y=-0.15, xanchor="center", x=0.5), - height=440, - margin=dict(t=40, b=60, l=50, r=50), + legend=dict(orientation="h", yanchor="bottom", y=-0.18, xanchor="center", x=0.5), + height=400, + margin=dict(t=40, b=60, l=40, r=40), title=dict( text="Systems Intuition Radar — Prediction Accuracy by Domain", - font=dict(size=13, color=COLORS["Text"]), + font=dict(size=12, color=COLORS["Text"]), x=0.5, ), ) - apply_plotly_theme(_fig) + apply_plotly_theme(_fig_radar) # ── Summary metric cards ─────────────────────────────────────────────────── - _vol1_color = COLORS["GreenLine"] if _vol1_acc >= 70 else ( + _v1_color = ( + COLORS["GreenLine"] if _vol1_acc >= 70 else COLORS["OrangeLine"] if _vol1_acc >= 50 else COLORS["RedLine"] ) - _vol2_color = COLORS["GreenLine"] if _vol2_acc >= 70 else ( + _v2_color = ( + COLORS["GreenLine"] if _vol2_acc >= 70 else COLORS["OrangeLine"] if _vol2_acc >= 50 else COLORS["RedLine"] ) - _total_color = COLORS["GreenLine"] if _overall_acc >= 70 else ( + _ov_color = ( + COLORS["GreenLine"] if _overall_acc >= 70 else COLORS["OrangeLine"] if _overall_acc >= 50 else COLORS["RedLine"] ) - _summary_cards = mo.Html(f""" + _summary = mo.Html(f"""
of 33 total
+
+
+ Constraints Hit +
+
+ {_total_hits} +
+
+ of {_total_labs} labs +
+
Vol I Accuracy
-
+
{_vol1_acc:.0f}%
@@ -490,213 +529,265 @@ def _(COLORS, go, ledger, mo, np, apply_plotly_theme): text-transform: uppercase; letter-spacing: 0.1em; margin-bottom: 6px;"> Vol II Accuracy
-
+
{_vol2_acc:.0f}%
{len(_vol2_correct)} labs sampled
-
-
- Overall -
-
- {_overall_acc:.0f}% -
-
combined
-
- Weakest Domain + Most-Violated Domain
+ {_most_hit_domain} +
+
+ {_domain_constraint_hits[_most_hit_domain]*100:.0f}% of labs triggered +
+
+
+
+ Weakest Prediction Domain +
+
{_weakest_domain}
- {_domain_accuracy[_weakest_domain]*100:.0f}% accuracy + {_domain_accuracy[_weakest_domain]*100:.0f}% prediction accuracy
- Strongest Domain + Overall Accuracy
-
- {_strongest_domain} -
-
- {_domain_accuracy[_strongest_domain]*100:.0f}% accuracy -
-
-
-
- Most Failure States -
-
- {_most_failures_domain} -
-
- {_domain_constraints_hit[_most_failures_domain]*100:.0f}% of labs triggered +
+ {_overall_acc:.0f}%
+
combined
""") - mo.vstack([_summary_cards, mo.ui.plotly(_fig)]) + mo.vstack([ + _summary, + mo.hstack([ + mo.ui.plotly(_fig_bar), + mo.ui.plotly(_fig_radar), + ], justify="center", gap=2), + ]) return ( _domain_accuracy, - _domain_chapters, + _domain_constraint_hits, _domain_labs_done, - _domain_constraints_hit, + _most_hit_domain, _weakest_domain, - _strongest_domain, - _most_failures_domain, _overall_acc, _vol1_acc, _vol2_acc, _total_labs, + _total_hits, _ledger_map, + _top3, ) # ─── ACT I: PREDICTION REVEAL ───────────────────────────────────────────────── @app.cell(hide_code=True) def _( - COLORS, _domain_accuracy, - _weakest_domain, + _domain_constraint_hits, + _most_hit_domain, + _top3, act1_pred, mo, ): - # Map prediction option to corresponding domain key - _option_to_domain = { - "A": "Memory", - "B": "Networking", - "C": "Serving", - "D": "Ethics", - } - _predicted_domain = _option_to_domain.get(act1_pred.value, "Memory") - _actual_weakest = _weakest_domain + # D is the correct answer — the bottleneck shifts with scale + _correct = act1_pred.value == "D" - # Determine if prediction matched ledger-revealed weakness - _matches = _predicted_domain == _actual_weakest + _domain_top3_str = ", ".join(f"**{d}** ({r*100:.0f}%)" for d, r in _top3) - # Generate feedback for all four options — each substantive, no single "correct" _feedback_map = { "A": ( - f"**Memory hierarchy effects.** " - f"This is one of the most reliably counterintuitive invariants in all of computing. " - f"The H100's peak compute (1,979 TFLOPS FP16) outpaces its memory bandwidth " - f"(3,350 GB/s) by roughly 300 operations per byte. Most inference workloads " - f"never reach that peak — they stall on bandwidth. " - f"Your ledger shows your Memory domain accuracy at " - f"{_domain_accuracy['Memory']*100:.0f}%. " - f"If that surprised you, you're in good company: this is the wall that killed " - f"CPU-only ML and drove the entire GPU ecosystem." 
+ "**Memory bandwidth** is among the most persistent constraints in single-node " + "and inference workloads. The H100's arithmetic intensity ridge point " + "(1,979 TFLOPS / 3,350 GB/s = ~591 FLOP/byte) means that most token-generation " + "workloads are bandwidth-bound at batch=1. You were correct that memory " + "dominates in many labs — but the ledger reveals it does not dominate " + "*all* labs. At fleet scale, network fabric and checkpoint overhead become " + "binding earlier. The constraint moves." ), "B": ( - f"**Communication overhead at scale.** " - f"AllReduce over InfiniBand (400 GB/s) looks fast until your gradient tensor " - f"is 280 GB (1T model × 2 bytes/param × 2× for BF16 accumulation). " - f"At 16,384 GPUs with ring AllReduce, each of the 2(N-1)/N ≈ 2 passes " - f"moves 280 GB across a fabric with shared bisection bandwidth. " - f"Your Networking accuracy: {_domain_accuracy['Networking']*100:.0f}%. " - f"The networking wall is the invariant that most surprises engineers who " - f"trained single-node before moving to multi-node — the compute is ready; " - f"the network is not." + "**Communication overhead** is genuinely severe at scale: ring AllReduce " + "over InfiniBand (400 Gb/s) carries a gradient tensor that can exceed 2 TB " + "for a 1T-parameter model. At 8,000+ GPUs, communication can consume " + "20-40% of total training time. But your ledger likely shows this only " + "became the dominant constraint in Vol 2 networking and distributed training " + "labs. In Vol 1 — single-node workloads — it barely registers. " + "The constraint moves with scale." ), "C": ( - f"**Tail effects.** " - f"P99 latency can be 10-100× the mean in a serving system under load. " - f"Little's Law (N = λW) tells you the average, but it says nothing about " - f"the tail. Cascade failures amplify: one slow node in a pipeline stage " - f"causes timeout retries upstream, which increases load on that node, " - f"which causes more timeouts. 
Your Serving accuracy: " - f"{_domain_accuracy['Serving']*100:.0f}%. " - f"The tail is the gap between your SLO contract and your monitoring dashboard." + "**Power and thermal constraints** bind at cluster level but are rarely the " + "first failure mode in individual lab scenarios. A 10,000-GPU cluster " + "draws 7 MW; carbon compliance is a real concern at fleet scale. " + "But your ledger likely shows thermal constraints appear primarily in the " + "sustainability labs, not across the full curriculum. The constraint moves " + "with the deployment tier." ), "D": ( - f"**Fundamental impossibility theorems.** " - f"Chouldechova's theorem states that when base rates differ across groups, " - f"no classifier can simultaneously equalize false positive rate, false negative " - f"rate, and calibration. This is not an engineering challenge — it is a " - f"mathematical constraint as immovable as Amdahl's Law. " - f"Your Ethics accuracy: {_domain_accuracy['Ethics']*100:.0f}%. " - f"The impossibility theorems are the domain where intuition fails most " - f"catastrophically, because they look like they should be solvable with " - f"more data or a better model. They are not." + "**Correct.** The constraint varies with scale, workload, and deployment tier. " + "Your ledger confirms this: the top three hit domains are " + f"{_domain_top3_str}. " + "Each was most relevant in a specific context. Memory dominates in " + "single-node inference. Communication dominates in multi-node training. " + "Fairness constraints activate regardless of scale but are invisible until " + "evaluated across populations. The meta-principle is not *which* constraint " + "is hardest — it is that the bottleneck *moves*, and the architect who " + "cannot see it move will be surprised by every system that scales." 
), } - _chosen_feedback = _feedback_map.get(act1_pred.value, _feedback_map["A"]) + _chosen = _feedback_map.get(act1_pred.value, _feedback_map["D"]) - _match_note = "" - if _matches: - _match_note = ( - f" Your ledger confirms this: **{_actual_weakest}** is your weakest domain " - f"({_domain_accuracy[_actual_weakest]*100:.0f}% accuracy)." - ) - else: - _match_note = ( - f" Interestingly, your ledger shows your *actual* weakest domain is " - f"**{_actual_weakest}** ({_domain_accuracy[_actual_weakest]*100:.0f}% accuracy) — " - f"which means your intuition about your own intuition may also benefit from calibration." - ) + _note = ( + f" Your ledger also shows **{_most_hit_domain}** as your highest-hit domain " + f"({_domain_constraint_hits[_most_hit_domain]*100:.0f}% hit rate). " + f"Your weakest prediction accuracy was in **{min(_domain_accuracy, key=lambda d: _domain_accuracy[d])}** " + f"({_domain_accuracy[min(_domain_accuracy, key=lambda d: _domain_accuracy[d])]*100:.0f}%) — " + f"which is where you had the most to learn." 
+ ) - _kind = "success" if _matches else "info" - mo.callout(mo.md(_chosen_feedback + _match_note), kind=_kind) + mo.callout( + mo.md(_chosen + _note), + kind="success" if _correct else "info", + ) return -# ─── ACT I: MATH PEEK ───────────────────────────────────────────────────────── +# ─── ACT I: REFLECTION ──────────────────────────────────────────────────────── +@app.cell(hide_code=True) +def _(mo): + mo.md("### Reflection") + return + + +@app.cell(hide_code=True) +def _(mo): + act1_reflection = mo.ui.radio( + options={ + "A) More hardware always solves the constraint — scale cures all bottlenecks": "A", + "B) Every system is defined by its most constrained resource — the laws don't change, but the bottleneck moves": "B", + "C) Software optimization is always preferable to hardware scaling": "C", + "D) The only invariant is that all constraints are temporary": "D", + }, + label="What architectural principle unifies ALL the invariants you encountered?", + ) + act1_reflection + return (act1_reflection,) + + +@app.cell(hide_code=True) +def _(act1_reflection, mo): + if act1_reflection.value is None: + mo.callout( + mo.md("Select your reflection answer above to continue."), + kind="warn", + ) + elif act1_reflection.value == "B": + mo.callout(mo.md( + "**Correct.** The bottleneck moves, but it never disappears. " + "The Iron Law T = D/BW + O/R + L tells you three things that can " + "limit latency. Roofline tells you two things that can limit compute. " + "Amdahl tells you the ceiling on parallelism. Young-Daly tells you " + "the optimal checkpoint interval. Chouldechova tells you the minimum " + "fairness gap you must accept. None of these is 'temporary.' They are " + "all expressions of the same underlying constraint: **physics drives architecture.**" + ), kind="success") + elif act1_reflection.value == "A": + mo.callout(mo.md( + "**Incorrect.** Adding hardware shifts the bottleneck but does not remove it. 
" + "Amdahl's Law shows that the serial fraction of your workload caps speedup " + "regardless of cluster size. The communication overhead of AllReduce *grows* " + "with cluster size. The carbon footprint *grows* with hardware count. " + "More hardware is a tool, not a solution." + ), kind="warn") + elif act1_reflection.value == "C": + mo.callout(mo.md( + "**Incorrect.** Software optimization is powerful — kernel fusion, " + "continuous batching, and mixed-precision training all improve MFU " + "substantially. But no software optimization escapes the Roofline " + "ceiling or removes Amdahl's serial fraction. At some point, " + "the physics imposes a hard limit that no optimizer can cross." + ), kind="warn") + else: + mo.callout(mo.md( + "**Incorrect.** Physical constraints are not temporary. " + "The memory wall is determined by signal physics and HBM pin density — " + "it has been 'temporary' for 30 years and remains. Chouldechova's " + "theorem follows from conditional probability and will not be repealed " + "by better hardware. Young-Daly follows from calculus. " + "The constraints are permanent; only your architecture adapts." 
+ ), kind="warn") + return + + +# ─── ACT I: MATHPEEK ACCORDION ─────────────────────────────────────────────── @app.cell(hide_code=True) def _(mo): mo.accordion({ "The governing equations — all eight invariant families": mo.md(""" - **Memory Wall (Roofline):** - `Performance = min(peak_TFLOPS, BW_GBs × arithmetic_intensity)` - — From @sec-hw-acceleration-roofline + **Iron Law (Latency):** + `T = D/BW + O/R + L` + — T = latency; D = data transferred; BW = bandwidth; O = operations; R = throughput; L = pipeline latency + — Source: @sec-ml-systems-iron-law + + **Memory Anatomy (Training State):** + `M_total = weights + gradients + optimizer_state + activations` + — FP16 mixed precision: 2+2+8 = 12 bytes/param minimum; with activations varies by batch + — Source: @sec-training-memory-anatomy + + **Roofline (Attainable Performance):** + `Attainable_FLOPS = min(Peak_FLOPS, BW_GBs × Arithmetic_Intensity)` + — Ridge point = Peak_FLOPS / BW; below ridge = bandwidth-bound + — Source: @sec-hw-acceleration-roofline **Amdahl's Law (Scale Ceiling):** - `Speedup(N) = 1 / ((1 - p) + p/N)` where p is the parallelizable fraction - — Maximum speedup is bounded by serial fraction; source: @sec-distributed-training-amdahl - - **Young-Daly (Checkpoint Optimum):** - `T* = sqrt(2 × C / λ)` where C = checkpoint cost, λ = cluster failure rate - — Minimizes expected wasted time; source: @sec-fault-tolerance-young-daly + `Speedup(N) = 1 / (S + (1 - S)/N)` + — S = serial fraction; maximum speedup = 1/S regardless of N + — Source: @sec-distributed-training-amdahl **Little's Law (Serving Throughput):** - `N = λ × W` where N = in-flight requests, λ = arrival rate, W = latency - — Steady-state queueing identity; source: @sec-model-serving-littles-law + `L = lambda × W` + — L = in-flight requests; lambda = arrival rate; W = mean latency + — Source: @sec-model-serving-littles-law - **Differential Privacy (Accuracy-Privacy):** - `ε ≥ Δf / σ` where Δf = sensitivity, σ = noise scale - — Lower ε = 
stronger privacy; accuracy degrades as ε → 0; source: @sec-security-privacy-dp + **Young-Daly (Optimal Checkpoint Interval):** + `T* = sqrt(2 × C / lambda)` + — C = checkpoint write cost; lambda = cluster failure rate = N / MTBF_per_device + — Source: @sec-fault-tolerance-young-daly - **Chouldechova Impossibility (Fairness):** - When base rates differ: cannot simultaneously equalize FPR, FNR, and calibration - — Source: @sec-responsible-ai-chouldechova + **Jevons Paradox (Carbon):** + `Delta_C = Energy × Intensity × (scale_up - efficiency_gain)` + — Efficiency improvements can be consumed by demand growth; net carbon rises + — Source: @sec-sustainable-ai-jevons - **AllReduce Bandwidth (Ring):** - `t_allreduce = 2 × (N-1)/N × M / BW` where M = gradient size, BW = fabric bandwidth - — Source: @sec-collective-communication-ring-allreduce - - **Carbon-Aware Scheduling:** - `CO2 = Energy_kWh × carbon_intensity_g_kWh` - — Jevons Paradox: efficiency gains can be consumed by demand growth; source: @sec-sustainable-ai + **SLO Composition (Reliability):** + `P(e2e_failure) = 1 - product_i(p_i)` + — Approximate for independent services; cascade amplifies tail failures + — Source: @sec-ops-scale-slo-composition """) }) return # ═══════════════════════════════════════════════════════════════════════════════ -# ACT II — PLANET-SCALE ARCHITECTURE CHALLENGE +# ACT II — THE FINAL ARCHITECTURE CHALLENGE # ═══════════════════════════════════════════════════════════════════════════════ @@ -705,7 +796,7 @@ def _(mo): def _(mo): mo.md(""" --- - ## Act II — Planet-Scale Architecture Challenge + ## Act II — The Final Architecture Challenge *Design Challenge · 20-25 minutes* """) return @@ -721,25 +812,39 @@ def _(COLORS, mo): border-radius: 0 10px 10px 0; padding: 18px 24px; margin: 12px 0;">
- Incoming Message · Board of Directors · URGENT + Incoming Message · Chief Architect · Medical AI Division · URGENT
- "You have been appointed Chief ML Architect for a planetary-scale AI system. - Requirements: serve 5 billion users globally across cloud, edge, - mobile, and TinyML tiers; train a 1 trillion parameter foundation model - with monthly updates; maintain P99 < 500ms globally with - 99.99% availability; comply with GDPR (EU) and CCPA (California) - differential privacy requirements; achieve carbon-neutral by 2027; - ensure fair treatment across 193 UN member countries. - Your infrastructure budget is $10 billion. - You have five architectural decisions to make. Every decision must satisfy a - physical constraint. Some combinations are infeasible. Find one that is not." + "Design a production ML system for real-time medical image classification. + Requirements: 1,000 hospitals, 100,000 inferences/day + each, ≥95% accuracy, P99 < 200ms, + DP ε ≤ 1 (HIPAA), >40% carbon reduction + vs. baseline, fault tolerance for 99.9% uptime, + and adversarial robustness ≥50% on PGD attacks. + You have a budget of 10,000 H100s. + Every constraint must be satisfied simultaneously for deployment approval."
""") return +# ─── ACT II: CONTEXT TOGGLE ─────────────────────────────────────────────────── +@app.cell(hide_code=True) +def _(mo): + context_toggle = mo.ui.radio( + options={ + "Global Fleet (mixed grid, 386 g CO\u2082/kWh)": "fleet", + "Carbon-Optimized (renewable, 40 g CO\u2082/kWh)": "renewable", + }, + value="Global Fleet (mixed grid, 386 g CO\u2082/kWh)", + label="Deployment context:", + inline=True, + ) + context_toggle + return (context_toggle,) + + # ─── ACT II: PREDICTION LOCK ────────────────────────────────────────────────── @app.cell(hide_code=True) def _(mo): @@ -751,12 +856,12 @@ def _(mo): def _(mo): act2_pred = mo.ui.radio( options={ - "A) Training is the binding constraint — the 1T model OOMs on any realistic cluster": "A", - "B) Serving is the binding constraint — P99 < 500ms at 5B users is physically unreachable": "B", - "C) Privacy is the binding constraint — GDPR-grade DP destroys too much model accuracy": "C", - "D) All constraints can be satisfied simultaneously with correct architectural choices": "D", + "A) DP \u03b5 \u2264 1 is the hardest constraint — it destroys too much accuracy for clinical use": "A", + "B) No single architecture satisfies all constraints simultaneously — requires explicit tradeoff negotiation": "B", + "C) The fleet size (10,000 H100s) is sufficient for all constraints at stated scale": "C", + "D) Carbon reduction is the easiest constraint to satisfy independently of the others": "D", }, - label="Before configuring the system: which constraint will be hardest to satisfy at 5B-user scale?", + label="Which statement best characterizes this architecture challenge?", ) act2_pred return (act2_pred,) @@ -775,1300 +880,865 @@ def _(act2_pred, mo): return -# ─── ACT II: DECISION 1 — TRAINING INFRASTRUCTURE ──────────────────────────── +# ─── ACT II: ARCHITECTURE SYNTHESIZER — SLIDERS ─────────────────────────────── @app.cell(hide_code=True) def _(mo): - mo.md("### Decision 1 — Training Infrastructure") + mo.md("### Final 
Architecture Synthesizer") return @app.cell(hide_code=True) def _(mo): - d1_gpu_count = mo.ui.slider( - start=1024, stop=65536, value=16384, step=1024, - label="GPU cluster size (H100 count)", + model_size_b = mo.ui.slider( + start=1, stop=70, value=7, step=1, + label="Model size (B parameters)", show_value=True, ) - d1_parallelism = mo.ui.dropdown( + dp_epsilon = mo.ui.slider( + start=0.1, stop=10.0, value=1.0, step=0.1, + label="Differential privacy \u03b5 (lower = stronger privacy)", + show_value=True, + ) + adv_train_weight = mo.ui.slider( + start=0.0, stop=1.0, value=0.3, step=0.05, + label="Adversarial training weight (0 = clean only, 1 = adversarial only)", + show_value=True, + ) + parallelism_strategy = mo.ui.radio( options={ "Data Parallel only (DP)": "dp", - "Tensor + Pipeline Parallel (TP+PP)": "tp_pp", - "Full 3D Parallelism (DP+TP+PP)": "3d", - "Expert Parallelism (MoE)": "moe", + "Tensor + Data Parallel (TP+DP)": "tp_dp", + "Full 3D Parallel (DP+TP+PP)": "3d", }, - value="Full 3D Parallelism (DP+TP+PP)", - label="Parallelism strategy", + value="Tensor + Data Parallel (TP+DP)", + label="Parallelism strategy:", + inline=True, ) - d1_mfu = mo.ui.slider( - start=20, stop=60, value=40, step=5, - label="Expected MFU % (Model FLOP Utilization)", - show_value=True, - ) - mo.vstack([ - mo.hstack([d1_gpu_count, d1_parallelism], justify="start", gap=4), - d1_mfu, - ]) - return (d1_gpu_count, d1_parallelism, d1_mfu) - - -@app.cell(hide_code=True) -def _( - COLORS, - BYTES_PER_PARAM_FULL, - H100_CLOUD_HR, - H100_RAM_GB, - H100_TFLOPS_FP16, - H100_TDP_W, - NVLINK4_BW_GBS, - IB_HDR200_BW_GBS, - d1_gpu_count, - d1_mfu, - d1_parallelism, - mo, - math, -): - _N = d1_gpu_count.value - _mfu_frac = d1_mfu.value / 100.0 - _strategy = d1_parallelism.value - - # ── 1T model memory footprint (full training state) ──────────────────────── - # 1T params × 20 bytes/param = 20 TB total state - # Source: @sec-training-memory-anatomy — weights(2) + grads(2) + Adam(8) + BF16(2) = 20 
bytes - _params = 1e12 - _total_state_bytes = _params * BYTES_PER_PARAM_FULL - _total_state_tb = _total_state_bytes / 1e12 - - # Memory required per GPU depends on sharding strategy - _sharding_factor = { - "dp": 1.0, # no sharding → needs full model per replica - "tp_pp": 32.0, # TP=8 × PP=4 splits → 32× reduction - "3d": 64.0, # DP × TP × PP; assume DP=8, TP=8, PP=8 = 512 total - "moe": 16.0, # MoE: active params per expert shard - }.get(_strategy, 32.0) - - _mem_per_gpu_tb = _total_state_tb / _sharding_factor - _mem_per_gpu_gb = _mem_per_gpu_tb * 1000.0 - - # OOM check: per-GPU required > H100_RAM_GB - _oom = _mem_per_gpu_gb > H100_RAM_GB - - # ── Training throughput (tokens/day) ────────────────────────────────────── - # Effective TFLOPS per GPU = peak × MFU - # Tokens/step = 2 × params × seq_len (FLOPs per forward, assuming 2048 tokens) - # Source: @sec-nn-computation-flop-counting — FLOPs ≈ 6 × P for a full train step - _effective_tflops = H100_TFLOPS_FP16 * _mfu_frac # per GPU - _flops_per_token = 6.0 * _params / 1e12 # TFLOPS needed for 1 token step - _tokens_per_sec_per_gpu = _effective_tflops / _flops_per_token - _tokens_per_day_total = _tokens_per_sec_per_gpu * _N * 86400 - - # Communication overhead: AllReduce gradient tensor - # Gradient size = 2 bytes/param × 1T = 2 TB - # Ring AllReduce time = 2(N-1)/N × gradient_size / BW - _gradient_gb = _params * 2 / 1e9 # BF16 gradients, GB - _fabric_bw = IB_HDR200_BW_GBS # fallback: IB HDR200 400 GB/s - _ring_time_s = 2.0 * (_N - 1) / _N * _gradient_gb / _fabric_bw - _step_compute_s = _flops_per_token * 2048 / _effective_tflops # one step - _comm_overhead_pct = (_ring_time_s / (_step_compute_s + _ring_time_s)) * 100.0 - - # ── Training cost (cloud on-demand) ─────────────────────────────────────── - # Monthly update = 1 trillion tokens (GPT-3 class data budget × 3) - # Source: @sec-vol2-introduction-training-scale - _tokens_target = 1e12 - _days_to_train = _tokens_target / (_tokens_per_day_total + 1e-9) - 
_gpu_hours = _days_to_train * 24 * _N - _train_cost_m = _gpu_hours * H100_CLOUD_HR / 1e6 # millions $ - - # ── Power (training cluster) ─────────────────────────────────────────────── - _cluster_power_mw = _N * H100_TDP_W / 1e6 # Megawatts - - # ── Color coding ────────────────────────────────────────────────────────── - _mem_color = COLORS["RedLine"] if _oom else COLORS["GreenLine"] - _comm_color = ( - COLORS["GreenLine"] if _comm_overhead_pct < 15 else - COLORS["OrangeLine"] if _comm_overhead_pct < 30 else - COLORS["RedLine"] - ) - _cost_color = COLORS["GreenLine"] if _train_cost_m < 500 else ( - COLORS["OrangeLine"] if _train_cost_m < 1000 else COLORS["RedLine"] - ) - - # ── Physics formula display ──────────────────────────────────────────────── - _formula = mo.Html(f""" -
-
-        Training Physics — 1T Parameter Model
-        Total state = 1T params × {BYTES_PER_PARAM_FULL} bytes/param = {_total_state_tb:.0f} TB
-        Per-GPU memory = {_total_state_tb:.0f} TB / {_sharding_factor:.0f} (sharding) = {_mem_per_gpu_gb:.1f} GB {'❌ OOM — exceeds 80 GB H100' if _oom else '✓ fits in H100 HBM'}
-        Effective throughput = {H100_TFLOPS_FP16} × {_mfu_frac:.2f} MFU × {_N:,} GPUs = {_tokens_per_day_total/1e9:.1f}B tokens/day
-        Days to train 1T tokens = {_days_to_train:.1f} days
-        AllReduce overhead = {_comm_overhead_pct:.1f}% of step time
-        Training cost = ${_train_cost_m:.0f}M (cloud on-demand per training run)
-        Cluster power = {_cluster_power_mw:.1f} MW
- """) - - _oom_banner = None - if _oom: - _oom_banner = mo.callout(mo.md( - f"**OOM — Infeasible.** With `{_strategy}` sharding, each GPU requires " - f"**{_mem_per_gpu_gb:.1f} GB** but H100 HBM is only **{H100_RAM_GB} GB**. " - f"The 1T model's full training state is {_total_state_tb:.0f} TB. " - f"Increase sharding (try **3D Parallelism** with more GPUs) or the model " - f"will not fit. This is not a software problem — it is a memory wall constraint." - ), kind="danger") - - if _oom_banner: - mo.vstack([_formula, _oom_banner]) - else: - _formula - return ( - _oom, - _train_cost_m, - _days_to_train, - _tokens_per_day_total, - _comm_overhead_pct, - _cluster_power_mw, - _N, - _strategy, - ) - - -# ─── ACT II: DECISION 2 — INFERENCE SERVING ────────────────────────────────── -@app.cell(hide_code=True) -def _(mo): - mo.md("### Decision 2 — Inference Serving") - return - - -@app.cell(hide_code=True) -def _(mo): - d2_replicas = mo.ui.slider( - start=100, stop=20000, value=5000, step=100, - label="Cloud inference replica count (H100s)", - show_value=True, - ) - d2_quant = mo.ui.dropdown( - options={ - "FP16 (full precision)": "fp16", - "INT8 (8-bit quantization)": "int8", - "INT4 (4-bit quantization)": "int4", - "1-bit (extreme compression)": "1bit", - }, - value="INT8 (8-bit quantization)", - label="Cloud tier quantization", - ) - d2_edge_tier = mo.ui.dropdown( - options={ - "None — cloud only": "none", - "Edge (Orin NX, INT4)": "edge", - "Edge + Mobile (INT4 + INT2)": "edge_mobile", - }, - value="Edge + Mobile (INT4 + INT2)", - label="Edge/mobile offload tier", - ) - mo.hstack([d2_replicas, d2_quant, d2_edge_tier], justify="start", gap=4) - return (d2_replicas, d2_quant, d2_edge_tier) - - -@app.cell(hide_code=True) -def _( - COLORS, - BYTES_PER_PARAM_BF16, - H100_BW_GBS, - H100_CLOUD_HR, - H100_RAM_GB, - USERS_SCALE, - d2_edge_tier, - d2_quant, - d2_replicas, - mo, - math, -): - _R = d2_replicas.value - _quant = d2_quant.value - _tier = d2_edge_tier.value - - # ── 
Bytes per parameter for each quantization level ─────────────────────── - # Source: @sec-model-compression-quantization - _bytes_per_param = { - "fp16": 2.0, - "int8": 1.0, - "int4": 0.5, - "1bit": 0.125, - }.get(_quant, 1.0) - - _model_size_gb = 1e12 * _bytes_per_param / 1e9 # 1T params - _fits_h100 = _model_size_gb <= H100_RAM_GB - - # For INT8 and below, assume sharded across N_shard H100s - _n_shard = math.ceil(_model_size_gb / H100_RAM_GB) - _n_shard = max(_n_shard, 1) - - # ── Per-replica throughput via arithmetic intensity (roofline) ───────────── - # Decoding: 1 token per step, 2 × model_params FLOPs per token - # Arithmetic intensity = 2P / (2P bytes) = 1 op/byte for batch=1 - # → memory-bandwidth bound → throughput = BW / bytes_per_param / params - # Source: @sec-inference-roofline-decode - _decode_toks_per_sec = H100_BW_GBS * 1e9 / (1e12 * _bytes_per_param) - _decode_toks_per_sec_total = _decode_toks_per_sec * _R - - # ── Steady-state users via Little's Law ─────────────────────────────────── - # N = λ × W → λ_max = N_in_flight / W - # Assume mean latency W = 100 tokens / decode_rate - # Assume 100-token response, 1 concurrent request per H100 shard - _tokens_per_response = 100 - _latency_s = _tokens_per_response / (_decode_toks_per_sec + 1e-9) - _rps_total = _R / _n_shard / (_latency_s + 1e-9) # requests per second - - # P99 estimate: Kingman's formula M/M/c → P99 ≈ avg_latency × log(100×(1-ρ))^-1 - # Simplified: assume P99 ≈ 3× avg for moderate utilization - _p99_ms = _latency_s * 1000 * 3.0 # P99 in ms - - # SLO check: P99 < 500ms - _slo_ok = _p99_ms < 500.0 - - # ── Daily serving cost ──────────────────────────────────────────────────── - _serving_cost_day_m = _R * H100_CLOUD_HR * 24 / 1e6 # millions $/day - _serving_cost_yr_b = _serving_cost_day_m * 365 / 1000 # billion $/year - - # ── Daily concurrent user capacity at P99 SLO ───────────────────────────── - # 5B users, assume peak 10% concurrent = 500M simultaneous - _peak_concurrent = USERS_SCALE 
* 0.10 - _capacity_ok = _rps_total * _latency_s >= _peak_concurrent - - # ── Color coding ────────────────────────────────────────────────────────── - _slo_color = COLORS["GreenLine"] if _slo_ok else COLORS["RedLine"] - _capacity_color = COLORS["GreenLine"] if _capacity_ok else COLORS["RedLine"] - _cost_color = COLORS["GreenLine"] if _serving_cost_yr_b < 5 else ( - COLORS["OrangeLine"] if _serving_cost_yr_b < 8 else COLORS["RedLine"] - ) - - _formula = mo.Html(f""" -
-
-        Serving Physics — Little's Law + Roofline
-        Model size ({_quant}) = 1T × {_bytes_per_param} bytes = {_model_size_gb:.0f} GB ({'requires ' + str(_n_shard) + ' H100 shards/replica'})
-        Decode rate = BW / bytes_per_param / params = {H100_BW_GBS}e9 / ({_bytes_per_param} × 1e12) = {_decode_toks_per_sec:.2f} tok/s/GPU
-        Avg latency (100 tok) = 100 / {_decode_toks_per_sec:.2f} = {_latency_s*1000:.0f} ms
-        P99 latency (est. 3× avg) = {_p99_ms:.0f} ms {'✓ < 500ms SLO' if _slo_ok else '❌ EXCEEDS 500ms SLO'}
-        Total RPS = {_R:,} replicas / {_n_shard} shards / {_latency_s:.3f}s = {_rps_total:,.0f} req/s
-        Concurrent users supported (N=λW) = {_rps_total * _latency_s:,.0f} {'✓ ≥ 500M peak' if _capacity_ok else '❌ below 500M peak'}
-        Annual serving cost = ${_serving_cost_yr_b:.2f}B/yr
- """) - - _banners = [] - if not _slo_ok: - _banners.append(mo.callout(mo.md( - f"**P99 SLO Violation.** Estimated P99 latency is **{_p99_ms:.0f} ms**, " - f"exceeding the 500ms global SLO. " - f"The {_quant} model decodes at only {_decode_toks_per_sec:.2f} tok/s/GPU " - f"— bandwidth-bound, not compute-bound (arithmetic intensity = 1 op/byte). " - f"Options: increase replicas, use a smaller distilled model per tier, " - f"or shift load to edge/mobile (which reduces cloud P99 tail)." - ), kind="danger")) - - if not _capacity_ok: - _banners.append(mo.callout(mo.md( - f"**Capacity Insufficient.** Your system handles " - f"~{_rps_total * _latency_s:,.0f} concurrent users but " - f"peak demand is {int(USERS_SCALE * 0.10):,} (10% of 5B). " - f"Add replicas or offload a larger fraction of requests to edge/mobile tiers." - ), kind="warn")) - - if _banners: - mo.vstack([_formula] + _banners) - else: - _formula - return ( - _slo_ok, - _capacity_ok, - _p99_ms, - _rps_total, - _serving_cost_yr_b, - _model_size_gb, - _n_shard, - _quant, - _tier, - ) - - -# ─── ACT II: DECISION 3 — FAULT TOLERANCE ──────────────────────────────────── -@app.cell(hide_code=True) -def _(mo): - mo.md("### Decision 3 — Fault Tolerance") - return - - -@app.cell(hide_code=True) -def _(mo): - d3_ckpt_interval = mo.ui.slider( + checkpoint_interval_min = mo.ui.slider( start=5, stop=120, value=30, step=5, label="Checkpoint interval (minutes)", show_value=True, ) - d3_replication = mo.ui.dropdown( - options={ - "No replication (single copy)": "1", - "2× replication": "2", - "3× replication (standard)": "3", - "5× replication (high-value)": "5", - }, - value="3× replication (standard)", - label="Checkpoint storage replication factor", - ) - mo.hstack([d3_ckpt_interval, d3_replication], justify="start", gap=4) - return (d3_ckpt_interval, d3_replication) - - -@app.cell(hide_code=True) -def _( - COLORS, - H100_MTBF_HOURS, - _N, - d3_ckpt_interval, - d3_replication, - mo, - math, -): - _T_min = 
d3_ckpt_interval.value - _T_hr = _T_min / 60.0 - _rep = int(d3_replication.value) - - # ── Cluster-level failure rate ───────────────────────────────────────────── - # lambda_cluster = N / MTBF_per_GPU [independent failures] - # Source: @sec-fault-tolerance-failure-modes - _lambda = _N / H100_MTBF_HOURS # failures/hour - _cluster_mtbf_hr = 1.0 / _lambda - _cluster_mtbf_min = _cluster_mtbf_hr * 60.0 - - # ── Young-Daly optimal checkpoint interval ───────────────────────────────── - # T* = sqrt(2 × C / lambda) [source: @sec-fault-tolerance-young-daly] - # Checkpoint cost C: 1T model at 2 bytes/param = 2 TB; Lustre 400 GB/s - # C = 2000 GB / 400 GB/s = 5 seconds → 0.083 minutes - _ckpt_size_gb = 1e12 * 2 / 1e9 # 2 TB for BF16 weights - _lustre_bw_gbs = 400.0 # GB/s aggregate; @sec-fault-tolerance - _C_s = _ckpt_size_gb / _lustre_bw_gbs - _C_hr = _C_s / 3600.0 - _C_min = _C_s / 60.0 - - _T_opt_hr = math.sqrt(2.0 * _C_hr / _lambda) - _T_opt_min = _T_opt_hr * 60.0 - - # ── Overhead and expected waste ──────────────────────────────────────────── - _overhead_pct = (_C_hr / _T_hr) * 100.0 - _waste_per_failure_min = _T_min / 2.0 + _C_min - _expected_waste_rate = _lambda * (_T_hr / 2.0 + _C_hr) # fraction of time - - # Overhead ceiling: if checkpoint overhead > 20% → critical - _overhead_ok = _overhead_pct < 20.0 - - # ── 99.99% availability calculation ─────────────────────────────────────── - # Availability = 1 - downtime_fraction - # Downtime per failure ≈ T/2 + C + restart_time (assume restart = 30 min) - _restart_hr = 0.5 # 30 min restart - _downtime_per_failure_hr = _T_hr / 2.0 + _C_hr + _restart_hr - _downtime_fraction = _lambda * _downtime_per_failure_hr - _availability_pct = (1.0 - _downtime_fraction) * 100.0 - _avail_ok = _availability_pct >= 99.99 - - # ── Color coding ────────────────────────────────────────────────────────── - _T_ratio = _T_min / max(_T_opt_min, 0.01) - _int_color = ( - COLORS["GreenLine"] if 0.7 <= _T_ratio <= 1.4 else - COLORS["OrangeLine"] if 
_T_ratio < 0.7 else - COLORS["RedLine"] - ) - _ovh_color = ( - COLORS["GreenLine"] if _overhead_pct < 10 else - COLORS["OrangeLine"] if _overhead_pct < 20 else - COLORS["RedLine"] - ) - _avail_color = COLORS["GreenLine"] if _avail_ok else COLORS["RedLine"] - - _formula = mo.Html(f""" -
-
-        Fault Tolerance Physics — Young-Daly + Availability
-        Cluster λ = {_N:,} GPUs / {H100_MTBF_HOURS}hr per-GPU MTBF = {_lambda:.2f} failures/hr (MTBF = {_cluster_mtbf_min:.1f} min)
-        Checkpoint cost C = {_ckpt_size_gb:.0f} GB / {_lustre_bw_gbs:.0f} GB/s = {_C_min:.1f} min
-        Young-Daly T* = sqrt(2 × {_C_hr:.4f}hr / {_lambda:.4f}/hr) = {_T_opt_min:.1f} min
-        Your T = {_T_min} min ({_T_ratio:.1f}× {'too frequent' if _T_ratio < 0.7 else 'too infrequent' if _T_ratio > 1.4 else 'near-optimal'})
-        Checkpoint overhead = C/T = {_C_hr:.4f}/{_T_hr:.4f} = {_overhead_pct:.1f}% {'❌ >20% ceiling' if not _overhead_ok else '✓ OK'}
-        Expected waste rate = λ × (T/2 + C) = {_expected_waste_rate*100:.1f}% of training time
-        Availability = 1 - λ × downtime = {_availability_pct:.4f}% {'✓ ≥ 99.99%' if _avail_ok else '❌ < 99.99% SLO'}
- """) - - _banners = [] - if not _overhead_ok: - _banners.append(mo.callout(mo.md( - f"**Checkpoint Overhead Critical.** Your {_T_min}-minute interval " - f"with {_C_min:.1f}-minute checkpoint cost yields " - f"**{_overhead_pct:.1f}% overhead** — exceeding the 20% ceiling. " - f"Young-Daly optimal is **{_T_opt_min:.1f} minutes**. " - f"At cluster MTBF = {_cluster_mtbf_min:.0f} min ({_N:,} GPUs), " - f"checkpointing more frequently than T* costs more than it saves." - ), kind="danger")) - - if not _avail_ok: - _banners.append(mo.callout(mo.md( - f"**Availability Below 99.99%.** Current architecture delivers " - f"**{_availability_pct:.4f}%** availability. " - f"With cluster MTBF = {_cluster_mtbf_min:.0f} minutes and " - f"restart overhead of 30 minutes per failure, " - f"you cannot reach four-nines without checkpointing strategy optimization. " - f"Consider async multi-level checkpointing to reduce restart cost." - ), kind="warn")) - - if _banners: - mo.vstack([_formula] + _banners) - else: - _formula - return ( - _T_min, - _T_opt_min, - _overhead_ok, - _avail_ok, - _availability_pct, - _overhead_pct, - _C_min, - _lambda, - _cluster_mtbf_min, - _ckpt_size_gb, - ) - - -# ─── ACT II: DECISION 4 — PRIVACY ──────────────────────────────────────────── -@app.cell(hide_code=True) -def _(mo): - mo.md("### Decision 4 — Privacy") - return - - -@app.cell(hide_code=True) -def _(mo): - d4_epsilon = mo.ui.slider( - start=0.1, stop=10.0, value=1.0, step=0.1, - label="Differential privacy epsilon (ε) — lower = stronger privacy", - show_value=True, - ) - d4_strategy = mo.ui.dropdown( - options={ - "Centralized training (all data to cloud)": "central", - "Federated learning (EU region)": "federated_eu", - "Federated (EU) + Central (non-EU)": "hybrid", - "Full federated (all regions)": "full_federated", - }, - value="Federated (EU) + Central (non-EU)", - label="Data residency strategy", - ) - mo.hstack([d4_epsilon, d4_strategy], justify="start", gap=4) - return (d4_epsilon, 
d4_strategy) - - -@app.cell(hide_code=True) -def _( - COLORS, - d4_epsilon, - d4_strategy, - mo, - math, -): - _eps = d4_epsilon.value - _strat = d4_strategy.value - - # ── GDPR differential privacy compliance ────────────────────────────────── - # GDPR Art. 25 + EDPB guidance: epsilon ≤ 1.0 required for strong DP guarantee - # Source: @sec-security-privacy-dp-gdpr - _GDPR_EPS_MAX = 1.0 # ε ≤ 1.0 for GDPR-grade DP; @sec-security-privacy-dp - _CCPA_EPS_MAX = 3.0 # ε ≤ 3.0 for CCPA-grade; @sec-security-privacy-ccpa - - _gdpr_ok = _eps <= _GDPR_EPS_MAX - _ccpa_ok = _eps <= _CCPA_EPS_MAX - - # ── Accuracy degradation model from DP noise ─────────────────────────────── - # Approximate relationship: accuracy_penalty ≈ k / epsilon (diminishing returns) - # At ε=1.0: ~5% accuracy drop; at ε=0.1: ~15%; at ε=10: ~0.5% - # Source: @sec-security-privacy-dp-accuracy-tradeoff - _k_accuracy = 0.05 # empirical constant (5% penalty at ε=1) - _accuracy_penalty_pct = min(_k_accuracy / _eps * 100, 25.0) - - # ── Federated learning communication overhead ───────────────────────────── - # Federated: each round requires uploading model diff ≈ gradient size - # 1T model gradient at BF16 = 2 TB per round - # Mobile uplink ≈ 10 Mbps → 2 TB / 10 Mbps = 1.6M seconds → impractical - # Source: @sec-edge-intelligence-federated-communication - _is_federated = _strat in ("federated_eu", "hybrid", "full_federated") - _gradient_gb = 1e12 * 2 / 1e9 # 2000 GB - _mobile_uplink_gbps = 0.010 # 10 Mbps typical; @sec-edge-intelligence - _upload_time_s = _gradient_gb / _mobile_uplink_gbps - _upload_time_days = _upload_time_s / 86400 - - # Practical: federated sends only adapter diff (LoRA delta), not full gradient - # LoRA rank=16, 1T model → ~0.001% of model = ~2 GB - _lora_diff_gb = 2.0 - _lora_upload_s = _lora_diff_gb / _mobile_uplink_gbps - _lora_upload_min = _lora_upload_s / 60.0 - - # ── Color coding ────────────────────────────────────────────────────────── - _gdpr_color = COLORS["GreenLine"] if 
_gdpr_ok else COLORS["RedLine"] - _ccpa_color = COLORS["GreenLine"] if _ccpa_ok else COLORS["RedLine"] - _acc_color = ( - COLORS["GreenLine"] if _accuracy_penalty_pct < 5 else - COLORS["OrangeLine"] if _accuracy_penalty_pct < 12 else - COLORS["RedLine"] - ) - - _formula = mo.Html(f""" -
-
-        Privacy Physics — Differential Privacy ε-δ Tradeoff
-        DP noise scale: σ ∝ Δf / ε — smaller ε = more noise added
-        GDPR (ε ≤ {_GDPR_EPS_MAX}): ε = {_eps:.1f} {'✓ GDPR-compliant' if _gdpr_ok else '❌ GDPR violation'}
-        CCPA (ε ≤ {_CCPA_EPS_MAX}): {'✓ CCPA-compliant' if _ccpa_ok else '❌ CCPA violation'}
-        Estimated accuracy penalty ≈ k/ε = 0.05/{_eps:.1f} = ~{_accuracy_penalty_pct:.1f}% degradation
-        Data strategy: {_strat} {'(federated requires LoRA adapter diffs)' if _is_federated else ''}
-        {f'LoRA diff upload (10 Mbps): {_lora_upload_min:.1f} min/round' if _is_federated else ''}
- """) - - _banners = [] - if not _gdpr_ok: - _banners.append(mo.callout(mo.md( - f"**GDPR Violation.** Your epsilon = **{_eps:.1f}** exceeds the GDPR-grade " - f"threshold of ε ≤ {_GDPR_EPS_MAX}. " - f"EU data regulators interpret ε > 1 as providing insufficient anonymization " - f"under the EDPB's differential privacy guidance. " - f"Reduce ε or migrate EU users to a federated strategy where raw data " - f"never leaves the device." - ), kind="danger")) - - if not _ccpa_ok: - _banners.append(mo.callout(mo.md( - f"**CCPA Violation.** Epsilon = **{_eps:.1f}** also exceeds CCPA threshold " - f"(ε ≤ {_CCPA_EPS_MAX}). California users' data is not adequately protected. " - f"This may trigger regulatory enforcement under CPRA 2023 provisions." - ), kind="danger")) - - if _banners: - mo.vstack([_formula] + _banners) - else: - _formula - return ( - _eps, - _strat, - _gdpr_ok, - _ccpa_ok, - _accuracy_penalty_pct, - _is_federated, - ) - - -# ─── ACT II: DECISION 5 — FAIRNESS ─────────────────────────────────────────── -@app.cell(hide_code=True) -def _(mo): - mo.md("### Decision 5 — Fairness") - return - - -@app.cell(hide_code=True) -def _(mo): - d5_fairness = mo.ui.dropdown( - options={ - "Equalized odds (equal FPR + FNR across groups)": "equalized_odds", - "Demographic parity (equal positive rate)": "dem_parity", - "Calibration (equal predicted probabilities)": "calibration", - "Individual fairness (similar people treated similarly)": "individual", - }, - value="Equalized odds (equal FPR + FNR across groups)", - label="Fairness criterion", - ) - d5_base_rate_gap = mo.ui.slider( - start=0, stop=40, value=15, step=1, - label="Base rate gap between highest and lowest country (%)", - show_value=True, - ) - d5_accuracy = mo.ui.slider( - start=50, stop=99, value=85, step=1, - label="Overall model accuracy (%)", + flexible_job_pct = mo.ui.slider( + start=0, stop=50, value=20, step=5, + label="Flexible / deferrable job percentage (% of workload shifted to low-carbon hours)", 
show_value=True, ) mo.vstack([ - d5_fairness, - mo.hstack([d5_base_rate_gap, d5_accuracy], justify="start", gap=4), + mo.hstack([model_size_b, dp_epsilon, adv_train_weight], justify="start", gap=4), + parallelism_strategy, + mo.hstack([checkpoint_interval_min, flexible_job_pct], justify="start", gap=4), ]) - return (d5_fairness, d5_base_rate_gap, d5_accuracy) - - -@app.cell(hide_code=True) -def _( - COLORS, - d5_accuracy, - d5_base_rate_gap, - d5_fairness, - mo, - math, -): - _criterion = d5_fairness.value - _base_rate_gap = d5_base_rate_gap.value / 100.0 # convert to fraction - _acc = d5_accuracy.value / 100.0 - - # ── Chouldechova impossibility ───────────────────────────────────────────── - # When base rates differ across groups, cannot simultaneously satisfy: - # (1) equalized odds (equal FPR + FNR) - # (2) calibration (PPV equal across groups) - # Source: @sec-responsible-ai-chouldechova - # Minimum gap if we enforce equalized odds given base rate difference: - # PPV_1 / PPV_2 >= (1 - base_rate_2) / (1 - base_rate_1) × base_rate_1/base_rate_2 - # Approximate: forced accuracy loss when equalizing FPR across groups - # with base rate gap delta: loss ≈ delta × (1 - acc) / base_rate_midpoint - _base_rate_low = 0.20 # lowest-prevalence group - _base_rate_high = _base_rate_low + _base_rate_gap - - # Chouldechova: if we force equalized odds → calibration must be unequal - # Calibration gap ≈ base_rate_gap / (avg_base_rate × (1 + base_rate_gap)) - _avg_base_rate = (_base_rate_low + _base_rate_high) / 2.0 - if _avg_base_rate > 0 and _avg_base_rate < 1: - _calibration_gap = _base_rate_gap / (_avg_base_rate * (1.0 + _base_rate_gap)) - else: - _calibration_gap = 0.0 - _calibration_gap_pct = _calibration_gap * 100.0 - - # Equalized odds gap if we enforce calibration: - # FPR gap ≈ base_rate_gap × (1 - acc) / avg_base_rate - _fpr_gap_pct = _base_rate_gap * (1.0 - _acc) / max(_avg_base_rate, 0.01) * 100.0 - - # EU AI Act Art. 
10: equalized odds gap ≤ 10% for high-risk systems - # Source: @sec-responsible-ai-eu-ai-act - _EU_AIACT_MAX_GAP = 0.10 # 10% maximum FPR/FNR gap - _eu_ok = _fpr_gap_pct / 100.0 <= _EU_AIACT_MAX_GAP - - # ── Accuracy penalty from fairness constraint ────────────────────────────── - # Enforcing equalized odds costs accuracy proportional to base rate gap - # Conservative estimate: 1-3% accuracy for every 5% base rate gap - _fairness_acc_penalty_pct = _base_rate_gap * 100.0 * 0.20 # 0.2 pp per 1pp gap - - # ── Color coding ────────────────────────────────────────────────────────── - _eu_color = COLORS["GreenLine"] if _eu_ok else COLORS["RedLine"] - _calib_color = ( - COLORS["GreenLine"] if _calibration_gap_pct < 10 else - COLORS["OrangeLine"] if _calibration_gap_pct < 20 else - COLORS["RedLine"] + return ( + model_size_b, + dp_epsilon, + adv_train_weight, + parallelism_strategy, + checkpoint_interval_min, + flexible_job_pct, ) - # Is Chouldechova active? — when base rate gap > 0 and using equalized odds - _chouldechova_active = _base_rate_gap > 0.01 and _criterion == "equalized_odds" - _formula = mo.Html(f""" -
-
- Fairness Physics — Chouldechova Impossibility -
-
-
Base rate range: {_base_rate_low*100:.0f}% (lowest) → - {_base_rate_high*100:.0f}% (highest); - gap = {_base_rate_gap*100:.0f}% - across 193 jurisdictions +# ─── ACT II: CONSTRAINT COMPUTATION ─────────────────────────────────────────── +@app.cell(hide_code=True) +def _( + ACCURACY_TARGET, + ADV_ROBUSTNESS_TARGET, + BASELINE_CI_G_KWH, + BUDGET_GPUS, + CARBON_REDUCTION_TARGET, + CHECKPOINT_COST_S, + COLORS, + DP_EPS_LIMIT, + GPUS_PER_NODE, + H100_BW_GBS, + H100_RAM_GB, + H100_TDP_W, + H100_TFLOPS_FP16, + HOSPITAL_COUNT, + INF_PER_DAY, + MTBF_GPU_HOURS, + P99_SLO_MS, + RENEW_CI_G_KWH, + UPTIME_TARGET, + adv_train_weight, + checkpoint_interval_min, + context_toggle, + dp_epsilon, + flexible_job_pct, + math, + mo, + model_size_b, + parallelism_strategy, +): + _ctx = context_toggle.value + _M_B = model_size_b.value # billions of parameters + _M = _M_B * 1e9 # raw parameter count + _eps = dp_epsilon.value + _adv_w = adv_train_weight.value + _strategy = parallelism_strategy.value + _T_min = checkpoint_interval_min.value + _flex_pct = flexible_job_pct.value / 100.0 + _ci = RENEW_CI_G_KWH if _ctx == "renewable" else BASELINE_CI_G_KWH + + # ───────────────────────────────────────────────────────────────────────── + # 1. ACCURACY + # Base accuracy model: larger models are more accurate (diminishing returns). 
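+ # The accuracy model described in this cell can be exercised outside the
+ # notebook. A minimal standalone sketch, mirroring the cell's constants
+ # (the function name `predicted_accuracy` is illustrative, not part of the
+ # lab API):

```python
import math

def predicted_accuracy(model_size_b: float, eps: float, adv_weight: float) -> float:
    """Saturating scaling-law base accuracy, minus a capped DP noise
    penalty (~k/eps, k=0.05) and a linear adversarial-training penalty."""
    base = 0.82 + 0.13 * (1.0 - math.exp(-model_size_b / 20.0))
    dp_penalty = min(0.05 / eps, 0.20)   # 5% at eps=1, capped at 20%
    adv_penalty = adv_weight * 0.08      # up to 8% clean-accuracy cost
    return max(base - dp_penalty - adv_penalty, 0.0)
```

+ # Note that the base term saturates at 95%, so at the HIPAA limit (eps = 1)
+ # the DP penalty alone pins predicted accuracy at or below 90%; under these
+ # constants the accuracy target is the binding constraint.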
+ # Reference: scaling laws @sec-training-scaling-laws + # Approximation: accuracy ≈ 0.82 + 0.13 × (1 - exp(-M_B / 20)) + # DP noise penalty: ~k / eps, where k = 0.05 (5% at eps=1) + # Adversarial training penalty: clean accuracy drops with adv weight + # Reference: @sec-robust-ai-adversarial-training-tradeoff + # ───────────────────────────────────────────────────────────────────────── + _base_acc = 0.82 + 0.13 * (1.0 - math.exp(-_M_B / 20.0)) + _dp_acc_penalty = min(0.05 / _eps, 0.20) # ≤ 20% cap + _adv_acc_penalty = _adv_w * 0.08 # up to 8% clean accuracy cost + _accuracy = max(_base_acc - _dp_acc_penalty - _adv_acc_penalty, 0.0) + _accuracy_met = _accuracy >= ACCURACY_TARGET + + # ───────────────────────────────────────────────────────────────────────── + # 2. P99 LATENCY (Little's Law + Roofline decode model) + # Total daily requests = HOSPITAL_COUNT × INF_PER_DAY + # Assume uniform distribution → arrival rate λ (req/s) + # Decode: 1 token/step, arithmetic intensity = 1 op/byte → BW-bound + # Latency per token ≈ bytes_per_token / BW_GBs + # Model bytes (FP16 inference): M_params × 2 bytes + # Sequence response: assume 50 tokens average + # P99 ≈ 3× avg for M/M/1 at moderate utilization + # Reference: @sec-model-serving-littles-law, @sec-inference-roofline-decode + # ───────────────────────────────────────────────────────────────────────── + _total_rps = HOSPITAL_COUNT * INF_PER_DAY / 86400.0 # req/s + _bytes_per_param = 2.0 # FP16 + _model_bytes_gb = _M * _bytes_per_param / 1e9 # GB + # Shards per replica: ceil(model_bytes / H100_RAM) + _shards = max(math.ceil(_model_bytes_gb / H100_RAM_GB), 1) + # Token decode rate per H100 (BW-bound) + _tokens_per_sec_gpu = H100_BW_GBS * 1e9 / (_M * _bytes_per_param) + _response_tokens = 50 # tokens per response + _avg_latency_s = _response_tokens / max(_tokens_per_sec_gpu, 1e-6) + _p99_latency_ms = _avg_latency_s * 1000 * 3.0 + _latency_met = _p99_latency_ms < P99_SLO_MS + + # Replicas needed to handle total_rps at 
avg_latency + _replicas_needed = math.ceil(_total_rps * _avg_latency_s * _shards) + _replicas_available = BUDGET_GPUS // _shards + + # ───────────────────────────────────────────────────────────────────────── + # 3. DP COMPLIANCE (HIPAA) + # eps ≤ 1.0 required for HIPAA-grade differential privacy + # Reference: @sec-security-privacy-dp-hipaa + # ───────────────────────────────────────────────────────────────────────── + _dp_met = _eps <= DP_EPS_LIMIT + + # ───────────────────────────────────────────────────────────────────────── + # 4. ADVERSARIAL ROBUSTNESS + # Adversarial robustness under PGD attack scales with adv_train_weight. + # At adv_w = 0: robustness ≈ 5% (near-zero for undefended model) + # At adv_w = 0.5: robustness ≈ 50% + # At adv_w = 1.0: robustness ≈ 70% + # Linear interpolation + saturation + # Reference: @sec-robust-ai-pgd-training + # ───────────────────────────────────────────────────────────────────────── + _adv_robustness = 0.05 + _adv_w * 0.65 + _adversarial_met = _adv_robustness >= ADV_ROBUSTNESS_TARGET + + # ───────────────────────────────────────────────────────────────────────── + # 5. CARBON REDUCTION + # Baseline: BASELINE_CI_G_KWH (386 g CO2/kWh) + # Actual carbon intensity depends on context toggle + flexible scheduling + # Carbon-aware scheduling shifts flex_pct of workload to low-CI hours + # Effective CI = ci × (1 - flex_pct × 0.7) [scheduling reduces CI by up to 70%] + # Target: ≥ 40% reduction vs. BASELINE_CI_G_KWH + # Reference: @sec-sustainable-ai-carbon-aware-scheduling + # ───────────────────────────────────────────────────────────────────────── + _eff_ci = _ci * (1.0 - _flex_pct * 0.70) + _carbon_reduction = 1.0 - _eff_ci / BASELINE_CI_G_KWH + _carbon_met = _carbon_reduction >= CARBON_REDUCTION_TARGET + + # ───────────────────────────────────────────────────────────────────────── + # 6. 
FAULT TOLERANCE (Young-Daly + availability) + # Fleet size for the failure model: serving GPUs, capped by the budget + # Cluster-level failure rate: lambda = N_gpus / MTBF_GPU_HOURS + # Young-Daly: T* = sqrt(2 × C / lambda) where C = CHECKPOINT_COST_S / 3600 + # Downtime fraction per hour = lambda × (T/2 + C + restart); uptime = 1 - downtime + # Target: ≥ 99.9% uptime + # Reference: @sec-fault-tolerance-young-daly + # ───────────────────────────────────────────────────────────────────────── + _N_gpus = min(BUDGET_GPUS, max(_replicas_needed, 100)) + _lambda_hr = _N_gpus / MTBF_GPU_HOURS # failures/hour + _C_hr = CHECKPOINT_COST_S / 3600.0 # checkpoint cost in hours + _T_hr = _T_min / 60.0 + _T_opt_min = math.sqrt(2.0 * _C_hr / max(_lambda_hr, 1e-9)) * 60.0 + _restart_hr = 0.5 # 30-minute restart + _downtime_frac = _lambda_hr * (_T_hr / 2.0 + _C_hr + _restart_hr) + _uptime_pct = max(1.0 - _downtime_frac, 0.0) + _fault_tol_met = _uptime_pct >= UPTIME_TARGET + + # ───────────────────────────────────────────────────────────────────────── + # BUDGET CHECK: total GPUs needed vs. available + # ───────────────────────────────────────────────────────────────────────── + _budget_ok = _replicas_needed <= BUDGET_GPUS + + # ───────────────────────────────────────────────────────────────────────── + # OVERALL + # ───────────────────────────────────────────────────────────────────────── + _constraints_all_met = ( + _accuracy_met and _latency_met and _dp_met and + _adversarial_met and _carbon_met and _fault_tol_met + ) + _n_met = sum([ + _accuracy_met, _latency_met, _dp_met, + _adversarial_met, _carbon_met, _fault_tol_met + ]) + + # ── Color helper ────────────────────────────────────────────────────────── + def _sc(ok): + return COLORS["GreenLine"] if ok else COLORS["RedLine"] + + def _tick(ok): + return "✓" if ok else "❌" + + def _badge(ok): + return "PASS" if ok else "FAIL" + + # ── 6-constraint scorecard ──────────────────────────────────────────────── + _scorecard = mo.Html(f""" +
+ +
+
+ 1. Accuracy
-
Criterion: {_criterion}
-
If equalized odds is enforced → calibration gap - ≈ - {_calibration_gap_pct:.1f}% - (Chouldechova constraint, @sec-responsible-ai-chouldechova) +
+ {_accuracy*100:.1f}%
-
If calibration is enforced → FPR/FNR gap - ≈ - {_fpr_gap_pct:.1f}% -  {'✓ ≤ 10% EU AI Act' if _eu_ok else '❌ >10% EU AI Act Art. 10'} +
+ target: ≥{ACCURACY_TARGET*100:.0f}% +  {_tick(_accuracy_met)} {_badge(_accuracy_met)}
-
Fairness constraint accuracy cost - ≈ ~{_fairness_acc_penalty_pct:.1f}% - degradation +
+ base={_base_acc*100:.1f}% − DP penalty={_dp_acc_penalty*100:.1f}% + − adv penalty={_adv_acc_penalty*100:.1f}%
+ +
+
+ 2. P99 Latency +
+
+ {_p99_latency_ms:.0f}ms +
+
+ SLO: <{P99_SLO_MS}ms +  {_tick(_latency_met)} {_badge(_latency_met)} +
+
+ {_M_B}B params × 2 bytes = {_model_bytes_gb:.0f} GB +  → {_shards} shard(s)/replica +
+
+ +
+
+ 3. DP Compliance (HIPAA) +
+
+ ε = {_eps:.1f} +
+
+ HIPAA limit: ε ≤ {DP_EPS_LIMIT} +  {_tick(_dp_met)} {_badge(_dp_met)} +
+
+ accuracy penalty ≈ {_dp_acc_penalty*100:.1f}% +
+
+ +
+
+ 4. Adversarial Robustness +
+
+ {_adv_robustness*100:.0f}% +
+
+ PGD target: ≥{ADV_ROBUSTNESS_TARGET*100:.0f}% +  {_tick(_adversarial_met)} {_badge(_adversarial_met)} +
+
+ adv weight={_adv_w:.2f} +  → clean acc cost={_adv_acc_penalty*100:.1f}% +
+
+ +
+
+ 5. Carbon Reduction +
+
+ {_carbon_reduction*100:.0f}% +
+
+ target: >{CARBON_REDUCTION_TARGET*100:.0f}% vs. baseline +  {_tick(_carbon_met)} {_badge(_carbon_met)} +
+
+ eff CI = {_eff_ci:.0f} g/kWh +  (flex={_flex_pct*100:.0f}%) +
+
+ +
+
+ 6. Fault Tolerance +
+
+ {_uptime_pct*100:.3f}% +
+
+ uptime target: ≥{UPTIME_TARGET*100:.1f}% +  {_tick(_fault_tol_met)} {_badge(_fault_tol_met)} +
+
+ T*={_T_opt_min:.0f}min  |  your T={_T_min}min +
+
+
""") + _scorecard + return ( + _accuracy, + _accuracy_met, + _latency_met, + _dp_met, + _adversarial_met, + _carbon_met, + _fault_tol_met, + _constraints_all_met, + _n_met, + _budget_ok, + _uptime_pct, + _p99_latency_ms, + _adv_robustness, + _carbon_reduction, + _eff_ci, + _T_opt_min, + _shards, + _replicas_needed, + _dp_acc_penalty, + _adv_acc_penalty, + ) + +# ─── ACT II: FAILURE STATES AND SUCCESS STATE ───────────────────────────────── +@app.cell(hide_code=True) +def _( + ACCURACY_TARGET, + ADV_ROBUSTNESS_TARGET, + BUDGET_GPUS, + CARBON_REDUCTION_TARGET, + DP_EPS_LIMIT, + P99_SLO_MS, + UPTIME_TARGET, + _accuracy, + _accuracy_met, + _adv_robustness, + _adversarial_met, + _budget_ok, + _carbon_met, + _carbon_reduction, + _constraints_all_met, + _dp_met, + _eff_ci, + _fault_tol_met, + _latency_met, + _n_met, + _p99_latency_ms, + _replicas_needed, + _T_opt_min, + _uptime_pct, + checkpoint_interval_min, + dp_epsilon, + mo, +): _banners = [] - if not _eu_ok: + + if not _accuracy_met: _banners.append(mo.callout(mo.md( - f"**EU AI Act Violation.** Enforcing calibration with a {_base_rate_gap*100:.0f}% " - f"base rate gap across jurisdictions produces an FPR/FNR gap of " - f"**{_fpr_gap_pct:.1f}%** — exceeding the 10% threshold under EU AI Act Article 10. " - f"The Chouldechova impossibility theorem states this cannot be fixed by " - f"better training data alone: the incompatibility is mathematical, not empirical. " - f"Options: jurisdiction-specific models, or accept accuracy cost to enforce " - f"equalized odds at the expense of calibration." + f"**Accuracy below clinical threshold.** Current accuracy: " + f"**{_accuracy*100:.1f}%** (required: {ACCURACY_TARGET*100:.0f}%). " + f"DP noise (eps={dp_epsilon.value:.1f}) and adversarial training together " + f"impose accuracy penalties that compound. " + f"Increase model size OR reduce adversarial weight OR raise eps (if HIPAA allows). 
" + f"Note: DP and adversarial training pull accuracy in the SAME downward direction " + f"— both add noise/randomization that smooth decision boundaries." ), kind="danger")) - if _chouldechova_active and _base_rate_gap > 0.10: + if not _latency_met: _banners.append(mo.callout(mo.md( - f"**Chouldechova Theorem Active.** With a {_base_rate_gap*100:.0f}% base rate gap " - f"and equalized odds enforcement, calibration error will be approximately " - f"**{_calibration_gap_pct:.1f}%**. This is not a model quality problem — " - f"it is a mathematical constraint from @sec-responsible-ai-chouldechova. " - f"The only architectural solutions are: per-jurisdiction models, " - f"rejection of the equalized odds criterion in high-gap jurisdictions, " - f"or explicit transparency to regulators." + f"**P99 SLO violated.** Estimated P99 = **{_p99_latency_ms:.0f}ms** " + f"(SLO: {P99_SLO_MS}ms). " + f"The model's decode rate is bandwidth-bound (arithmetic intensity = 1 op/byte). " + f"Reduce model size to lower per-token latency, or add more replicas. " + f"You need {_replicas_needed:,} GPU-shards; budget is {BUDGET_GPUS:,}." + ), kind="danger")) + + if not _dp_met: + _banners.append(mo.callout(mo.md( + f"**HIPAA DP violation.** epsilon = **{dp_epsilon.value:.1f}** exceeds " + f"the HIPAA-grade limit of eps <= {DP_EPS_LIMIT}. " + f"Medical image data under HIPAA requires strong differential privacy. " + f"Reduce epsilon — at the cost of increased accuracy penalty." + ), kind="danger")) + + if not _adversarial_met: + _banners.append(mo.callout(mo.md( + f"**Adversarial robustness insufficient.** Current PGD robustness: " + f"**{_adv_robustness*100:.0f}%** (target: {ADV_ROBUSTNESS_TARGET*100:.0f}%). " + f"Medical AI systems in adversarial environments require adversarial training. " + f"Increase adversarial training weight — but note it reduces clean accuracy." 
+ ), kind="danger")) + + if not _carbon_met: + _banners.append(mo.callout(mo.md( + f"**Carbon reduction target missed.** Achieved: " + f"**{_carbon_reduction*100:.0f}%** reduction " + f"(effective CI: {_eff_ci:.0f} g CO2/kWh). " + f"Target: {CARBON_REDUCTION_TARGET*100:.0f}% reduction vs. baseline. " + f"Switch to carbon-optimized context OR increase flexible job percentage. " + f"Jevons Paradox warning: efficiency gains alone may be insufficient " + f"if fleet scale grows faster than carbon intensity falls." + ), kind="danger")) + + if not _fault_tol_met: + _banners.append(mo.callout(mo.md( + f"**Uptime target missed.** Estimated uptime: " + f"**{_uptime_pct*100:.3f}%** (target: {UPTIME_TARGET*100:.1f}%). " + f"Young-Daly optimal checkpoint interval is **{_T_opt_min:.0f} min** " + f"for this fleet size. Your interval: {checkpoint_interval_min.value} min. " + f"Reduce checkpoint interval toward T* to minimize expected waste time." + ), kind="danger")) + + if not _budget_ok: + _banners.append(mo.callout(mo.md( + f"**GPU budget exceeded.** Your configuration requires " + f"**{_replicas_needed:,} GPU-shards** but the budget is {BUDGET_GPUS:,} H100s. " + f"Reduce model size, increase quantization (which increases shards-per-replica " + f"at lower memory), or accept lower replica count with higher latency." ), kind="warn")) - if _banners: - mo.vstack([_formula] + _banners) + if _constraints_all_met: + mo.callout(mo.md( + f"**ARCHITECTURE APPROVED: All {_n_met}/6 constraints satisfied. 
" + f"System is deployable.** " + f"Accuracy: {_accuracy*100:.1f}% | P99: {_p99_latency_ms:.0f}ms | " + f"DP eps: {dp_epsilon.value:.1f} | Robustness: {_adv_robustness*100:.0f}% | " + f"Carbon reduction: {_carbon_reduction*100:.0f}% | " + f"Uptime: {_uptime_pct*100:.3f}%" + ), kind="success") + elif _banners: + mo.vstack(_banners) else: - _formula - return ( - _criterion, - _base_rate_gap, - _eu_ok, - _fpr_gap_pct, - _calibration_gap_pct, - _fairness_acc_penalty_pct, - _chouldechova_active, - ) - - -# ─── ACT II: CARBON CONSTRAINT ──────────────────────────────────────────────── -@app.cell(hide_code=True) -def _(mo): - mo.md("### Carbon Constraint — 2027 Carbon-Neutral Commitment") + mo.callout(mo.md( + f"**{_n_met}/6 constraints met.** " + f"Adjust the sliders above to satisfy all constraints simultaneously." + ), kind="info") return -@app.cell(hide_code=True) -def _( - CARBON_THRESHOLD_G_KWH, - COLORS, - EU_GRID_CARBON_G_KWH, - H100_TDP_W, - RENEW_CARBON_G_KWH, - _cluster_power_mw, - _N, - _serving_cost_yr_b, - mo, -): - # ── Total system power ───────────────────────────────────────────────────── - # Training cluster + serving fleet + overhead (PUE 1.2×) - # Source: @sec-sustainable-ai-data-center-pue - _PUE = 1.2 # Power Usage Effectiveness; industry average - _serving_gpus_est = int(_serving_cost_yr_b * 1e9 / (3.50 * 8760)) # rough estimate - - _total_gpus = int(_N) + _serving_gpus_est - _raw_power_mw = _total_gpus * H100_TDP_W / 1e6 # MW - _total_power_mw = _raw_power_mw * _PUE - - # ── Annual energy ───────────────────────────────────────────────────────── - _annual_energy_gwh = _total_power_mw * 8760 / 1000 # GWh/year - - # ── Carbon emissions ────────────────────────────────────────────────────── - # Assume 3 data center regions: EU, US-CA, US-East - # Mix: 40% renewable PPA, 60% grid average - _renew_fraction = 0.4 - _eff_carbon = ( - _renew_fraction * RENEW_CARBON_G_KWH + - (1 - _renew_fraction) * EU_GRID_CARBON_G_KWH - ) - _carbon_ok = _eff_carbon <= 
CARBON_THRESHOLD_G_KWH - - _annual_co2_kt = _annual_energy_gwh * _eff_carbon / 1e6 * 1e9 / 1e6 # kilotonnes CO2 - - # Carbon-neutral path: 100% renewable PPA - _eff_carbon_100renew = RENEW_CARBON_G_KWH - _co2_100renew_kt = _annual_energy_gwh * _eff_carbon_100renew / 1e6 * 1e9 / 1e6 - - # ── Renewable PPA cost premium ───────────────────────────────────────────── - # Renewable PPA ~$50/MWh vs grid ~$40/MWh → ~25% premium - # Source: @sec-sustainable-ai-carbon-aware-scheduling - _ppa_premium_usd_m = _annual_energy_gwh * 1000 * 10.0 / 1e6 # $10/MWh delta × GWh - - _carbon_color = COLORS["GreenLine"] if _carbon_ok else COLORS["RedLine"] - _eff_color = COLORS["GreenLine"] if _eff_carbon <= CARBON_THRESHOLD_G_KWH else ( - COLORS["OrangeLine"] if _eff_carbon <= 150 else COLORS["RedLine"] - ) - - _formula = mo.Html(f""" -
-
- Carbon Physics — Jevons Paradox + Grid Carbon Intensity -
-
-
Total GPUs (train + serve) ≈ {_total_gpus:,}
-
Raw GPU power = {_total_gpus:,} × {H100_TDP_W}W - = {_raw_power_mw:.0f} MW -
-
Total facility power (PUE {_PUE}) - = {_total_power_mw:.0f} MW -
-
Annual energy = {_total_power_mw:.0f}MW × 8,760 hr - = {_annual_energy_gwh:.0f} GWh/yr -
-
Effective carbon intensity ({int(_renew_fraction*100)}% renewable PPA) - = - {_eff_carbon:.0f} g CO&sub2;/kWh -  (threshold: {CARBON_THRESHOLD_G_KWH} g/kWh) -  {'✓ carbon-neutral' if _carbon_ok else '❌ above threshold'} -
-
Annual CO&sub2; emissions - ≈ {_annual_co2_kt:.0f} kt CO&sub2; -
-
100% renewable path: {_co2_100renew_kt:.0f} kt CO&sub2; - | PPA premium: +${_ppa_premium_usd_m:.0f}M/yr -
-
-
- """) - - _banners = [] - if not _carbon_ok: - _banners.append(mo.callout(mo.md( - f"**Carbon Target Missed.** With {int(_renew_fraction*100)}% renewable PPA, " - f"effective carbon intensity is **{_eff_carbon:.0f} g CO2/kWh**, " - f"exceeding the carbon-neutral threshold of {CARBON_THRESHOLD_G_KWH} g/kWh. " - f"Your {_total_power_mw:.0f} MW fleet emits ~{_annual_co2_kt:.0f} kt CO2/year. " - f"To reach carbon-neutral by 2027, increase renewable PPA to ≥90% or " - f"relocate training workloads to zero-carbon regions (Iceland, Norway, Quebec). " - f"Note the Jevons Paradox (@sec-sustainable-ai-jevons): " - f"efficiency improvements alone cannot reach this target if fleet size grows." - ), kind="danger")) - - if _banners: - mo.vstack([_formula] + _banners) - else: - _formula - return ( - _carbon_ok, - _total_power_mw, - _annual_energy_gwh, - _eff_carbon, - _annual_co2_kt, - _total_gpus, - ) - - -# ─── ACT II: SYSTEM FEASIBILITY VERDICT ────────────────────────────────────── -@app.cell(hide_code=True) -def _(mo): - mo.md("### System Feasibility Verdict") - return - - -@app.cell(hide_code=True) -def _( - COLORS, - _accuracy_penalty_pct, - _annual_co2_kt, - _avail_ok, - _carbon_ok, - _eu_ok, - _fairness_acc_penalty_pct, - _gdpr_ok, - _oom, - _overhead_ok, - _p99_ms, - _serving_cost_yr_b, - _slo_ok, - _total_gpus, - _train_cost_m, - go, - apply_plotly_theme, - mo, -): - # ── Aggregate system validity ────────────────────────────────────────────── - _constraints = { - "Training: No OOM": not _oom, - "Checkpoint: Overhead OK": _overhead_ok, - "Serving: P99 < 500ms": _slo_ok, - "Reliability: 99.99% avail": _avail_ok, - "Privacy: GDPR ε ≤ 1": _gdpr_ok, - "Fairness: EU AI Act ≤ 10%": _eu_ok, - "Carbon: Neutral by 2027": _carbon_ok, - } - - _total_pass = sum(_constraints.values()) - _total_checks = len(_constraints) - _system_valid = _total_pass == _total_checks - - # ── Total cost estimate ──────────────────────────────────────────────────── - _total_cost_b = (_train_cost_m * 
12 / 1000) + _serving_cost_yr_b # billion $/year - - # ── Constraint bar chart ─────────────────────────────────────────────────── - _labels = list(_constraints.keys()) - _pass = [1 if v else 0 for v in _constraints.values()] - _colors_bar = [ - COLORS["GreenLine"] if v else COLORS["RedLine"] - for v in _constraints.values() - ] - - _fig = go.Figure(go.Bar( - x=_labels, - y=_pass, - marker_color=_colors_bar, - text=["PASS" if v else "FAIL" for v in _constraints.values()], - textposition="outside", - textfont=dict(size=11, color=COLORS["TextSec"]), - )) - _fig.update_layout( - height=280, - xaxis=dict(tickangle=-20, tickfont=dict(size=10, color=COLORS["TextSec"])), - yaxis=dict(visible=False, range=[0, 1.4]), - margin=dict(t=40, b=80, l=20, r=20), - title=dict( - text=f"System Constraint Audit — {_total_pass}/{_total_checks} Passed", - font=dict(size=13, color=COLORS["Text"]), - x=0.5, - ), - ) - apply_plotly_theme(_fig) - - # ── Summary card ────────────────────────────────────────────────────────── - _verdict_color = COLORS["GreenLine"] if _system_valid else COLORS["RedLine"] - _verdict_label = "FEASIBLE" if _system_valid else "INFEASIBLE" - - _summary = mo.Html(f""" -
-
-
- System Verdict -
-
- {_verdict_label} -
-
- {_total_pass}/{_total_checks} constraints -
-
-
-
- P99 Latency -
-
- {_p99_ms:.0f}ms -
-
SLO: < 500ms
-
-
-
- Annual Cost -
-
- ${_total_cost_b:.1f}B -
-
- budget: $10B total -
-
-
-
- Annual CO2 -
-
- {_annual_co2_kt:.0f}kt -
-
- {'carbon-neutral' if _carbon_ok else 'above target'} -
-
-
- """) - - mo.vstack([_summary, mo.ui.plotly(_fig)]) - return ( - _constraints, - _system_valid, - _total_pass, - _total_checks, - _total_cost_b, - _verdict_label, - ) - - # ─── ACT II: PREDICTION REVEAL ──────────────────────────────────────────────── @app.cell(hide_code=True) def _( - COLORS, - _constraints, - _system_valid, - _total_pass, - _total_checks, + _accuracy, + _accuracy_met, + _adversarial_met, + _constraints_all_met, + _dp_met, + _latency_met, + _n_met, act2_pred, mo, ): - _constraint_names_failing = [k for k, v in _constraints.items() if not v] + _failing = [] + if not _accuracy_met: _failing.append("Accuracy") + if not _latency_met: _failing.append("P99 Latency") + if not _dp_met: _failing.append("DP Compliance") + if not _adversarial_met: _failing.append("Adversarial Robustness") _feedback_map = { "A": ( - "**Training constraint.** " - "Memory is real: a 1T parameter model in full training state requires 20 TB. " - "Without 3D parallelism, no H100 (80 GB) cluster of any size can hold it in " - "per-GPU memory. The OOM constraint is structural, not solvable by adding GPUs " - "without also changing the sharding strategy. However, with 3D parallelism " - "and sufficient sharding, the training constraint *can* be satisfied — it is " - "not the binding limit at reasonable cluster sizes." + "**DP epsilon is genuinely difficult** — at eps=1, accuracy degrades by ~5%. " + "For a baseline 95% target model, this leaves no margin for other accuracy " + "costs. But this is only correct in a narrow sense. The deeper issue is that " + "DP noise and adversarial training *both* degrade accuracy in the same direction: " + "both smooth decision boundaries. These two constraints are " + "**fundamentally incompatible**, not just difficult to balance simultaneously. " + "DP adds noise to make the model's outputs less sensitive to any individual " + "training sample. Adversarial training adds noise to make the model robust " + "to input perturbations. 
Both mechanisms reduce model confidence — but for " + "orthogonal reasons. This is the mathematical conflict at the heart of Act II." ), "B": ( - "**Serving constraint.** " - "P99 < 500ms at 5B users is achievable — but only by combining cloud, edge, " - "and mobile tiers. Serving all requests at cloud P99 would require enormous " - "replica counts to keep utilization below the tail-latency cliff. " - "Edge and mobile offloading (quantized models at INT4/INT2) are the architectural " - "levers that make the P99 SLO achievable within budget. This is the design " - "insight: serving is tractable *if* you use the full tier hierarchy." + "**Correct.** No single architecture satisfies all six constraints without " + "explicit tradeoff negotiation. The key conflicts are: " + "(1) DP and adversarial robustness both reduce accuracy — they cannot both " + "be maximized without a model large enough to absorb both penalties; " + "(2) larger models raise P99 latency, since decode is bandwidth-bound; " + "(3) carbon reduction conflicts with " + "fleet scale. The feasible region (all-green) requires navigating the " + "intersection of these constraints — which is exactly what the Architecture " + "Synthesizer reveals. This is the Chouldechova-generalized lesson: " + "in multi-constraint systems, you choose which constraint to relax." ), "C": ( - "**Privacy constraint.** " - "GDPR-grade DP (ε ≤ 1) does impose accuracy degradation — approximately 5% at ε=1. " - "This is significant but not fatal. The binding privacy constraint is not " - "accuracy degradation but *data residency*: EU data cannot be used to train " - "a centralized model without GDPR compliance, so federated learning with " - "LoRA adapter updates is architecturally required regardless of ε. " - "Privacy is a hard structural constraint, not just an accuracy tax." + "**Partially correct, but incomplete.** The 10,000 H100 budget *can* " + "accommodate the serving load at smaller model sizes. 
But budget sufficiency " + "does not equal constraint satisfaction. Even with 10,000 H100s, " + "a 70B model cannot meet P99 < 200ms under the bandwidth-bound decode model " + "(per-token latency is set by model size, not replica count), " + "and DP + adversarial training may push accuracy below 95%. " + "Hardware budget is necessary but not sufficient." ), "D": ( - "**All constraints satisfiable.** " - "You are correct in spirit: with the right architectural choices, all constraints " - "can be satisfied simultaneously. But 'simultaneously' is doing a lot of work. " - "Each satisfying configuration requires a specific combination: 3D parallelism " - "for training, tier-aware serving for P99, Young-Daly optimal checkpointing, " - "GDPR-compliant federated strategy for EU, per-jurisdiction fairness models, " - "and ≥90% renewable PPA for carbon. The constraints are navigable — but they " - "are not independent. Every architectural choice propagates to multiple constraints. " - "That is the meta-principle." + "**Incorrect.** Carbon reduction is NOT independent of other constraints. " + "The Jevons Paradox directly links carbon to fleet scale: if you add GPUs " + "to satisfy the latency SLO, you increase total power consumption, " + "which makes the carbon target harder to hit. Carbon-aware scheduling " + "reduces effective CI, but only if deferrable jobs exist to shift. " + "Carbon is entangled with every other dimension through fleet size." + ), } - _chosen = _feedback_map.get(act2_pred.value, _feedback_map["D"]) + _chosen = _feedback_map.get(act2_pred.value, _feedback_map["B"]) - if _system_valid: - _status = mo.callout(mo.md( - f"**Feasible architecture found.** {_total_pass}/{_total_checks} constraints pass. " - + _chosen + if _constraints_all_met: + mo.callout(mo.md( + f"**{_n_met}/6 constraints satisfied.** " + _chosen ), kind="success") else: - _fail_list = ", ".join(_constraint_names_failing) - _status = mo.callout(mo.md( - f"**Architecture infeasible.** {_total_pass}/{_total_checks} constraints pass. 
" - f"Failing: **{_fail_list}**. " + _chosen + _fail_str = ", ".join(_failing) if _failing else "multiple" + mo.callout(mo.md( + f"**{_n_met}/6 constraints satisfied.** " + f"Currently failing: **{_fail_str}**. " + + _chosen ), kind="warn") - - _status return -# ─── ACT II: MATH PEEK ──────────────────────────────────────────────────────── +# ─── ACT II: REFLECTION ─────────────────────────────────────────────────────── +@app.cell(hide_code=True) +def _(mo): + mo.md("### Reflection") + return + + +@app.cell(hide_code=True) +def _(mo): + act2_reflection = mo.ui.radio( + options={ + "A) P99 latency and model accuracy — larger models are slower": "A", + "B) DP privacy and adversarial robustness — both require noise/randomization but in opposite directions for model confidence": "B", + "C) Carbon reduction and fault tolerance — checkpointing uses more energy": "C", + "D) Parallelism efficiency and checkpoint overhead — communication vs. recovery cost": "D", + }, + label="Which two constraints are FUNDAMENTALLY incompatible (not just hard to balance simultaneously)?", + ) + act2_reflection + return (act2_reflection,) + + +@app.cell(hide_code=True) +def _(act2_reflection, mo): + if act2_reflection.value is None: + mo.callout( + mo.md("Select your reflection answer above to continue."), + kind="warn", + ) + elif act2_reflection.value == "B": + mo.callout(mo.md( + "**Correct.** DP privacy and adversarial robustness are fundamentally " + "incompatible in the following sense: " + "**DP noise makes the model's outputs smoother and less sensitive** " + "to individual inputs (including adversarial perturbations). " + "**Adversarial training sharpens the model's decision boundaries** " + "to resist those same perturbations. " + "These two mechanisms push model confidence in opposite directions. " + "DP adds isotropic Gaussian noise to gradients during training, which " + "diffuses the loss landscape. 
Adversarial training concentrates the " + "loss signal at adversarial examples, sharpening it. " + "The result: achieving strong DP (low eps) while simultaneously achieving " + "high adversarial robustness requires a model with enough capacity to " + "maintain both — but both penalize clean accuracy. " + "This is not an engineering challenge. It is an algebraic tension, " + "analogous to Chouldechova's impossibility in the fairness domain." + ), kind="success") + elif act2_reflection.value == "A": + mo.callout(mo.md( + "**This is a tradeoff, not a fundamental incompatibility.** " + "Larger models are slower — true. But you can add replicas, use " + "quantization, or select a smaller model that still achieves 95% accuracy. " + "Latency and accuracy can both be satisfied with the right design. " + "There is no mathematical theorem preventing their simultaneous satisfaction. " + "DP and adversarial robustness, by contrast, have mechanistic interference." + ), kind="warn") + elif act2_reflection.value == "C": + mo.callout(mo.md( + "**This is not fundamentally incompatible.** " + "Carbon-aware scheduling and checkpoint frequency operate on different " + "timescales and resource dimensions. You can checkpoint frequently " + "without increasing power consumption (checkpoints are I/O-bound, " + "not compute-bound). Fault tolerance and carbon are independently satisfiable " + "with the right architectural choices. They are not mechanistically coupled." + ), kind="warn") + else: + mo.callout(mo.md( + "**This is a tradeoff, not a fundamental incompatibility.** " + "Communication overhead and checkpoint cost can be jointly minimized " + "with asynchronous checkpointing and topology-aware AllReduce. " + "They compete for network bandwidth but do not violate any theorem. " + "The right system design reduces both independently." 
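+ # The checkpointing points in the reflections above can be made concrete
+ # with a standalone sketch of the Young-Daly interval and the availability
+ # model used in the constraint cell. MTBF, checkpoint cost, and restart time
+ # here are assumed stand-ins, not the lab's actual constants:

```python
import math

MTBF_GPU_HOURS = 50_000.0    # assumed per-GPU mean time between failures
CHECKPOINT_COST_S = 120.0    # assumed checkpoint write time
RESTART_HR = 0.5             # 30-minute restart, as in the constraint cell

def checkpoint_plan(n_gpus: int, interval_min: float):
    """Cluster failure rate lam = N / MTBF (failures/hour);
    Young-Daly optimum T* = sqrt(2C / lam); estimated downtime
    fraction = lam * (T/2 + C + restart)."""
    lam = n_gpus / MTBF_GPU_HOURS
    c_hr = CHECKPOINT_COST_S / 3600.0
    t_opt_min = math.sqrt(2.0 * c_hr / lam) * 60.0
    downtime = lam * (interval_min / 60.0 / 2.0 + c_hr + RESTART_HR)
    return t_opt_min, max(1.0 - downtime, 0.0)
```

+ # At 10,000 GPUs these assumptions give T* of roughly 35 minutes, and the
+ # restart term dominates downtime at fleet scale, which is why checkpoint
+ # tuning alone cannot rescue availability.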
+ ), kind="warn") + return + + +# ─── ACT II: MATHPEEK ACCORDION ─────────────────────────────────────────────── @app.cell(hide_code=True) def _(mo): mo.accordion({ - "The five governing equations for Act II": mo.md(""" - **Young-Daly (Fault Tolerance):** - `T* = sqrt(2 × C / λ)` - where C = checkpoint write time, λ = cluster failure rate = N_GPUs / MTBF_per_GPU - — Source: @sec-fault-tolerance-young-daly + "The fundamental conflict: DP noise vs. adversarial robustness": mo.md(""" + **Differential Privacy (DP Noise Direction):** + During training, DP-SGD clips gradients to sensitivity S, then adds noise: + `g_tilde = clip(g, S) + N(0, sigma^2 * S^2 * I)` + Effect: gradients from all training examples (including adversarial ones) + are *smoothed*. The learned decision boundary becomes flatter near training points. + — Source: @sec-security-privacy-dp-sgd - **Little's Law (Serving):** - `N_concurrent = λ_arrival × W_latency` - At saturation: `Throughput_max = N_replicas / (latency × n_shards)` - P99 ≈ 3× mean for M/M/1 queues at moderate utilization - — Source: @sec-model-serving-littles-law + **Adversarial Training (Robustness Direction):** + At each step, adversarial training maximizes the loss over a perturbation ball: + `theta* = argmin_theta E[max_{delta: ||delta|| <= eps} L(x + delta, y; theta)]` + Effect: the decision boundary is forced *sharp* at adversarial perturbations. + The model must distinguish clean from perturbed inputs with high confidence. + — Source: @sec-robust-ai-pgd - **Roofline (Inference Latency):** - `Tokens/sec = min(TFLOPS × MFU, BW_GBs / bytes_per_param / params)` - At batch=1 (autoregressive decode), arithmetic intensity = 1 op/byte → bandwidth-bound - — Source: @sec-hw-acceleration-roofline + **The Tension:** + DP smooths → lower confidence near any input. + Adversarial training sharpens → higher confidence near adversarial inputs. + Both *penalize clean accuracy* for different reasons. 
+ At low DP epsilon (strong privacy), the noise scale sigma is large, + and the gradients from adversarial examples are effectively washed out — + the adversarial training signal is attenuated by DP noise. + This is not fixable by adding more data or a larger model: + it is a consequence of the conflicting objectives. - **Differential Privacy:** - `ε ≥ Δf / σ` where σ = noise scale, Δf = L2 sensitivity of query - GDPR-grade: ε ≤ 1.0; CCPA-grade: ε ≤ 3.0 - Accuracy penalty ≈ k/ε (monotone: stronger privacy = more noise = lower accuracy) - — Source: @sec-security-privacy-dp - - **Chouldechova Impossibility:** - When base rates differ between groups A and B (p_A ≠ p_B), any classifier - satisfying calibration AND equalized odds must have FPR_A ≠ FPR_B. - No ML improvement can resolve this — it is an algebraic identity. - — Source: @sec-responsible-ai-chouldechova + **The Resolution:** + The feasible region exists (all-green is achievable in this lab) + but requires: (1) a model large enough that both accuracy penalties still + leave you above 95%, (2) an epsilon in [0.5, 1.0] that satisfies HIPAA + while not destroying the adversarial training signal, and (3) an + adversarial weight calibrated to the DP noise level. 
""") }) return # ═══════════════════════════════════════════════════════════════════════════════ -# DESIGN LEDGER SAVE + HUD +# VOL1 + VOL2 SYNTHESIS TIMELINE +# ═══════════════════════════════════════════════════════════════════════════════ + + +@app.cell(hide_code=True) +def _(mo): + mo.md(""" + --- + ## Curriculum Journey Summary + """) + return + + +@app.cell(hide_code=True) +def _(COLORS, _ledger_map, mo): + # Build a visual timeline of all 33 labs with constraint_hit indicators + _vol1_entries = [ + (f"V1-{c:02d}", str(c)) for c in range(1, 17) + ] + _vol2_entries = [ + (f"V2-{c:02d}", f"v2_{c:02d}") for c in range(1, 18) + ] + _all_entries = _vol1_entries + _vol2_entries + + def _dot(label, key): + _d = _ledger_map.get(key, {}) + _done = key in _ledger_map + _hit = _d.get("constraint_hit", False) + if not _done: + _bg = "#e2e8f0" + _color = "#94a3b8" + _sym = "" + elif _hit: + _bg = COLORS["RedL"] + _color = COLORS["RedLine"] + _sym = "!" + else: + _bg = COLORS["GreenL"] + _color = COLORS["GreenLine"] + _sym = "✓" + return ( + f'
<div style="display:inline-flex; flex-direction:column; align-items:center; margin:2px; padding:4px 8px; border-radius:6px; background:{_bg};">'
+            f'<span style="font-size:10px; color:{_color};">{label}</span>'
+            f'<span style="font-size:12px; font-weight:700; color:{_color};">{_sym}</span>'
+            f'</div>'
+        )
+
+    _v1_dots = "".join(_dot(lbl, key) for lbl, key in _vol1_entries)
+    _v2_dots = "".join(_dot(lbl, key) for lbl, key in _vol2_entries)
+
+    _total_done = sum(1 for _, k in _all_entries if k in _ledger_map)
+    _total_hit = sum(
+        1 for _, k in _all_entries
+        if _ledger_map.get(k, {}).get("constraint_hit", False)
+    )
+
+    # Dominant context across all labs
+    _contexts = [
+        _ledger_map[k].get("context", "")
+        for _, k in _all_entries if k in _ledger_map
+    ]
+    from collections import Counter as _Counter
+    _ctx_count = _Counter(_contexts)
+    _dom_ctx = _ctx_count.most_common(1)[0][0] if _ctx_count else "N/A"
+
+    mo.Html(f"""
+    <div style="border:1px solid #e2e8f0; border-radius:10px; padding:16px;">
+        <div style="font-weight:700; font-size:14px; margin-bottom:8px;">
+            Lab Journey — All 33 Labs (Vol I + Vol II)
+        </div>
+        <div style="font-size:11px; color:#64748b; margin:8px 0 2px;">
+            Volume I (Labs V1-01 through V1-16)
+        </div>
+        <div style="display:flex; flex-wrap:wrap;">
+            {_v1_dots}
+        </div>
+        <div style="font-size:11px; color:#64748b; margin:8px 0 2px;">
+            Volume II (Labs V2-01 through V2-17)
+        </div>
+        <div style="display:flex; flex-wrap:wrap;">
+            {_v2_dots}
+        </div>
+        <div style="display:flex; gap:18px; margin-top:10px; font-size:12px;">
+            <div>
+                <span>Labs completed:</span>
+                <b>&nbsp;{_total_done}/33</b>
+            </div>
+            <div>
+                <span>Constraints triggered:</span>
+                <b>&nbsp;{_total_hit}</b>
+            </div>
+            <div>
+                <span>Dominant context:</span>
+                <b>&nbsp;{_dom_ctx}</b>
+            </div>
+        </div>
+        <div style="font-size:10px; color:#94a3b8; margin-top:8px;">
+            Green = completed, no failure &nbsp;|&nbsp;
+            Red = constraint triggered &nbsp;|&nbsp;
+            Grey = not yet completed
+        </div>
+    </div>
+ """) + return + + +@app.cell(hide_code=True) +def _(mo): + mo.md(""" + *You have completed the ML Systems curriculum. The physics doesn't change — the constraints + just shift with scale.* + """) + return + + +# ═══════════════════════════════════════════════════════════════════════════════ +# DESIGN LEDGER SAVE + HUD FOOTER # ═══════════════════════════════════════════════════════════════════════════════ @app.cell(hide_code=True) def _( COLORS, - _accuracy_penalty_pct, - _avail_ok, - _carbon_ok, - _eu_ok, - _gdpr_ok, - _is_federated, - _N, - _oom, - _overhead_ok, - _slo_ok, - _system_valid, - _total_cost_b, - _total_gpus, - _total_pass, - _total_checks, - _T_min, - _verdict_label, + _accuracy, + _accuracy_met, + _adversarial_met, + _adv_robustness, + _carbon_met, + _carbon_reduction, + _constraints_all_met, + _dp_met, + _fault_tol_met, + _latency_met, + _n_met, + _p99_latency_ms, + _uptime_pct, act1_pred, act2_pred, - d1_parallelism, - d4_epsilon, - d5_fairness, + adv_train_weight, + checkpoint_interval_min, + context_toggle, + dp_epsilon, + flexible_job_pct, ledger, + model_size_b, mo, + parallelism_strategy, ): - # ── Invariants applied in this lab ──────────────────────────────────────── - _invariants = [ - "Young-Daly Optimal Checkpoint", - "Amdahl Scale Ceiling", - "Roofline Bandwidth Bound", - "Little's Law Serving", - "Differential Privacy epsilon-delta", - "Chouldechova Impossibility", - "Jevons Carbon Paradox", - "Memory Wall / OOM", - ] - # ── Save to Design Ledger ───────────────────────────────────────────────── ledger.save( chapter="v2_17", design={ - "context": "full_fleet", - "cluster_gpus": int(_N), - "parallelism_strategy": d1_parallelism.value, - "checkpoint_interval_min": float(_T_min), - "dp_epsilon": float(d4_epsilon.value), - "fairness_criterion": d5_fairness.value, - "carbon_compliant": bool(_carbon_ok), - "p99_slo_met": bool(_slo_ok), - "total_system_cost_b": float(_total_cost_b), + "context": context_toggle.value, + "model_size_b": 
float(model_size_b.value), + "dp_epsilon": float(dp_epsilon.value), + "adv_train_weight": float(adv_train_weight.value), + "parallelism_strategy": parallelism_strategy.value, + "checkpoint_interval_min": int(checkpoint_interval_min.value), + "flexible_job_pct": float(flexible_job_pct.value), + "constraints_all_met": bool(_constraints_all_met), + "accuracy_met": bool(_accuracy_met), + "latency_met": bool(_latency_met), + "dp_met": bool(_dp_met), + "adversarial_met": bool(_adversarial_met), + "carbon_met": bool(_carbon_met), + "fault_tolerance_met": bool(_fault_tol_met), "act1_prediction": str(act1_pred.value), - "act1_correct": False, # no single correct answer in Act I - "act2_result": "feasible" if _system_valid else "infeasible", - "act2_decision": d1_parallelism.value, - "constraint_hit": not _system_valid, - "system_valid": bool(_system_valid), - "invariants_connected": _invariants, + "act1_correct": act1_pred.value == "D", + "act2_result": "approved" if _constraints_all_met else "infeasible", + "act2_decision": parallelism_strategy.value, + "constraint_hit": not _constraints_all_met, + "curriculum_complete": True, } ) - # ── HUD footer ──────────────────────────────────────────────────────────── - _checks_list = [ - ("Training: No OOM", not _oom), - ("Checkpoint Overhead OK", _overhead_ok), - ("P99 < 500ms SLO", _slo_ok), - ("Availability 99.99%", _avail_ok), - ("GDPR ε Compliance", _gdpr_ok), - ("EU AI Act Fairness", _eu_ok), - ("Carbon Neutral 2027", _carbon_ok), + # ── Build constraint status list ───────────────────────────────────────── + _checks = [ + ("Accuracy >= 95%", _accuracy_met), + ("P99 < 200ms", _latency_met), + ("DP eps <= 1", _dp_met), + ("Robustness >= 50%", _adversarial_met), + ("Carbon -40%", _carbon_met), + ("Uptime 99.9%", _fault_tol_met), ] _badge_html = "".join([ @@ -2079,9 +1749,12 @@ def _( font-weight: 600; margin: 3px;"> {'✓' if ok else '❌'} {label} """ - for label, ok in _checks_list + for label, ok in _checks ]) + _arch_status = 
"APPROVED" if _constraints_all_met else f"INFEASIBLE ({_n_met}/6)" + _status_color = "#4ade80" if _constraints_all_met else "#f87171" + _hud = mo.Html(f"""
- Design Ledger · Chapter v2_17 · Capstone + LAB = V2-17 (CAPSTONE)  ·  + CONTEXT = {context_toggle.value.upper()}  ·  + CURRICULUM COMPLETE
- Planet-Scale Architecture: {_verdict_label} -   {_total_pass}/{_total_checks} constraints passed + Architecture Status: + {_arch_status} +  —  CONSTRAINTS MET: {_n_met}/6
- Annual cost estimate + Act 2 prediction
-
- ${_total_cost_b:.1f}B / yr +
+ Option {act2_pred.value} + {'✓ Correct' if act2_pred.value == 'B' else ''}
-
+
{_badge_html}
- Invariants applied: - {" · ".join(_invariants)} + Model: {model_size_b.value}B params +  ·  + DP ε: {dp_epsilon.value:.1f} +  ·  + Adv weight: {adv_train_weight.value:.2f} +  ·  + Ckpt interval: {checkpoint_interval_min.value} min +  ·  + Flex jobs: {flexible_job_pct.value}% +  ·  + Parallelism: {parallelism_strategy.value} +
+
+ Curriculum complete. Vol I + Vol II Design Ledger saved. The physics doesn't change.
""") @@ -2123,7 +1812,7 @@ def _( # ═══════════════════════════════════════════════════════════════════════════════ -# CURRICULUM SYNTHESIS — THE META-PRINCIPLE +# THE META-PRINCIPLE # ═══════════════════════════════════════════════════════════════════════════════ @@ -2137,19 +1826,21 @@ def _(mo): **physical laws create hard ceilings that no amount of engineering can dissolve.** You cannot wish away the memory wall — HBM bandwidth is determined by signal - physics and pin count. You cannot wish away Amdahl's Law — coordination cost - grows with cluster size regardless of how good your scheduler is. You cannot - wish away Chouldechova's theorem — it follows from the definition of conditional - probability. You cannot wish away Young-Daly — it follows from the calculus of - minimization. You cannot wish away Little's Law — it follows from queueing theory - steady-state. + physics and pin count. You cannot wish away Amdahl's Law — the serial fraction + of your workload caps speedup regardless of cluster size. You cannot wish away + Chouldechova's theorem — it follows from the definition of conditional probability + when base rates differ. You cannot wish away Young-Daly — it follows from the + calculus of minimization under a Poisson failure process. You cannot wish away + Little's Law — it follows from queueing theory steady-state. You cannot make DP + and adversarial robustness simultaneously costless — they are mechanistically + opposed in the same loss landscape. But you *can* navigate these constraints. That is the discipline of ML systems: not finding a way around the physics, but designing systems that respect it. The skilled ML architect does not ask: "How do I avoid the memory wall?" They ask: "Which memory-wall-respecting architecture best satisfies my - throughput, latency, and cost requirements simultaneously?" + throughput, latency, cost, and safety requirements simultaneously?" That is the question this curriculum trained you to ask. 
""") @@ -2166,12 +1857,12 @@ def _(mo): which constraint to prioritize when they cannot all be satisfied simultaneously. The invariants give you the exact tradeoff surface. Read them. - 2. **The gap between your prediction and reality is your learning.** - If your Systems Intuition Radar shows a weak domain, that is not a failure — - it is a calibration report. The researchers who built the fastest training - systems, the most efficient serving pipelines, and the fairest production - models were the ones who had internalized which invariants bind when, - and why. That intuition is now yours to develop. + 2. **The bottleneck moves with scale — that is the curriculum.** + Memory dominates single-node inference. Communication dominates multi-node + training. Privacy and fairness constraints activate at any scale but are + invisible until you look across populations. The researcher who built the + fastest training systems internalized which invariant binds when, and why. + That intuition is now yours to develop. """) return
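
The closing takeaways name Young-Daly and Little's Law, both of which reduce to one-line computations (`T* = sqrt(2C/λ)` and `N = λ × W`, as quoted in the lab's MathPeek accordions). A minimal standalone sketch of both — the fleet and serving numbers below are hypothetical illustrations, not the lab's actual widget values:

```python
import math


def young_daly_interval(checkpoint_write_s: float, n_gpus: int,
                        mtbf_per_gpu_h: float) -> float:
    """Young-Daly optimal checkpoint interval T* = sqrt(2 * C / lambda),
    where lambda is the whole-cluster failure rate (failures per second)."""
    lam = n_gpus / (mtbf_per_gpu_h * 3600.0)  # cluster failure rate, 1/s
    return math.sqrt(2.0 * checkpoint_write_s / lam)


def littles_law_concurrency(arrival_rate_rps: float, latency_s: float) -> float:
    """Little's Law: mean in-flight requests N = lambda * W."""
    return arrival_rate_rps * latency_s


# Hypothetical fleet: 10,000 GPUs, 50,000 h MTBF each, 120 s checkpoint write
t_star = young_daly_interval(120.0, 10_000, 50_000)
print(f"Checkpoint every {t_star / 60:.1f} min")  # → Checkpoint every 34.6 min

# Hypothetical serving tier: 2,000 req/s at 150 ms mean latency
print(f"In-flight requests: {littles_law_concurrency(2000, 0.150):.0f}")  # → 300
```

Note how the checkpoint interval shrinks as the cluster grows: `lam` scales linearly with `n_gpus`, so `T*` falls as `1/sqrt(n_gpus)` — the fault-tolerance constraint tightens precisely as the parallelism constraint loosens.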