---
title: "The Differential Explainer"
subtitle: "Automated 'Why?' analysis for hardware upgrades."
description: "Learn how to use the DifferentialExplainer to automatically compare two configurations and generate a written explanation of the performance delta."
categories: ["analysis", "intermediate"]
---
## The Question
When you run a simulation comparing an A100 to an H100, the output might say:
- A100 Latency: 11.0 ms
- H100 Latency: 8.0 ms
The measured speedup is only about 1.4x, yet the spec sheet says the H100 has roughly 3.2x more peak FLOP/s! **How do we automatically explain this discrepancy to a user or a stakeholder without manually digging through the formulas?**
::: {.callout-note}
## What You Will Learn
- **Compare** two system evaluations automatically.
- **Generate** a human-readable explanation of why a speedup did (or didn't) match hardware specs.
- **Identify** "Regime Shifts" where an upgrade fundamentally changes the bottleneck.
:::
## 1. Setup
Import the necessary modules. We will use the standard `Engine` to get our baseline and proposed profiles, and the new `DifferentialExplainer` to compare them.
```python
import mlsysim
from mlsysim.core.engine import Engine
from mlsysim.core.explainers import DifferentialExplainer
```
## 2. A Memory-Bound Upgrade (The Disappointment)
Let's test the classic scenario: upgrading hardware for LLM Inference at a low batch size.
```python
model = mlsysim.Models.Language.Llama3_8B
# Get our two profiles
prof_a100 = Engine.solve(model=model, hardware=mlsysim.Hardware.Cloud.A100, batch_size=1)
prof_h100 = Engine.solve(model=model, hardware=mlsysim.Hardware.Cloud.H100, batch_size=1)
# Ask the explainer what happened
explanation = DifferentialExplainer.compare_performance(
    baseline=prof_a100,
    proposal=prof_h100
)
print(explanation)
```
**Output:**
```text
📊 Differential Analysis: Proposal vs. Baseline
• Speedup: 1.39x
• Baseline Regime: Memory Bound
• Proposal Regime: Memory Bound
Analysis: The workload remained Memory Bound. The speedup is constrained strictly by the ratio of HBM bandwidth between the two configurations. Any additional compute capacity (FLOP/s) in the proposal was left unutilized.
```
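The verdict above is a roofline argument: execution time is approximately `max(compute time, memory time)`, so a workload that is memory bound on both devices speeds up only by the ratio of their memory bandwidths. Here is a minimal sketch of that logic; the spec and traffic numbers are illustrative assumptions, not mlsysim's internal values:

```python
# Minimal roofline sketch (illustrative numbers, NOT mlsysim's internals).
def regime(flops, bytes_moved, peak_flops, bandwidth):
    """Return (limiting regime, predicted time) for one configuration."""
    t_compute = flops / peak_flops
    t_memory = bytes_moved / bandwidth
    if t_memory > t_compute:
        return "Memory Bound", t_memory
    return "Compute Bound", t_compute

# Hypothetical batch-1 LLM decode step: little arithmetic per byte of weights.
flops, bytes_moved = 16e9, 16e9

_, t_a = regime(flops, bytes_moved, peak_flops=312e12, bandwidth=2.0e12)   # "A100"-like
_, t_b = regime(flops, bytes_moved, peak_flops=990e12, bandwidth=3.35e12)  # "H100"-like

# The speedup tracks the bandwidth ratio (~1.7x), not the FLOP/s ratio (~3.2x).
print(f"speedup = {t_a / t_b:.2f}x")
```

With these assumed specs, both devices land in the memory-bound regime, so the predicted speedup collapses to the bandwidth ratio, which is exactly the pattern the explainer reports above.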
## 3. A Regime Shift (The Breakthrough)
What happens if we increase the batch size significantly? Larger batches raise arithmetic intensity, shifting the workload from memory bound to compute bound, so the H100's extra FLOP/s can finally be put to work.
```python
# At batch size 256, arithmetic intensity is high enough that compute,
# not memory bandwidth, is the bottleneck on both GPUs
prof_a100_batch = Engine.solve(model=model, hardware=mlsysim.Hardware.Cloud.A100, batch_size=256)
prof_h100_batch = Engine.solve(model=model, hardware=mlsysim.Hardware.Cloud.H100, batch_size=256)
explanation_batch = DifferentialExplainer.compare_performance(
    baseline=prof_a100_batch,
    proposal=prof_h100_batch
)
print(explanation_batch)
```
**Output:**
```text
📊 Differential Analysis: Proposal vs. Baseline
• Speedup: 2.65x
• Baseline Regime: Compute Bound
• Proposal Regime: Compute Bound
Analysis: The workload remained Compute Bound. The speedup is constrained strictly by the ratio of peak arithmetic throughput (FLOP/s) between the two configurations. Additional memory bandwidth was not the limiting factor.
```
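A comparison like the ones above only needs each profile's regime label and latency. One plausible way the branching could look, as a hedged sketch (this is not the actual `DifferentialExplainer` implementation):

```python
# Hedged sketch of differential-comparison branching (not mlsysim's code).
def explain(baseline_regime, proposal_regime, speedup):
    """Summarize a speedup given the limiting regime of each configuration."""
    if baseline_regime == proposal_regime:
        # Same regime on both sides: the gain is bounded by that resource's ratio.
        limiter = ("HBM bandwidth" if baseline_regime == "Memory Bound"
                   else "peak arithmetic throughput (FLOP/s)")
        return (f"{speedup:.2f}x speedup; the workload remained {baseline_regime}, "
                f"so the gain is bounded by the ratio of {limiter}.")
    # Different regimes: the upgrade changed which resource limits performance.
    return (f"{speedup:.2f}x speedup; the regime shifted from {baseline_regime} "
            f"to {proposal_regime}, meaning the upgrade removed the original bottleneck.")

print(explain("Memory Bound", "Memory Bound", 1.39))   # Section 2's scenario
print(explain("Memory Bound", "Compute Bound", 2.10))  # a hypothetical regime shift
```

The key design point is that the explanation is derived entirely from the two profiles' regime labels, so the same branching covers upgrades, downgrades, and batch-size sweeps.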
## What You Learned
- **The Differential Explainer** takes the cognitive load off the user by explicitly stating *why* an upgrade behaved the way it did.
- It detects **Regime Shifts**, helping you realize when a hardware upgrade actually solved your bottleneck.
- This tool is perfect for embedding into CI/CD pipelines (e.g., leaving a comment on a GitHub PR explaining why a new model architecture will slow down production).