---
title: "Sharp Shot"
subtitle: "Compress a model before the target blurs."
description: "A browser mini-game from MLSysBook Playground. Tune per-layer precision dials, watch the accuracy meter respond, then shoot a target whose blur, jitter, and drift reflect the noise each layer introduces. Teaches mixed-precision quantization through the asymmetry that makes it real."
page-layout: article
format:
  html:
    include-in-header:
      - text: |
---
**Controls:** keys **1–6** cycle a layer's precision dial, **space** fires, **R** retries. The HUD shows bits used against the 96-bit budget, estimated accuracy, shots remaining, and score.
## How to play
A target sits downrange. Your weapon has six per-layer precision dials (embedding, attn.1, ffn.1, attn.2, ffn.2, output) and a total **bit budget of 96**. Every layer starts at fp32 (32 bits), so the default configuration uses 192 bits, double the budget. Lowering a dial frees bits, but it visually degrades the sight in one of three ways, depending on which layer you compressed:
- **Edge layers** (embedding + output) at **int4**: the target **drifts** away from where it appears — systematic bias. You aim at the bullseye, but the true target has shifted. This is the [LLM.int8 cliff](https://arxiv.org/abs/2208.07339) (Dettmers et al., 2022) — embeddings and output heads collapse hard at very low precision.
- **Attention layers** at low precision: the target **jitters** — softmax amplifies numerical noise.
- **FFN layers** at low precision: the target **blurs** — reduced contrast, most robust to aggressive quantization.
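The budget arithmetic and the layer-to-noise mapping above can be sketched as a small model. This is a hypothetical reconstruction for illustration: the layer names come from the game, but the function names and data layout are assumptions, not the game's actual code.

```python
# Hypothetical sketch of the dial state (names and structure assumed).
# Each of the six layers gets a precision in bits; each layer type maps
# to the visual noise it introduces when quantized aggressively.
LAYERS = {
    "embedding": "drift",   # edge layer: systematic bias (target shifts)
    "attn.1":    "jitter",  # attention: softmax amplifies numerical noise
    "ffn.1":     "blur",    # FFN: loses contrast, most robust
    "attn.2":    "jitter",
    "ffn.2":     "blur",
    "output":    "drift",   # edge layer, like embedding
}

PRECISIONS = (32, 16, 8, 4)  # fp32, fp16, int8, int4
BUDGET = 96

def bits_used(config):
    """Total bits across all six dials."""
    return sum(config.values())

def over_budget(config):
    """True when the configuration exceeds the 96-bit budget."""
    return bits_used(config) > BUDGET

# All-fp32 start: 6 layers x 32 bits = 192, double the budget.
fp32_all = {name: 32 for name in LAYERS}
print(bits_used(fp32_all), over_budget(fp32_all))
```

Uniform int8 (6 × 8 = 48 bits) fits easily; the interesting configurations spend the freed bits unevenly.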
The **estimated accuracy meter** updates live as you cycle dials, so you can balance budget vs. accuracy *before* you fire.
You have **10 shots**. Each shot scores **3 / 2 / 1 / 0** points by zone (bullseye / inner / outer / miss). Reach **18 points to ship the model**. If you miss completely, the game briefly reveals where the *true* target was so you can see how far your sight was misaligned by the configuration you chose.
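The zone scoring reduces to a few distance thresholds. A minimal sketch, with the zone radii invented for illustration (the game does not publish its actual hit geometry):

```python
def score_shot(distance, radii=(1.0, 2.0, 3.0)):
    """Zone scoring: bullseye=3, inner=2, outer=1, miss=0.

    `distance` is the shot's distance from the true target center;
    `radii` (bullseye, inner, outer) are assumed values, not the game's.
    """
    bullseye, inner, outer = radii
    if distance <= bullseye:
        return 3
    if distance <= inner:
        return 2
    if distance <= outer:
        return 1
    return 0
```

Ten shots at a 3-point maximum means a perfect run scores 30, so the 18-point ship threshold tolerates a few outer-ring hits but not many outright misses.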
## The Systems Concept
Production quantization (GPTQ, AWQ, SmoothQuant) spends most of its time navigating exactly this asymmetry: most of a model can be aggressively quantized, but a few layers — typically the first and last — cannot. Mixed-precision bit-allocation algorithms like [HAWQ](https://arxiv.org/abs/1905.03696) (Dong et al. 2019) search for the right per-layer budget automatically. You just did a 30-second version by eye.
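The by-eye search has a mechanical counterpart. Below is a toy HAWQ-flavored sweep: every number in it (the per-layer sensitivities, the linear accuracy penalty) is an assumption chosen to echo the game's asymmetry, not HAWQ's actual Hessian-based sensitivity metric. With only 4^6 = 4096 possible dial settings, brute force suffices:

```python
from itertools import product

PRECISIONS = (32, 16, 8, 4)          # fp32, fp16, int8, int4
SENSITIVITY = {                      # assumed: edge layers hurt most, FFN least
    "embedding": 3.0, "attn.1": 1.5, "ffn.1": 0.5,
    "attn.2": 1.5, "ffn.2": 0.5, "output": 3.0,
}
BUDGET = 96

def est_accuracy(config):
    """Toy accuracy proxy: each layer pays sensitivity * (bits below fp32)."""
    penalty = sum(s * (32 - config[name]) for name, s in SENSITIVITY.items())
    return 100.0 - 0.1 * penalty

def best_config():
    """Exhaustively find the highest-accuracy config within the bit budget."""
    names = list(SENSITIVITY)
    best_cfg, best_acc = None, float("-inf")
    for bits in product(PRECISIONS, repeat=len(names)):
        if sum(bits) > BUDGET:
            continue
        cfg = dict(zip(names, bits))
        acc = est_accuracy(cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc

cfg, acc = best_config()
print(cfg, round(acc, 1))
```

Even this crude model rediscovers the asymmetry: the winning allocation keeps embedding and output at fp32 and takes the aggressive cuts in the FFN layers, exactly the shape a good by-eye configuration converges on.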
---
*Part of [MLSysBook Playground](/games/). Found a bug? [Report an issue](https://github.com/harvard-edge/cs249r_book/issues/new?labels=bug&title=Bug+in+Game).*