---
title: "Sharp Shot"
subtitle: "Compress a model before the target blurs."
description: "A browser mini-game from MLSysBook Playground. Tune per-layer precision dials, watch the accuracy meter respond, then shoot a target whose blur, jitter, and drift reflect the noise each layer introduces. Teaches mixed-precision quantization through the asymmetry that makes it real."
page-layout: article
format:
  html:
    include-in-header:
      - text: |
---

```{=html}
bits 192 / 96 · acc 100% · shots 10 · score 0
1-6 cycles a layer · space to fire · R retry
```

## How to play

A target sits downrange. Your weapon has six per-layer precision dials (embedding, attn.1, ffn.1, attn.2, ffn.2, output) and a total **bit budget of 96**. Every layer starts at fp32 (32 bits), well over budget. Lowering precision frees bits, but it visually degrades the sight in three different ways depending on which layer you compressed:

- **Edge layers** (embedding + output) at **int4**: the target **drifts** away from where it appears (systematic bias). You aim at the bullseye, but the true target has shifted. This is the [LLM.int8 cliff](https://arxiv.org/abs/2208.07339) (Dettmers et al., 2022): embeddings and output heads collapse hard at very low precision.
- **Attention layers** at low precision: the target **jitters**, because softmax amplifies numerical noise.
- **FFN layers** at low precision: the target **blurs** (reduced contrast); these layers are the most robust to aggressive quantization.

The **estimated accuracy meter** updates live as you cycle the dials, so you can balance budget against accuracy *before* you fire.

You have **10 shots**. Each shot scores **3 / 2 / 1 / 0** points by zone (bullseye / inner / outer / miss). Reach **18 points** to ship the model. If you miss completely, the game briefly reveals where the *true* target was, so you can see how far your sight was misaligned by the configuration you chose.

## The Systems Concept

Production quantization methods (GPTQ, AWQ, SmoothQuant) spend most of their time navigating exactly this asymmetry: most of a model can be aggressively quantized, but a few layers, typically the first and last, cannot. Mixed-precision bit-allocation algorithms like [HAWQ](https://arxiv.org/abs/1905.03696) (Dong et al., 2019) search for the right per-layer budget automatically. You just did a 30-second version by eye.

---

*Part of [MLSysBook Playground](/games/). Found a bug? [Report an issue](https://github.com/harvard-edge/cs249r_book/issues/new?labels=bug&title=Bug+in+Game).*
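
The layer-to-noise mapping described above can be sketched in a few lines. This is a hypothetical model, not the game's actual source: the function names (`layerNoise`, `totalBits`), the severity curve, and the scaling constants are all assumptions chosen only to illustrate the asymmetry (edge layers drift and collapse hard, attention jitters, FFN degrades gently).

```javascript
// Illustrative sketch of a per-layer noise model; names and constants
// are assumptions, not taken from the game's implementation.
const LAYERS = ["embedding", "attn.1", "ffn.1", "attn.2", "ffn.2", "output"];
const BUDGET = 96; // total bit budget from the HUD

// Map a layer's bit width to the sight perturbation it introduces.
function layerNoise(name, bits) {
  const severity = Math.max(0, 32 - bits) / 28; // 0 at fp32, 1 at int4
  if (name === "embedding" || name === "output") {
    // Edge layers: systematic drift that collapses hard near int4.
    return { kind: "drift", amount: severity * severity };
  }
  if (name.startsWith("attn")) {
    // Attention: jitter, roughly linear in lost precision.
    return { kind: "jitter", amount: severity };
  }
  // FFN: blur, most robust to aggressive quantization.
  return { kind: "blur", amount: 0.5 * severity };
}

// Sum the bits used by a configuration mapping layer name to width.
function totalBits(config) {
  return LAYERS.reduce((sum, l) => sum + config[l], 0);
}

const allFp32 = Object.fromEntries(LAYERS.map((l) => [l, 32]));
console.log(totalBits(allFp32)); // 192: over the 96-bit budget, as in the HUD
```

The quadratic drift curve is one way to encode the "cliff" behavior: edge layers look fine at int8 (`severity ≈ 0.57`, drift `≈ 0.33`) but fail sharply at int4.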
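
The "by eye" search the player performs can also be written down as a greedy pass in the spirit of HAWQ: repeatedly lower precision on the least-sensitive layer until the configuration fits the budget. Everything below is a sketch under stated assumptions; the sensitivity scores are illustrative, not measured, and real HAWQ derives them from Hessian information rather than hand-picking.

```javascript
// Greedy mixed-precision allocation sketch (illustrative, not HAWQ itself):
// drop bits on the least-sensitive layer until total bits fit the budget.
function allocate(sensitivity, budget, widths = [32, 16, 8, 4]) {
  const layers = Object.keys(sensitivity);
  const config = Object.fromEntries(layers.map((l) => [l, widths[0]]));
  const total = () => layers.reduce((s, l) => s + config[l], 0);
  while (total() > budget) {
    // Only layers not already at the narrowest width can be lowered.
    const candidates = layers.filter((l) => config[l] > widths[widths.length - 1]);
    if (candidates.length === 0) break; // budget infeasible
    // Pick the layer where the next precision drop should cost the least.
    const best = candidates.reduce((a, b) =>
      sensitivity[a] <= sensitivity[b] ? a : b
    );
    config[best] = widths[widths.indexOf(config[best]) + 1];
  }
  return config;
}

// Hypothetical sensitivities: edge layers highest (drift), FFN lowest (blur).
const config = allocate(
  { embedding: 5, "attn.1": 3, "ffn.1": 1, "attn.2": 3, "ffn.2": 1, output: 5 },
  96
);
```

With these made-up numbers the greedy pass keeps both edge layers at fp32, pushes both FFN layers (and one attention layer) to int4, and lands at 92 bits, mirroring the asymmetry the game is built around.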