[PR #1784] fix(mlsysim): correct B200/NVL72 TFLOP constants (2x error in FP16/FP8/INT4/FP4) #15735

Opened 2026-05-20 14:05:00 -05:00 by GiteaMirror

📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1784
Author: @Shashank-Tripathi-07
Created: 5/18/2026
Status: 🔄 Open

Base: dev ← Head: fix/mlsysim-formula-audit


📝 Commits (3)

  • fbb3f3d fix(site): add nav-footer and dropdown-menu dark mode selectors
  • 0b1208c fix(kits): correct hardware specs for Raspberry Pi and Nicla Vision
  • 43ddd99 fix(mlsysim): correct B200 and GB200 NVL72 TFLOP constants by 2x

📊 Changes

6 files changed (+50 additions, -15 deletions)

📝 kits/contents/platforms.qmd (+1 -1)
📝 kits/contents/raspi/raspi.qmd (+3 -2)
📝 kits/contents/raspi/setup/setup.qmd (+1 -1)
📝 mlsysim/mlsysim/core/constants.py (+7 -7)
📝 shared/config/footer-site.yml (+1 -1)
📝 shared/styles/_site-dark.scss (+37 -3)

📄 Description

Summary

  • B200_FLOPS_FP16_TENSOR was stored as 2250 TFLOPs, half the correct value. The NVIDIA Blackwell datasheet lists 4500 TFLOPs as the dense FP16/BF16 value (9000 TFLOPs with sparsity).
  • All derived B200 precision constants and the GB200 NVL72 aggregate values (72 x B200) inherited the same 2x underestimate.

Changes

| Constant | Before | After | Source |
|---|---|---|---|
| B200_FLOPS_FP16_TENSOR | 2250 TFLOPs | 4500 TFLOPs | NVIDIA Blackwell Architecture datasheet |
| B200_FLOPS_FP16_SPARSE | 4500 TFLOPs | 9000 TFLOPs | 2x dense |
| B200_FLOPS_FP8_TENSOR | 4500 TFLOPs | 9000 TFLOPs | NVIDIA datasheet |
| B200_FLOPS_INT4 | 9000 TFLOPs | 18000 TFLOPs | NVIDIA datasheet |
| NVL72_FLOPS_FP16_TENSOR | 162 PFLOPs | 324 PFLOPs | 72 x 4.5 PFLOPS |
| NVL72_FLOPS_FP8_TENSOR | 324 PFLOPs | 648 PFLOPs | 72 x 9 PFLOPS |
| NVL72_FLOPS_FP4_TENSOR | 720 PFLOPs | 1440 PFLOPs | 72 x 20 PFLOPS |
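
The derived rows in the Changes table can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the constants are plain numbers in TFLOPs (the actual representation in `constants.py` is not shown in this PR description), and `nvl72_pflops` is a hypothetical helper, not part of mlsysim.

```python
# Values copied from the "After" column of the Changes table (TFLOPs).
# Assumption: plain numbers; the real constants.py may wrap them in unit types.
B200_FLOPS_FP16_TENSOR = 4500
B200_FLOPS_FP16_SPARSE = 9000
B200_FLOPS_FP8_TENSOR = 9000
B200_FP4_TFLOPS = 20_000  # per-GPU figure implied by the "72 x 20 PFLOPS" source note

GPUS_PER_NVL72 = 72       # "72 x B200" from the Summary
TFLOPS_PER_PFLOP = 1000


def nvl72_pflops(per_gpu_tflops: float) -> float:
    """Hypothetical helper: aggregate a per-GPU TFLOPs figure across an NVL72 rack."""
    return GPUS_PER_NVL72 * per_gpu_tflops / TFLOPS_PER_PFLOP


# Sparse FP16 is listed as 2x dense.
assert B200_FLOPS_FP16_SPARSE == 2 * B200_FLOPS_FP16_TENSOR

# NVL72 aggregates match the table's "After" column.
assert nvl72_pflops(B200_FLOPS_FP16_TENSOR) == 324
assert nvl72_pflops(B200_FLOPS_FP8_TENSOR) == 648
assert nvl72_pflops(B200_FP4_TFLOPS) == 1440
```

Deriving the NVL72 values from the per-GPU constants, rather than storing them as independent literals, would keep the two sets from drifting apart in future edits.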

Impact

The B200 hardware registry node uses B200_FLOPS_FP16_TENSOR as its peak_flops. Any roofline simulation, throughput estimate, or TCO calculation using Hardware.B200 or Hardware.NVL72 was underestimating peak compute by 2x.

The other key GPU constants (H100, A100, V100, H200) were verified correct against their datasheets.
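
To illustrate how the peak-compute constant feeds into a roofline estimate, here is a minimal sketch of the standard roofline formula. It is not mlsysim's implementation: `attainable_tflops` and `ridge_point` are hypothetical names, and the 8 TB/s memory bandwidth is an assumed placeholder chosen only to make the numbers concrete.

```python
def attainable_tflops(peak_tflops: float, mem_bw_tbps: float, intensity: float) -> float:
    """Standard roofline: min(peak compute, bandwidth x arithmetic intensity).

    intensity is in FLOP/byte, so TB/s x FLOP/byte gives TFLOP/s.
    """
    return min(peak_tflops, mem_bw_tbps * intensity)


def ridge_point(peak_tflops: float, mem_bw_tbps: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel becomes compute-bound."""
    return peak_tflops / mem_bw_tbps


OLD_PEAK = 2250.0   # TFLOPs, value before this PR
NEW_PEAK = 4500.0   # TFLOPs, value after this PR
MEM_BW_TBPS = 8.0   # assumed placeholder bandwidth, not taken from the PR

for intensity in (100, 400, 1000):
    old = attainable_tflops(OLD_PEAK, MEM_BW_TBPS, intensity)
    new = attainable_tflops(NEW_PEAK, MEM_BW_TBPS, intensity)
    print(f"intensity={intensity:>5} FLOP/B  old={old:7.1f}  new={new:7.1f} TFLOP/s")
```

Under these assumptions, memory-bound kernels (low intensity) are unaffected by the change, while compute-bound kernels see up to a 2x difference in the estimate, and the ridge point itself moves from 281.25 to 562.5 FLOP/byte.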

Test plan

  • `cd mlsysim && python -m pytest tests/ -q`: all 424 tests pass (verified locally)
  • Spot-check: Hardware.B200.peak_flops returns 4500 TFLOPs
  • Spot-check: Hardware.NVL72.peak_flops returns 324 PFLOPs

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-20 14:05:00 -05:00

Reference: github-starred/cs249r_book#15735