[PR #1591] [MERGED] fix(staffml): correct cloud-0013 INT4 throughput math and unit error in distractor #11781

GiteaMirror · 2026-05-12T19:28:54-05:00

📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1591
Author: @Shashank-Tripathi-07
Created: 4/27/2026
Status: ✅ Merged
Merged: 4/27/2026
Merged by: @profvjreddi

Base: dev ← Head: fix/staffml-cloud-0013-precision-mismatch

📝 Commits (1)

47e64e0 fix(staffml): correct cloud-0013 INT4 throughput math and unit error

📊 Changes

1 file changed (+7 additions, -5 deletions)

📝 interviews/vault/questions/cloud/cloud-0013.yaml (+7 -5)

📄 Description

Bug

cloud-0013 (The TPOT Memory Wall) presents an INT4-quantized 70B model but the napkin_math field showed FP16 arithmetic:

Before (wrong): implied ~24 tokens/sec (FP16: 70B x 2 bytes = 140 GB, 140 GB / 3.35 TB/s = 41.8 ms/token)
After (correct): ~96 tokens/sec (INT4: 70B x 0.5 bytes = 35 GB, 35 GB / 3.35 TB/s = 10.4 ms/token)
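
The before/after figures can be reproduced with a short sketch. It assumes the simple bandwidth-bound model the PR uses (every generated token reads all weights from HBM once, nothing else is counted); the function and constant names are illustrative and do not come from the repository.

```python
# Bandwidth-bound decode estimate: time per token = weight bytes / HBM bandwidth.
# Assumption: weights are read once per token; KV cache and compute are ignored.
HBM_BANDWIDTH_BYTES_PER_S = 3.35e12  # 3.35 TB/s, the H100 figure quoted in the PR
PARAMS = 70e9                        # 70B-parameter model


def decode_estimate(bytes_per_param, params=PARAMS,
                    bandwidth=HBM_BANDWIDTH_BYTES_PER_S):
    """Return (weight_bytes, ms_per_token, tokens_per_s) for one precision."""
    weight_bytes = params * bytes_per_param
    seconds_per_token = weight_bytes / bandwidth
    return weight_bytes, seconds_per_token * 1e3, 1.0 / seconds_per_token


for label, bytes_per_param in (("FP16", 2.0), ("INT4", 0.5)):
    size, ms, tps = decode_estimate(bytes_per_param)
    print(f"{label}: {size / 1e9:.0f} GB, {ms:.1f} ms/token, {tps:.1f} tok/s")
# FP16: 140 GB, 41.8 ms/token, 23.9 tok/s
# INT4: 35 GB, 10.4 ms/token, 95.7 tok/s
```

The 4x gap between the two results comes entirely from the bytes-per-parameter term, which is why FP16 arithmetic on an INT4 scenario understates throughput by that factor.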

Separately, distractor option 1 contained a 1000x unit error: "3.35 GB/s" should be "3.35 TB/s".
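
To see why the unit matters, the sketch below plugs both spellings of the bandwidth into the same bandwidth-bound formula. It assumes the distractor's figure would be read as HBM bandwidth for the 35 GB INT4 model; the helper name and unit table are hypothetical.

```python
# Effect of the GB/s vs TB/s typo on a bandwidth-bound throughput estimate.
MODEL_BYTES = 35e9  # INT4 weights: 70B params x 0.5 bytes/param

UNIT_BYTES_PER_S = {"GB/s": 1e9, "TB/s": 1e12}


def tokens_per_s(value, unit, model_bytes=MODEL_BYTES):
    """Convert a bandwidth figure to bytes/s and divide by bytes read per token."""
    return value * UNIT_BYTES_PER_S[unit] / model_bytes


wrong = tokens_per_s(3.35, "GB/s")
right = tokens_per_s(3.35, "TB/s")
print(f"3.35 GB/s -> {wrong:.4f} tok/s")
print(f"3.35 TB/s -> {right:.1f} tok/s")
print(f"ratio: {right / wrong:.0f}x")
```

Read literally, the mistyped unit would imply roughly one token every ten seconds, a figure implausible enough to give away that the option is a distractor.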

Changes

interviews/vault/questions/cloud/cloud-0013.yaml
- napkin_math: complete INT4 calculation showing 35 GB model size and ~96 tok/s throughput
- Option 1 (distractor): fixed unit "GB/s" -> "TB/s"
- Option 2 (correct answer): updated from "~24 tokens/sec" to "~96 tokens/sec for this INT4 model"
- correct_index unchanged (still 2)

Question text and scenario are untouched. Only the answer explanation and option text were corrected to match the precision specified in the scenario.

Verification

INT4 math: 70B params x 0.5 bytes/param = 35 GB. H100 HBM bandwidth = 3.35 TB/s. Theoretical bandwidth-bound throughput = 3.35e12 B/s / 35e9 B = 95.7 tok/s, comfortably above the 20 tok/s requirement stated in the scenario.
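
The comparison against the scenario's requirement can also be expressed as a per-token latency budget. This is a minimal sketch under the same bandwidth-bound assumption; `check_target` is a hypothetical helper, and the 20 tok/s target is the figure quoted from the scenario.

```python
# Compare the bandwidth-bound estimate with the scenario's throughput target.
def check_target(params, bytes_per_param, bandwidth_bytes_per_s, required_tok_per_s):
    """Return (estimated ms/token, budget ms/token, headroom factor, meets_target)."""
    estimated_ms = params * bytes_per_param / bandwidth_bytes_per_s * 1e3
    budget_ms = 1e3 / required_tok_per_s
    headroom = budget_ms / estimated_ms
    return estimated_ms, budget_ms, headroom, headroom >= 1.0


est_ms, budget_ms, headroom, ok = check_target(
    params=70e9,
    bytes_per_param=0.5,            # INT4
    bandwidth_bytes_per_s=3.35e12,  # 3.35 TB/s
    required_tok_per_s=20.0,
)
print(f"estimated {est_ms:.1f} ms/token vs budget {budget_ms:.0f} ms/token")
print(f"headroom {headroom:.1f}x, meets target: {ok}")
# estimated 10.4 ms/token vs budget 50 ms/token
# headroom 4.8x, meets target: True
```

Because the estimate is an upper bound on throughput, the headroom factor is best read as the margin available before real-world overheads push the model below the target.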

Related: PR #1590 (cloud-0024 correct_index fix) is a companion StaffML audit fix.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-12 19:28:54 -05:00

GiteaMirror closed this issue

2026-05-12 19:28:57 -05:00

Reference: github-starred/cs249r_book#11781