[PR #1591] [MERGED] fix(staffml): correct cloud-0013 INT4 throughput math and unit error in distractor #11781

Closed
opened 2026-05-12 19:28:54 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1591
Author: @Shashank-Tripathi-07
Created: 4/27/2026
Status: Merged
Merged: 4/27/2026
Merged by: @profvjreddi

Base: dev ← Head: fix/staffml-cloud-0013-precision-mismatch


📝 Commits (1)

  • 47e64e0 fix(staffml): correct cloud-0013 INT4 throughput math and unit error

📊 Changes

1 file changed (+7 additions, -5 deletions)

📝 interviews/vault/questions/cloud/cloud-0013.yaml (+7 -5)

📄 Description

Bug

cloud-0013 (The TPOT Memory Wall) presents an INT4-quantized 70B model but the napkin_math field showed FP16 arithmetic:

  • Before (wrong): implied ~24 tokens/sec (FP16: 70B x 2 bytes = 140 GB, 140 GB / 3.35 TB/s = 41.8 ms/token)
  • After (correct): ~96 tokens/sec (INT4: 70B x 0.5 bytes = 35 GB, 35 GB / 3.35 TB/s = 10.4 ms/token)
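
The before/after figures follow from a bandwidth-bound decode estimate, in which each generated token requires streaming all weights from HBM once. A minimal sketch of that arithmetic, assuming batch size 1 and ignoring KV-cache traffic and compute time (the function and constant names are illustrative and do not come from cloud-0013.yaml):

```python
# Illustrative napkin math; names are hypothetical, not from cloud-0013.yaml.
H100_HBM_BYTES_PER_S = 3.35e12  # 3.35 TB/s, as stated in the PR
PARAMS = 70e9                   # 70B parameters


def decode_estimate(bytes_per_param: float) -> tuple[float, float, float]:
    """Return (model size in GB, ms per token, tokens per second)."""
    model_bytes = PARAMS * bytes_per_param
    seconds_per_token = model_bytes / H100_HBM_BYTES_PER_S
    return model_bytes / 1e9, seconds_per_token * 1e3, 1.0 / seconds_per_token


# Compare the precision the old napkin_math used (FP16) with the scenario's (INT4).
for label, width in (("FP16", 2.0), ("INT4", 0.5)):
    size_gb, ms_per_token, tok_per_s = decode_estimate(width)
    print(f"{label}: {size_gb:.0f} GB, {ms_per_token:.1f} ms/token, {tok_per_s:.1f} tok/s")
```

Under these assumptions the FP16 row comes out near 24 tok/s and the INT4 row near 96 tok/s, matching the two figures quoted in the list.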

Separately, distractor option 1 contained a 1000x unit error: "3.35 GB/s" should be "3.35 TB/s".
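
To make the size of the unit error concrete, the same INT4 estimate can be run with both bandwidth strings. This is only a sketch: `parse_bandwidth` is a hypothetical helper, and it assumes decimal (SI) prefixes.

```python
# Hypothetical helper for illustration; assumes SI prefixes (1 TB = 1e12 bytes).
UNIT_BYTES_PER_S = {"GB/s": 1e9, "TB/s": 1e12}


def parse_bandwidth(text: str) -> float:
    """Convert a string such as '3.35 TB/s' to bytes per second."""
    value, unit = text.split()
    return float(value) * UNIT_BYTES_PER_S[unit]


MODEL_BYTES = 35e9  # INT4 70B model size, from the PR description

# Bandwidth-bound estimate: tokens/sec = bandwidth / bytes read per token.
for text in ("3.35 GB/s", "3.35 TB/s"):
    tok_per_s = parse_bandwidth(text) / MODEL_BYTES
    print(f"{text}: {tok_per_s:.3f} tok/s")
```

With "GB/s" the estimate falls to roughly 0.1 tok/s (about 10 seconds per token), which is why the distractor text needed the unit fix even though it is not the correct answer.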

Changes

  • interviews/vault/questions/cloud/cloud-0013.yaml
    • napkin_math: complete INT4 calculation showing 35 GB model size and ~96 tok/s throughput
    • Option 1 (distractor): fixed unit "GB/s" -> "TB/s"
    • Option 2 (correct answer): updated from "~24 tokens/sec" to "~96 tokens/sec for this INT4 model"
    • correct_index unchanged (still 2)

Question text and scenario are untouched. Only the answer explanation and option text were corrected to match the precision specified in the scenario.

Verification

INT4 math: 70B params x 0.5 bytes/param = 35 GB. H100 HBM bandwidth = 3.35 TB/s. Theoretical throughput = 3.35e12 / 35e9 = 95.7 tok/s, comfortably above the 20 tok/s requirement stated in the scenario.
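
The verification can also be written as a single worked equation. The symbols are introduced here for illustration only: $N$ is the parameter count, $b$ the bytes per parameter, and $B$ the HBM bandwidth. The estimate assumes decode is purely bandwidth-bound at batch size 1, so it should be read as an upper bound.

```latex
\[
\text{tok/s} \;\le\; \frac{B}{N \cdot b}
  = \frac{3.35 \times 10^{12}\ \text{bytes/s}}{70 \times 10^{9} \times 0.5\ \text{bytes}}
  = \frac{3.35 \times 10^{12}}{35 \times 10^{9}}
  \approx 95.7,
\qquad
\frac{95.7}{20} \approx 4.8\times \text{ headroom over the requirement.}
\]
```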

Related: PR #1590 (cloud-0024 correct_index fix) is a companion StaffML audit fix.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-12 19:28:54 -05:00
Reference: github-starred/cs249r_book#11781