mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-22 05:53:13 -05:00
[PR #1591] [MERGED] fix(staffml): correct cloud-0013 INT4 throughput math and unit error in distractor #11781
📋 Pull Request Information
Original PR: https://github.com/harvard-edge/cs249r_book/pull/1591
Author: @Shashank-Tripathi-07
Created: 4/27/2026
Status: ✅ Merged
Merged: 4/27/2026
Merged by: @profvjreddi
Base: dev ← Head: fix/staffml-cloud-0013-precision-mismatch
📝 Commits (1)
47e64e0 fix(staffml): correct cloud-0013 INT4 throughput math and unit error
📊 Changes
1 file changed (+7 additions, -5 deletions)
📝 interviews/vault/questions/cloud/cloud-0013.yaml (+7 -5)
📄 Description
Bug
cloud-0013 (The TPOT Memory Wall) presents an INT4-quantized 70B model, but the napkin_math field showed FP16 arithmetic, which does not match the precision stated in the scenario.
Separately, distractor option 1 contained a 1000x unit error: "3.35 GB/s" should be "3.35 TB/s".
Changes
interviews/vault/questions/cloud/cloud-0013.yaml
napkin_math: complete INT4 calculation showing 35 GB model size and ~96 tok/s throughput
correct_index unchanged (still 2)
Question text and scenario are untouched. Only the answer explanation and option text were corrected to match the precision specified in the scenario.
Verification
INT4 math: 70B params x 0.5 bytes/param = 35 GB. H100 HBM bandwidth = 3.35 TB/s. Theoretical throughput = 3.35e12 / 35e9 = 95.7 tok/s, comfortably above the 20 tok/s requirement stated in the scenario.
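The arithmetic above can be reproduced with a short script. This is a minimal sketch, not code from the PR: the function name `decode_tokens_per_second` and the constant names are hypothetical, and it assumes the simple bandwidth-bound model used in the verification (every generated token reads all weights from HBM once, ignoring KV cache traffic and other overheads). The FP16 and GB/s lines are included only for illustration, to show how the two corrected errors change the result.

```python
def decode_tokens_per_second(params, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on decode throughput when each token streams all weights once.

    Hypothetical helper for illustration; assumes a purely memory-bound decode.
    """
    model_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


PARAMS = 70e9            # 70B parameters, from the scenario
H100_HBM_BW = 3.35e12    # 3.35 TB/s, as stated in the verification
REQUIRED_TPS = 20        # requirement stated in the scenario

# INT4: 0.5 bytes per parameter, so the weights occupy 35 GB.
int4_model_bytes = PARAMS * 0.5
int4_tps = decode_tokens_per_second(PARAMS, 0.5, H100_HBM_BW)

# FP16 (2 bytes per parameter), shown only to contrast with the INT4 figure.
fp16_tps = decode_tokens_per_second(PARAMS, 2.0, H100_HBM_BW)

# The distractor's unit error: 3.35 GB/s instead of 3.35 TB/s (1000x too small).
gbps_typo_tps = decode_tokens_per_second(PARAMS, 0.5, 3.35e9)

print(f"INT4 model size: {int4_model_bytes / 1e9:.0f} GB")
print(f"INT4 throughput: {int4_tps:.1f} tok/s (required: {REQUIRED_TPS})")
print(f"FP16 throughput: {fp16_tps:.1f} tok/s")
print(f"INT4 throughput with GB/s typo: {gbps_typo_tps:.4f} tok/s")
```

Under these assumptions the INT4 result clears the 20 tok/s requirement by a wide margin, while the mistyped GB/s bandwidth yields a figure three orders of magnitude lower.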
Related: PR #1590 (cloud-0024 correct_index fix) is a companion StaffML audit fix.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.