mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-08 02:28:25 -05:00
[PR #1631] fix(labs): correct cold start time in lab 13 Part D #9237
📋 Pull Request Information
Original PR: https://github.com/harvard-edge/cs249r_book/pull/1631
Author: @Shashank-Tripathi-07
Created: 5/3/2026
Status: 🔄 Open
Base: dev ← Head: fix/lab13-cold-start-answer
📝 Commits (1)
f763d28 fix(labs): correct cold start answer in lab 13 Part D
📊 Changes
1 file changed (+4 additions, -3 deletions)
labs/vol1/lab_13_model_serving.py (+4 -3)
📄 Description
Summary
Part D asks: "Auto-scaling Llama-2 70B during traffic spike. First-user wait?"
Answer option C said "~15 seconds (loading 140 GB over PCIe Gen5)", but this is wrong. The code uses NVMe sequential read (7 GB/s) as the storage source, not PCIe Gen5 (64 GB/s). The actual bottleneck is NVMe: reading 140 GB at 7 GB/s takes about 20 s, before any deserialization.
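The NVMe arithmetic can be checked with a minimal sketch like the one below. It is not the lab's code: the function and constant names are hypothetical, the sizes and bandwidths are the figures quoted in this description, and the overhead is only what the ~26 s total implies, since the description gives no exact breakdown.

```python
def transfer_seconds(size_gb: float, bandwidth_gb_s: float) -> float:
    """Time to move size_gb at a sustained bandwidth_gb_s (ideal, no overhead)."""
    return size_gb / bandwidth_gb_s


# Figures quoted in the PR description (names here are illustrative only).
MODEL_SIZE_GB = 140.0    # Llama-2 70B weights, per the answer option text
NVME_GB_S = 7.0          # NVMe sequential read
PCIE_GEN5_GB_S = 64.0    # PCIe Gen5

nvme_read_s = transfer_seconds(MODEL_SIZE_GB, NVME_GB_S)
pcie_copy_s = transfer_seconds(MODEL_SIZE_GB, PCIE_GEN5_GB_S)

# Assumption: the ~26 s shown by the dynamic display is the NVMe read plus
# everything else (deserialization and any other steps); the split of that
# remainder is not stated in the description.
DISPLAYED_TOTAL_S = 26.0
implied_overhead_s = DISPLAYED_TOTAL_S - nvme_read_s

print(f"NVMe read:        {nvme_read_s:.1f} s")
print(f"PCIe Gen5 copy:   {pcie_copy_s:.1f} s")
print(f"Implied overhead: {implied_overhead_s:.1f} s")
```

Under these figures the NVMe read takes roughly nine times longer than the PCIe copy, which is why the storage read, not the bus, sets the first-user wait.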
The dynamic display (_t_total) already showed ~26s, but the answer option and the "Correct" callout both said "~15 seconds", a visible contradiction.
Changes
"~15 seconds (loading 140 GB over PCIe Gen5)" → "~26 seconds (NVMe read bottleneck + deserialization)"
"15s" → "26s"
Test plan
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.