mirror of https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-10 15:49:25 -05:00
Key findings: efficiency never explained (fix needed), compression too rushed (15 min), AllReduce needs numbers-before-formulas, TinyML feels tangential. Inverse Roofline is the surprise hit. AMD engineer caught a wrong MI300X bandwidth number.