Efficient AI Thoughts Checklist

GiteaMirror commented

2026-03-22 15:36:39 -05:00

Originally created by @18jeffreyma on GitHub (Feb 3, 2025).

Originally assigned to: @profvjreddi, @18jeffreyma on GitHub.

9.2

Good breakdown: maybe diagram this as more of an ancestry?

9.2.1

“Era of Algorithmic Efficiency”: Maybe worth including some examples here of why parallelism and ML systems didn’t take off:
Namely, models like decision trees or SVMs were not easy to model-parallelize: most work focused on ensemble learning (i.e. learning multiple trees/models in parallel), which can be done with data parallelism by splitting the data across batches.
Only with deep learning did we get models that were much easier to shard along the model dimension.
“The shift to deep learning”: what do you think about bolding some of the key words to make them stand out?
“Modern Era of Algorithmic Efficiency”: I think an additional link here is to explain that even in large-scale datacenters, we need efficiency. Maybe worth briefly touching on how we’re hitting the limits of our hardware and need to be more creative in software/efficiency to maximize hardware utilization.
Maybe briefly discuss things like general latency or memory requirements (I see the next section goes into depth).
Update Figure 9.2 with some LLMs maybe?
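To make the ensemble point above concrete, here is a minimal sketch (not from the chapter) of training several tiny models in parallel, each on its own bootstrap sample. The toy data, the one-dimensional "stump" learner, and all function names are assumptions made up for illustration. Threads are used only to show the structure; a real speedup in CPython would need processes or separate machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fit_stump(points):
    """Fit a 1-D threshold classifier that predicts 1 when x >= threshold."""
    best_threshold, best_errors = None, len(points) + 1
    for threshold, _ in points:
        # Count how many points this candidate threshold misclassifies.
        errors = sum((x >= threshold) != bool(label) for x, label in points)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def fit_on_bootstrap(args):
    """Draw a bootstrap sample (with replacement) and fit one stump on it."""
    data, seed = args
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]
    return fit_stump(sample)


def bagged_predict(thresholds, x):
    """Majority vote over the independently trained stumps."""
    votes = sum(x >= t for t in thresholds)
    return int(2 * votes > len(thresholds))


# Hypothetical toy dataset: the label is 1 exactly when x >= 5.
data = [(float(x), int(x >= 5)) for x in range(10)]

# Each worker trains one ensemble member independently; no model is ever
# split across workers, which is why this pattern was easy to parallelize.
with ThreadPoolExecutor(max_workers=4) as pool:
    thresholds = list(pool.map(fit_on_bootstrap, [(data, s) for s in range(9)]))

print(bagged_predict(thresholds, 9.0), bagged_predict(thresholds, 0.0))
```

The members never communicate during training, which contrasts with sharding a single deep model across devices.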

9.2.2

Maybe discuss a bit how even datacenters are energy constrained by electrical grids (and the locality of power), so these problems matter at both large and small scale (I’ll find a nice citation).

9.2.3

Include some citations in 1980-2010 section.
Some bits on data efficiency like https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2022.760085/full
Modern Era: I think it’s worth including a quick discussion, maybe a case study, on CommonCrawl and how much raw data is available: training on all of this data is clearly infeasible, so we need to be more efficient with our data.
Cite WakeVision, KWS dataset, FineWeb as a nice data-efficient postprocessing example.

9.3.2

TODO: Figure 9.4: I’ll flesh out some bits we can include at the intersections.
“Algorithmic Efficiency Reinforces Compute and Data Efficiency”:
Maybe let’s discuss the ideas behind this section in person.

9.3.3

I wonder if this would be better structured as a case study of a system, but I’m unsure how specific we should get, given that this chapter should be general.

9.3.4

A diagram drawn as a recycle loop would really help reinforce this section: efficiency -> scalability -> sustainability -> efficiency. Move Figure 9.5 upwards maybe (and the paragraph introducing it).

9.4.2

Maybe replace wording “and” with “versus” to emphasize tradeoff

9.5.1

Maybe something like a radar or triangle plot, where model, compute, and data are the three corners, as a diagram to include in the prioritization section.

9.5.2

Organization Thoughts:
9.2 should be renamed to History over Time
9.3 should be renamed Defining System Efficiency
9.4 stays same
9.5 stays same
9.6: Building an Efficiency-first Mindset (or Efficiency as a First Party Consideration)
9.7 Broader Challenges and Philosophical Questions
9.7.4 Balancing Innovation and Efficiency

Thoughts from our Tuesday discussion:

Figure 9.4 is worth fleshing out with concrete directions at the intersections. Perhaps we should add an “Intersections” section right before 9.3.3 with some key examples. I brainstormed a few below:
Model + Data Efficiency: understanding how to design architectures with better inductive biases that can learn more efficiently from data (e.g. CNNs vs. fully connected networks for vision data).
Compute + Data Efficiency: understanding how to build data pipelines and loading so that accelerators and GPUs are always fed and have work readily available, making training as fast as possible.
Model + Compute Efficiency: choosing model architectures with efficient or constrained numerics in mind (e.g. neural architecture search for edge computing devices).
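As a rough illustration of the Model + Data Efficiency item (CNNs vs. fully connected layers), the sketch below counts trainable parameters for one layer of each kind. The image size, channel counts, kernel size, and function names are assumptions chosen for the example, and the convolution is assumed to preserve the spatial size so both layers map the same input to the same output shape.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases for a 2-D convolution with a square kernel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weights plus biases for a fully connected layer."""
    return in_features * out_features + out_features


# Hypothetical layer: a 32x32 RGB image mapped to 16 feature maps of 32x32.
height, width, in_channels, out_channels = 32, 32, 3, 16

conv = conv2d_params(in_channels, out_channels, kernel_size=3)
dense = dense_params(height * width * in_channels, height * width * out_channels)

print(f"conv layer:  {conv:,} parameters")   # 448
print(f"dense layer: {dense:,} parameters")  # 50,348,032
```

Weight sharing and locality are the inductive biases at work here: the convolution reuses one small kernel at every position, so it has far fewer parameters to fit from the same data.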
