fix(benchmarking): address section 12.8.2 reader critique on performance vs energy efficiency

- Add proper citations for processor frequency scaling claims using Koomey et al. 2011
- Fix unreferenced INT8 quantization claims with Jacob et al. CVPR 2018 citation
- Add Esser et al. 2019 reference for advanced quantization techniques
- Improve section flow with better contextual introductions and transitions
- Consolidate redundant quantization discussion into coherent narrative
- Connect performance-energy tradeoffs to physical limitations and benchmarking standards
- Reorganize bibliography entries alphabetically for better maintainability

Resolves concerns about unreferenced claims, abrupt transitions, and lack of context.
Author: Vijay Janapa Reddi
Date: 2025-08-24 21:59:26 +02:00
Parent: 16871b0cdd
Commit: 9f6d3875fc
2 changed files with 45 additions and 4 deletions


@@ -218,6 +218,18 @@
eprint = {2207.07958},
}
@article{esser2019learned,
title = {Learned Step Size Quantization},
author = {
Esser, Steven K. and McKinstry, Jeffrey L. and Bablani, Deepika and Appuswamy, Rathinakumar and
Modha, Dharmendra S.
},
year = {2019},
journal = {arXiv preprint arXiv:1902.08153},
doi = {10.48550/arXiv.1902.08153},
url = {https://arxiv.org/abs/1902.08153},
}
@article{everingham2010pascal,
title = {The Pascal Visual Object Classes (VOC) Challenge},
author = {
@@ -337,6 +349,21 @@
source = {Crossref},
}
@inproceedings{jacob2018quantization,
title = {Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference},
author = {
Jacob, Benoit and Kligys, Skirmantas and Chen, Bo and Zhu, Menglong and Tang, Matthew and
Howard, Andrew and Adam, Hartwig and Kalenichenko, Dmitry
},
year = {2018},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
publisher = {IEEE},
series = {CVPR '18},
pages = {2704--2713},
doi = {10.1109/CVPR.2018.00286},
url = {https://doi.org/10.1109/CVPR.2018.00286},
}
@inproceedings{janapa2022mlperf,
title = {MLPerf Mobile v2.0: An Industry-Standard Benchmark Suite for Mobile Machine Learning},
author = {Janapa Reddi, Vijay and others},
@@ -476,6 +503,20 @@
organization = {IEEE},
}
@article{koomey2011web,
title = {
Web Extra Appendix: Implications of Historical Trends in the Electrical Efficiency of Computing
},
author = {Koomey, Jonathan and Berard, Stephen and Sanchez, Marla and Wong, Henry},
year = {2011},
journal = {IEEE Annals of the History of Computing},
publisher = {IEEE Computer Society},
volume = {33},
number = {3},
pages = {46--54},
doi = {10.1109/MAHC.2010.28},
}
@article{krizhevsky2009learning,
title = {Learning multiple layers of features from tiny images},
author = {Krizhevsky, Alex and Hinton, Geoffrey and others},


@@ -1432,13 +1432,13 @@ Support infrastructure, particularly cooling systems, is a major component of to
### Performance vs Energy Efficiency {#sec-benchmarking-ai-performance-vs-energy-efficiency-826e}
A critical consideration in ML system design is the relationship between performance and energy efficiency. Maximizing raw performance often leads to diminishing returns in energy efficiency. For example, increasing processor frequency by 20% might yield only a 5% performance improvement while increasing power consumption by 50%. This non-linear relationship means that the most energy-efficient operating point is often not the highest performing one.
The relationship between computational performance and energy efficiency is one of the most important tradeoffs in modern ML system design. As systems push for higher performance, they often encounter diminishing returns in energy efficiency due to fundamental physical limitations in semiconductor scaling and power delivery [@koomey2011web]. This relationship is particularly evident in processor frequency scaling, where increasing clock frequency by 20% typically yields only modest performance improvements (around 5%) while dramatically increasing power consumption by up to 50%, reflecting the cubic relationship between voltage, frequency, and power consumption.
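The cubic relationship described above can be sketched numerically. This is a minimal illustrative model, assuming dynamic power follows P = C · V² · f with supply voltage scaling roughly linearly with frequency under DVFS; all constants are hypothetical and chosen only to show the trend, not to match any real processor.

```python
# Illustrative dynamic-power model: P = C * V^2 * f.
# Assumption (hypothetical): voltage scales linearly with frequency,
# which makes power grow roughly with the cube of frequency.

def dynamic_power(freq_ghz, c_eff=1.0, v_per_ghz=0.3):
    """Relative dynamic power at a given clock frequency."""
    voltage = v_per_ghz * freq_ghz          # assumed linear V-f relation
    return c_eff * voltage ** 2 * freq_ghz  # P = C * V^2 * f

base = dynamic_power(2.0)
boosted = dynamic_power(2.0 * 1.2)          # +20% clock frequency
print(f"power increase: {boosted / base:.2f}x")  # ~1.73x from the cubic term
```

Under these assumptions, a 20% frequency bump costs roughly 1.2³ ≈ 1.73x the power, which is why the highest-performing operating point is rarely the most energy-efficient one.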
In many deployment scenarios, particularly in battery-powered devices, finding the optimal balance between performance and energy efficiency is crucial. For instance, reducing model precision from FP32 to INT8 might reduce accuracy by 1-2% but can improve energy efficiency by 3-4x. Similarly, batch processing can improve throughput efficiency at the cost of increased latency.
In deployment scenarios with strict energy constraints, particularly battery-powered edge devices and mobile applications, optimizing this performance-energy tradeoff becomes essential for practical viability. Model optimization techniques offer promising approaches to achieve better efficiency without significant accuracy degradation. Quantization techniques, which reduce numerical precision from floating-point (FP32) to integer representations (INT8), demonstrate this tradeoff effectively. Research shows that INT8 quantization can maintain model accuracy within 1-2% of the original while delivering 3-4x improvements in both inference speed and energy efficiency [@jacob2018quantization].
These tradeoffs span three key dimensions: accuracy, performance, and energy efficiency. Model quantization illustrates this relationship clearly, reducing numerical precision from FP32 to INT8 typically results in a small accuracy drop (1-2%), but it can improve both inference speed and energy efficiency by 3-4x. Similarly, techniques like pruning and model compression require carefully balancing accuracy losses against efficiency gains. Finding the optimal operating point among these three factors depends heavily on deployment requirements; mobile applications might prioritize energy efficiency, while cloud services might optimize for accuracy at the cost of higher power consumption.
These optimization strategies span three interconnected dimensions: accuracy, computational performance, and energy efficiency. Advanced quantization methods, including learned step-size approaches [@esser2019learned], enable fine-tuned control over this tradeoff space. Similarly, techniques like model pruning and compression require careful balancing of accuracy losses against efficiency gains. The optimal operating point among these factors depends heavily on deployment requirements and constraints; mobile applications typically prioritize energy efficiency to extend battery life, while cloud-based services might optimize for accuracy even at higher power consumption costs, leveraging economies of scale and dedicated cooling infrastructure.
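The FP32-to-INT8 tradeoff discussed above can be made concrete with a minimal sketch of symmetric max-abs quantization. This is a generic illustration, not the scheme of any particular framework; the function names and the 4x4 random tensor are invented for the example.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric INT8 quantization: map the max magnitude to 127."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from INT8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)   # hypothetical weight tensor
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = np.max(np.abs(w - w_hat))
print(f"max reconstruction error: {err:.4f}")  # bounded by scale/2
```

The rounding error is bounded by half the scale, which is why modest precision loss (the 1-2% accuracy drop cited above) buys large savings: INT8 storage is 4x smaller than FP32, and integer arithmetic is substantially cheaper in energy.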
As benchmarking methodologies continue to evolve, energy efficiency metrics will play an increasingly central role in AI optimization. Future advancements in sustainable AI benchmarking[^fn-sustainable-ai] will help researchers and engineers design systems that balance performance, power consumption, and environmental impact, ensuring that ML systems operate efficiently without unnecessary energy waste.
As benchmarking methodologies continue to evolve, energy efficiency metrics are becoming increasingly central to AI system evaluation and optimization. The integration of power measurement standards, such as those established in MLPerf Power [@tschand2024mlperf], provides standardized frameworks for comparing energy efficiency across diverse hardware platforms and deployment scenarios. Future advancements in sustainable AI benchmarking[^fn-sustainable-ai] will help researchers and engineers design systems that systematically balance performance, power consumption, and environmental impact, ensuring that ML systems operate efficiently while minimizing unnecessary energy waste and supporting broader sustainability goals.
[^fn-sustainable-ai]: Reducing the environmental impact of machine learning by improving energy efficiency, using renewable energy sources, and designing models that require fewer computational resources.
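An energy-efficiency comparison of the kind standardized by benchmarks such as MLPerf Power can be sketched as inferences per joule (throughput divided by average power). The numbers below are invented for illustration only, not measured results from any platform.

```python
def inferences_per_joule(throughput_ips, avg_power_w):
    """Energy efficiency: inferences/second divided by watts = inferences/joule."""
    return throughput_ips / avg_power_w

# Hypothetical figures: a datacenter accelerator vs a small edge device.
accel = inferences_per_joule(throughput_ips=5000, avg_power_w=300)
edge = inferences_per_joule(throughput_ips=40, avg_power_w=2)
print(f"accelerator: {accel:.1f} inf/J, edge: {edge:.1f} inf/J")
```

Even with these made-up numbers, the point holds: the lower-throughput edge device can be the more energy-efficient system, which is exactly the distinction that raw performance metrics miss and energy-aware benchmarks capture.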