# Author's Note {.unnumbered}

::: {style="font-style: italic;"}

Volume II: Advanced Machine Learning Systems addresses the challenges of building machine learning systems at production scale. While foundational texts often focus on systems that run on a single machine, this volume teaches you to build and operate such systems across distributed infrastructure.

The transition from single-machine to distributed systems is one of the most significant challenges in ML engineering. A model that trains in hours on a single GPU at small scale may require weeks at production scale without proper parallelization strategies. Inference systems that work perfectly in development may fail catastrophically under production load. This volume addresses these challenges systematically.

**What This Volume Covers**

The content progresses through four parts that extend foundational concepts:

- **Foundations of Scale** establishes the infrastructure, storage, and communication patterns required when ML systems span multiple machines
- **Distributed Training** develops parallelism strategies, fault tolerance mechanisms, and inference optimization for systems at unprecedented scale
- **Production Concerns** addresses the real-world complexity of edge intelligence, security, privacy, and robust system design
- **Responsible AI at Scale** explores sustainable computing, beneficial applications, and the emerging frontiers that will shape the future

**Prerequisites**

This volume assumes familiarity with foundational concepts including ML workflows, neural network architectures, optimization techniques, and deployment fundamentals. Readers new to ML systems should establish this background first.

**A Collaborative Effort**

This volume has been shaped by contributions from students, researchers, and practitioners worldwide. The distributed systems and production challenges covered here reflect hard-won lessons from deploying ML at scale. I am grateful to everyone who has shared their expertise to make this knowledge accessible.

As AI systems grow in capability and impact, understanding how to build them responsibly at scale becomes increasingly important. I hope this volume equips you with both the technical skills and the ethical framework needed to contribute positively to this field.

— Prof. Vijay Janapa Reddi

:::