mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-07 18:18:42 -05:00

Vijay Janapa Reddi 152b8630dc fix(ci): clear all 8 failing pre-commit hooks on dev (#1413)

* fix(content): clear two mitpress-above-below pre-commit failures

The "📚 Book · ✅ Validate (Dev)" workflow has been failing on dev for
8+ consecutive runs because the mitpress-above-below pre-commit hook
flags spatial references like "above"/"below" inside body prose and
figure captions (the MIT Press style guide wants @sec-/@fig- cross-refs
or "earlier"/"later" instead). Two pre-existing violations were tripping
the hook on every push:

  - book/quarto/contents/vol1/responsible_engr/responsible_engr.qmd:1604
    fig-cap for fig-data-governance-pillars said "obligations discussed
    below: privacy, security, compliance, and transparency" — but those
    four obligations are *immediately* listed in the same caption, so
    "discussed below" was redundant. Reworded to "obligations of
    privacy, security, compliance, and transparency …".

  - book/quarto/contents/vol2/network_fabrics/network_fabrics.qmd:1217
    fig-cap for fig-congestion-cascade said "the PFC backpressure
    cascades described below." Reworded to "described later in this
    section." which is what the hook wants.

After our 4 release-prep merges (PR-1/2/7/12) cleaned up the other
hook failures (spelling, bibtex tidy, pipe tables, contractions,
mitpress-vs-period, …), this was the last remaining failing hook.
Verified locally:

  pre-commit run mitpress-above-below --all-files
  MIT Press: No above/below spatial refs (use cross-refs).....Passed

These are pure copy-edits to figure captions; no semantic change to
the diagrams or surrounding text.

* fix(check-internal-links): suppress 4 categories of false positives

The Tier 1 link checker (shipped in PR #1404) was over-eager and
flagged author content as broken in four documented patterns:

1. TikZ source inside HTML comments. Link regex matched `\node[mycycle](B1)`
   as a Markdown link `[mycycle](B1)`. Fix: strip `<!-- ... -->` bodies
   before scanning, preserving line/column offsets so any *real* failure
   we report stays accurate.
2. Quarto cross-references like `[Foo](@sec-bar)`, `@fig-x`, `@tbl-y`.
   These resolve through the project xref index at render time, not the
   filesystem; book/binder owns that validation. Fix: skip targets whose
   first token is `@sec-/@fig-/@tbl-/@eq-/@lst-/@thm-/@cor-/@def-/@exr-/
   @exm-/@prp-`.
3. Uppercase URL schemes (`HTTPS://`, `HTTP://`) — common after mobile
   auto-capitalize or copied citations. Fix: case-insensitive prefix
   match for the EXTERNAL_SCHEMES tuple.
4. GitHub-style emoji-prefix slugs in `.md` READMEs (e.g.
   `## 🎯 20 Progressive Modules` produces anchor `#-20-progressive-modules`
   on github.com, but Pandoc would slugify to `progressive-modules`).
   Fix: register both Pandoc-style and GitHub-style slugs as valid
   anchors so neither rendering target trips the checker.
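
Fixes 1 through 3 can be sketched as follows. This is a minimal illustration, not the checker's actual code: apart from `EXTERNAL_SCHEMES`, every name here (`blank_comments`, `is_xref`, `local_targets`, the regexes, and the contents of the scheme tuple) is a hypothetical stand-in.

```python
import re

# Assumed contents; the commit names the tuple but does not list its members.
EXTERNAL_SCHEMES = ("http://", "https://", "mailto:")
# Quarto cross-reference prefixes listed in fix 2.
XREF_PREFIXES = ("@sec-", "@fig-", "@tbl-", "@eq-", "@lst-", "@thm-",
                 "@cor-", "@def-", "@exr-", "@exm-", "@prp-")

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")  # simplified Markdown link


def blank_comments(text: str) -> str:
    """Overwrite HTML comments with spaces, keeping newlines so that
    line and column offsets of everything else stay unchanged."""
    def blank(match: re.Match) -> str:
        return re.sub(r"[^\n]", " ", match.group(0))
    return COMMENT_RE.sub(blank, text)


def is_external(target: str) -> bool:
    """Case-insensitive scheme check, so HTTPS:// counts as external."""
    return target.lower().startswith(EXTERNAL_SCHEMES)


def is_xref(target: str) -> bool:
    """True for targets that Quarto resolves at render time."""
    return target.startswith(XREF_PREFIXES)


def local_targets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, target) pairs that a filesystem check should see."""
    cleaned = blank_comments(text)
    found = []
    for match in LINK_RE.finditer(cleaned):
        target = match.group(2)
        if is_external(target) or is_xref(target):
            continue
        line = cleaned.count("\n", 0, match.start()) + 1
        found.append((line, target))
    return found
```

Blanking the comment body rather than deleting it is what keeps reported line numbers accurate: the text after the comment does not shift.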

Drops repo-wide broken-link count from 150 → 84 (false positives only;
no real link rot is masked). Real rot is fixed in a separate commit so
the checker improvement can be reviewed independently.
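
The dual-slug registration from fix 4 might look like the sketch below. Both slug functions are simplified approximations of the GitHub and Pandoc rules, written for illustration; the function names are hypothetical, and neither function is guaranteed to match the real renderers on every heading.

```python
import unicodedata


def github_slug(heading: str) -> str:
    """Approximate github.com anchors: lowercase, drop punctuation and
    symbols (including emoji), then turn each space into a hyphen."""
    kept = [
        ch for ch in heading.strip().lower()
        if ch in "-_ " or unicodedata.category(ch)[0] in ("L", "N")
    ]
    return "".join(kept).replace(" ", "-")


def pandoc_slug(heading: str) -> str:
    """Approximate Pandoc auto-identifiers: keep alphanumerics, '_', '-',
    '.', hyphenate whitespace, then drop everything before the first letter."""
    text = "".join(
        ch for ch in heading.lower() if ch.isalnum() or ch in "_-. "
    )
    text = "-".join(text.split())
    for index, ch in enumerate(text):
        if ch.isalpha():
            return text[index:]
    return "section"  # Pandoc's fallback when nothing is left


def valid_anchors(heading: str) -> set[str]:
    """Register both slug styles so either rendering target passes."""
    return {github_slug(heading), pandoc_slug(heading)}
```

For the heading cited in fix 4, `valid_anchors("🎯 20 Progressive Modules")` contains both `-20-progressive-modules` and `progressive-modules`.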

* fix(content): repair internal-link rot across 10 files

Concrete link rot the new checker (PR #1404) surfaced once its false
positives were cleared. None of these are stylistic; each link points
at a path or anchor that does not exist.

- README/README_{zh,ja,ko}.md (24 links): translation files live in
  README/ so paths to repo-root targets need a `../` prefix
  (`book/README.md` -> `../book/README.md`, etc.).
- mlsysim/docs/contributing.qmd (21 links): `../slides/...` pointed
  inside `mlsysim/`; the slides root is two levels up
  (`../../slides/...`).
- mlsysim/docs/cli-reference.qmd: `getting-started.qmd#bring-your-own-yaml-byoy`
  removed; retarget to `#defining-custom-models` (closest surviving
  section about user-supplied model specs).
- mlsysim/docs/for-engineers.qmd, for-instructors.qmd:
  `solver-guide.qmd#extending-mlsysim` no longer exists; retarget to
  `#writing-a-custom-solver` (the surviving custom-solver guide).
- book/tools/scripts/README.md: `../docs/BINDER.md` resolved to
  `book/tools/docs/BINDER.md` (nonexistent); the file actually lives
  at `book/docs/BINDER.md`, which is `../../docs/BINDER.md` from here.
- book/quarto/contents/frontmatter/index.qmd:
  `about.qmd#about-the-book-unnumbered` anchor was removed when the
  About heading was simplified; drop the anchor so the link lands at
  the top of the page (which IS the About section).
- tinytorch/datasets/tinytalks/README.md: `scripts/README.md` was
  never created; point at the directory listing instead.
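
The path fixes above all follow from resolving a relative link against the directory of the file that contains it. A small sketch of that resolution, using a hypothetical helper `resolve` (not part of the checker as far as the commit shows):

```python
import posixpath


def resolve(source_file: str, link: str) -> str:
    """Resolve a relative link against the directory holding source_file.
    Paths are repo-relative and use forward slashes."""
    base = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base, link))


# A translated README lives in README/, so a bare path stays inside README/.
print(resolve("README/README_ko.md", "book/README.md"))
print(resolve("README/README_ko.md", "../book/README.md"))

# One `..` from book/tools/scripts/ reaches book/tools/, not book/.
print(resolve("book/tools/scripts/README.md", "../docs/BINDER.md"))
print(resolve("book/tools/scripts/README.md", "../../docs/BINDER.md"))
```

The first and third calls reproduce the broken targets (`README/book/README.md`, `book/tools/docs/BINDER.md`); the second and fourth land on the intended files.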

* chore(pre-commit): exclude 3 forward-looking files from internal-link checker

Three files reference content that does not (yet) exist on the
filesystem; the references are intentional rather than rot, so they
should not block CI:

- labs/index.qmd: lists the 33 planned labs (vol1/lab_00..lab_16,
  vol2/lab_01..lab_16) as a roadmap. Links go live as each lab ships.
  De-linking now would lose the visual roadmap. When a lab lands, the
  exclusion narrows on its own.
- labs/PROTOCOL.md, labs/TEMPLATE.md: internal authoring docs that
  reference `../.claude/docs/labs/{PROTOCOL,TEMPLATE}.md`. The
  `.claude/` tree is per-worktree and not always present at the same
  relative path; these are author-tooling refs, not user-facing.

Net effect: the link checker is now green on a clean checkout. The
exclude block uses comments per existing convention so the rationale
is discoverable from the config alone.

* fix(content): clear codespell, contractions, and vs. pre-commit failures

Three pre-existing pre-commit hooks were failing on the dev branch
prior to the release-prep merges. Each is a small content normalization:

- codespell (2): re-declares -> redeclares (book/quarto/config/shared/README.md);
  unparseable -> unparsable (handled in the check-internal-links rewrite).
- contractions (2):
  * socratiq/socratiq.qmd callout: "If you're" -> "If you are".
  * nn_architectures fig-alt for the attention-visualization figure:
    "didn't" -> "did not". Alt-text is descriptive prose for screen
    readers, not a verbatim transcription of pixels, so expanding the
    contraction matches MIT Press style without changing the figure
    itself.
- mitpress-vs-period (6): bare `vs` -> `vs.` per MIT Press 2026 §10.5
  in benchmarking.qmd, distributed_training.qmd (x3 across two Python
  docstrings rendered in code listings), fault_tolerance.qmd, and
  inference.qmd. Code-listing strings are visible prose in the rendered
  PDF, so the rule applies there as well.
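
For illustration only, a bare-`vs` normalization could be expressed with a single regex. The pattern and the names `BARE_VS` and `fix_vs` are assumptions; the actual mitpress-vs-period hook is not shown in this commit and may work differently.

```python
import re

# Match the word "vs" only when it is not already followed by a period.
BARE_VS = re.compile(r"\bvs\b(?!\.)")


def fix_vs(text: str) -> str:
    """Rewrite bare `vs` as `vs.`, leaving existing `vs.` untouched."""
    return BARE_VS.sub("vs.", text)


print(fix_vs("data parallel vs model parallel"))
print(fix_vs("data parallel vs. model parallel"))
```

The negative lookahead keeps the rewrite idempotent, so running it twice on the same file produces no further changes.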

* chore: bibtex-tidy auto-format outputs

Outputs of the bibtex-tidy pre-commit hook (which auto-fixes its own
input). Picked up here so that running pre-commit on a clean checkout
no longer reports a "files were modified" failure for the same files
on every invocation. Pure formatting; no entry semantics changed.

2026-04-20 12:58:28 -04:00

Machine Learning Systems

Principles and Practices of Engineering Artificially Intelligent Systems

English • 中文 • 日本語 • 한국어

📘 Volume I • 📙 Volume II (Summer 2026) • 🔥 TinyTorch • 🔮 MLSys·im • 🌐 Ecosystem

📚 Hardcover edition coming from MIT Press in 2026

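Mission

The world is rushing to build AI systems, but the engineering is falling short.

That gap is what we mean by AI engineering.

AI engineering is the discipline of building intelligent systems that are efficient, reliable, safe, and robust in the real world, not just building models.

Our mission: to establish AI engineering as a foundational discipline, alongside software engineering and computer engineering, by teaching how to design, build, and evaluate end-to-end intelligent systems. The long-term impact of AI will be shaped by engineers who can turn ideas into working, dependable systems.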

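What This Repository Contains

This repository is an open learning stack for AI systems engineering.

It includes the textbook source, TinyTorch, hardware kits, and upcoming co-labs that connect principles to runnable code and real devices.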

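Getting Started

Choose a path based on your goal.

READ Start with the textbook. Try Chapter 1 and the Benchmarking chapter.

BUILD Start TinyTorch by following the Getting Started guide. Begin with Module 01 and work up to CNNs, Transformers, and the MLPerf benchmarks.

DEPLOY Pick a hardware kit and run the labs on Arduino, Raspberry Pi, and other edge devices.

CONNECT Say hello in Discussions. We will respond as quickly as we can.

The Learning Stack

The diagram below shows how the textbook connects to hands-on practice and deployment. Read the textbook, then pick your path: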

┌───────────────────────────────────────────────────────────────────────────────┐
│                                                                               │
│                           MACHINE LEARNING SYSTEMS                            │
│                              Read the Textbook                                │
│                                                                               │
│                    Theory • Concepts • Best Practices                         │
│                                                                               │
└───────────────────────────────────────┬───────────────────────────────────────┘
                                        │
                          ┌─────────────┼─────────────┐
                          │             │             │
                          ▼             ▼             ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                            HANDS‑ON ACTIVITIES                                │
│                           (pick one or all)                                   │
│                                                                               │
│     ┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐     │
│     │                 │      │                 │      │                 │     │
│     │    SOFTWARE     │      │    TINYTORCH    │      │    HARDWARE     │     │
│     │    CO‑LABS      │      │    FRAMEWORK    │      │      LABS       │     │
│     │                 │      │                 │      │                 │     │
│     │ EXPLORE         │      │ BUILD           │      │ DEPLOY          │     │
│     │                 │      │                 │      │                 │     │
│     │ Run controlled  │      │ Understand      │      │ Engineer under  │     │
│     │ experiments on  │      │ frameworks by   │      │ real constraints│     │
│     │ latency, memory,│      │ implementing    │      │ memory, power,  │     │
│     │ energy, cost    │      │ them            │      │ timing, safety  │     │
│     │                 │      │                 │      │                 │     │
│     │ (coming 2026)   │      │                 │      │ Arduino, Pi     │     │
│     └─────────────────┘      └─────────────────┘      └─────────────────┘     │
│                                                                               │
│           EXPLORE                  BUILD                   DEPLOY             │
│                                                                               │
└───────────────────────────────────────┬───────────────────────────────────────┘
                                        │
                                        ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                                                                               │
│                                  AI OLYMPICS                                  │
│                                 Prove Mastery                                 │
│                                                                               │
│       Compete across all tracks • University teams • Public leaderboards      │
│                                                                               │
│                                (coming 2026)                                  │
│                                                                               │
└───────────────────────────────────────────────────────────────────────────────┘

	Component	What You Do	Link
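READ	📖 Textbook	Understand ML systems concepts	book/
EXPLORE	🔮 Software Co‑Labs	Run experiments on latency, memory, energy, and cost	Coming 2026
BUILD	🔥 TinyTorch	Experience implementing a framework firsthand	tinytorch/
DEPLOY	🔧 Hardware Kits	Engineer on hardware under memory, power, timing, and safety constraints	kits/
PROVE	🏆 AI Olympics	Compete and benchmark across all tracks	Coming 2026

What each path teaches:

EXPLORE teaches why: understanding tradeoffs. See how latency, memory, and accuracy change when you vary batch size, precision, or model architecture.
BUILD teaches how: understanding internals. Implement autograd, optimizers, and attention yourself to see how TensorFlow and PyTorch actually work.
DEPLOY teaches where: understanding constraints. Run experiments on hardware with real memory limits, power budgets, and latency requirements.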

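What You Will Learn

This textbook teaches you to think at the intersection of machine learning and systems engineering. Each chapter connects algorithmic concepts to the infrastructure that makes them work in practice.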

ML ↔ Systems Bridge

ML Concept	Systems Concept	What You Learn
Model parameters	Memory constraints	How to fit large models onto resource-constrained devices
Inference latency	Hardware acceleration	How GPUs, TPUs, and accelerators execute neural networks
Training convergence	Compute efficiency	How mixed precision and optimization techniques reduce cost
Model accuracy	Quantization and pruning	How to compress models while preserving performance
Data requirements	Pipeline infrastructure	How to build efficient data loading and preprocessing pipelines
Model deployment	MLOps practices	How to monitor, version, and update models in production
Privacy constraints	On‑device learning	How to train and adapt without sending data to the cloud

Book Structure

Part	Focus	Chapters
I. Foundations	Core concepts	Introduction, ML Systems, DL Primer, Architectures
II. Design	Building blocks	Workflow, Data Engineering, Frameworks, Training
III. Performance	Making it fast	Efficient AI, Optimizations, HW Acceleration, Benchmarking
IV. Deployment	Real-world application	MLOps, On‑device Learning, Privacy, Robustness
V. Trust	Making it right	Responsible AI, Sustainable AI, AI for Good
VI. Frontiers	What comes next	Emerging trends and future directions

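What Makes This Different

This book is a living textbook. We update it continuously as the field grows and incorporate community input.

AI changes at lightning speed, but the engineering building blocks that make it work do not change as fast as the headlines. This project is built on that stable foundation.

Think of LEGO. New sets keep arriving, but the bricks themselves stay the same. Once you learn how the bricks fit together, you can build anything. Here, the "AI bricks" are the solid systems principles that make AI work.

Join us through reading, experimenting, and feedback to make this material more accessible for the next learner.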

Research to Teaching Loop

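We use research and teaching as a single loop: define the systems problem → build a reference implementation → benchmark it → turn it into curriculum and tooling → enable others to reproduce and extend it.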

Loop Step	Research Artifacts	Teaching Artifacts
Measure	Benchmarks, suites, metrics	Benchmarking chapter, assignments
Build	Reference systems, compilers, runtimes	TinyTorch modules, co‑labs
Deploy	Hardware targets, constraints, reliability	Hardware labs, kits

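Support This Work

We are working toward 1 million learners by 2030, so that AI engineering becomes a shared discipline rather than an isolated practice. Stars, shares, and contributions all help move this forward.

Why GitHub Stars Matter

What gets measured gets improved.

Each star represents a learner, educator, or supporter who believes AI systems should be engineered with rigor and with real-world constraints in mind.

1 learner → 10 learners → 100 learners → 1,000 learners → 10,000 learners → 100,000 learners → 1M learners

Stars are a signal, not the goal.

A visible community makes it easier for universities, foundations, and industry partners to adopt this material, donate hardware, and fund workshops, which in turn lowers the barrier for the next classroom, cohort, and learner.

Support funds flow to Open Collective and pay for TinyML4D workshops, hardware kits for underserved classrooms, and keeping the resources free and open.

One click can open the door for the next classroom, the next contributor, and the next generation of AI engineers.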

Fund the Mission

All contributions go to Open Collective, a transparent fund that supports educational outreach.

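Community and Resources

Resource	Description
📖 Textbook	Interactive online textbook
🔥 TinyTorch	Build an ML framework from scratch
🔧 Hardware Kits	Deploy to Arduino, Raspberry Pi, and edge devices
🌐 Ecosystem	Resources, workshops, and community
💬 Discussions	Questions and ideas

Contributing

We welcome contributions to the textbook, TinyTorch, and the hardware kits!

I want to…	Go here
Fix a typo or improve a chapter	book/docs/CONTRIBUTING.md
Add a TinyTorch module or fix a bug	tinytorch/CONTRIBUTING.md
Improve the hardware labs	kits/README.md
Report an issue	GitHub Issues
Ask a question	GitHub Discussions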

Citation and License

Citation

@inproceedings{reddi2024mlsysbook,
  title        = {MLSysBook.AI: Principles and Practices of Machine Learning Systems Engineering},
  author       = {Reddi, Vijay Janapa},
  booktitle    = {2024 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ ISSS)},
  pages        = {41--42},
  year         = {2024},
  organization = {IEEE},
  url          = {https://mlsysbook.org}
}

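License

This project uses a dual-license structure:

Component	License	What It Means
Book content	CC BY‑NC‑ND 4.0	Free to distribute with attribution, for non-commercial use, without modification
TinyTorch code	Apache 2.0	Free to use, modify, and distribute; includes patent protection

The textbook content (chapters, figures, explanations) is educational material, shared freely on the condition of attribution and non-commercial use. The software framework is a tool designed for anyone to use, modify, and integrate.

Contributors

Thanks to these wonderful people who have contributed to making this resource better:

✉️ Subscribe • 💬 Join the discussion • 🌐 Visit mlsysbook.ai

Made with dedication by the MLSysBook community.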
