[GH-ISSUE #363] Data-centric AI Chapter #1387

Closed
opened 2026-04-11 07:46:36 -05:00 by GiteaMirror · 1 comment

Originally created by @profvjreddi on GitHub (Aug 19, 2024).
Original GitHub issue: https://github.com/harvard-edge/cs249r_book/issues/363

Currently, the book focuses heavily on pure machine learning systems engineering and neglects the role of data in how we engineer those systems. This feels like a good topic for the advanced section. While working on this, the data engineering chapter might also need some improvement.

GiteaMirror added the area: book, type: improvement, type: new labels 2026-04-11 07:46:37 -05:00

@profvjreddi commented on GitHub (Aug 23, 2024):

Some good food for thought on dataset size scaling from the EPOCHS folks:

![CleanShot 2024-08-23 at 07 49 37@2x](https://github.com/user-attachments/assets/d037637d-5ec3-4f1f-9d56-b9e99d9f02a3)
Reference: github-starred/cs249r_book#1387