[GH-ISSUE #646] Chapter 6 - dataset size #4150

Closed
opened 2026-04-19 12:09:32 -05:00 by GiteaMirror · 0 comments

Originally created by @profvjreddi on GitHub (Jan 23, 2025).
Original GitHub issue: https://github.com/harvard-edge/cs249r_book/issues/646

Originally assigned to: @mmaz, @18jeffreyma on GitHub.

@18jeffreyma @mmaz I think it would be interesting for the data engineering chapter to show how dataset sizes have been growing.

Could you look through https://epoch.ai/blog/trends-in-training-dataset-sizes and see what plot might make sense?

Something near the beginning would help students see just how large datasets are getting in big ML systems. And then, of course, we should say that edge ML and mobile ML data is even larger, just not as well structured or captured.

GiteaMirror added the "area: book" and "type: improvement" labels 2026-04-19 12:09:32 -05:00

Reference: github-starred/cs249r_book#4150