mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-06 01:28:35 -05:00
[GH-ISSUE #1393] proposal: reduce clone size and fix contributor onboarding gaps #4401
Originally created by @Shashank-Tripathi-07 on GitHub (Apr 18, 2026).
Original GitHub issue: https://github.com/harvard-edge/cs249r_book/issues/1393
Background
I've been contributing to this repo for a few weeks now and noticed two friction points that compound each other: the repo is slow to clone, and once you have it, it's not obvious where to start.
This issue proposes concrete, non-destructive fixes for both.
Problem 1: Clone size (2 GB .git)
A fresh clone transfers roughly 2 GB. The top offenders in git history:
- assets/downloads/Machine-Learning-Systems.epub
- assets/downloads/Machine-Learning-Systems.pdf
- interviews/vault/corpus.json
- tools/scripts/socratiQ/bundle.js
These are binary or generated files. Versioning them in git means every contributor and every CI run pays the full cost on every clone.
Impact: slow CI checkout, slow onboarding, frustration for first-time contributors on slower connections.
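A rough sketch of how a list of top offenders like the one above can be reproduced, assuming a local clone with `git` on the PATH. The function names `largest_blobs` and `scan_repo` are made up for illustration; the two git commands (`git rev-list --objects --all` and `git cat-file --batch-check`) are standard.

```python
import subprocess
from typing import Iterable, List, Tuple


def largest_blobs(batch_lines: Iterable[str], top_n: int = 10) -> List[Tuple[int, str]]:
    """Return the top_n (size_in_bytes, path) pairs among blob entries.

    Each input line is expected to look like "<type> <size> <path>",
    which is the layout requested from `git cat-file --batch-check` below.
    """
    blobs = []
    for line in batch_lines:
        parts = line.rstrip("\n").split(" ", 2)
        # Skip commits, trees, and any entry that has no path attached.
        if len(parts) < 3 or parts[0] != "blob" or not parts[2]:
            continue
        blobs.append((int(parts[1]), parts[2]))
    # Largest first; ties are broken by path in descending order.
    return sorted(blobs, reverse=True)[:top_n]


def scan_repo(repo_dir: str = ".", top_n: int = 10) -> List[Tuple[int, str]]:
    """List every object reachable from any ref and rank the blobs by size."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "cat-file", "--batch-check=%(objecttype) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return largest_blobs(sizes.splitlines(), top_n)


if __name__ == "__main__":
    for size, path in scan_repo():
        print(f"{size / 1_000_000:10.1f} MB  {path}")
```

The sizes reported are uncompressed object sizes per blob version, so a file that changed many times shows up once per revision.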
Problem 2: No contributor map at the repo root
The repo has three distinct worlds inside it: the TinyTorch framework, the marimo labs, and the Quarto book content. Each has different tooling, different contribution patterns, and different gotchas.
tinytorch/CONTRIBUTING.md exists and is detailed, but a new contributor landing on the repo root has no idea:
- tito is the CLI they need
- tito dev export before they show up in the package
The result: contributors either give up or submit PRs that break CI in ways they don't understand.
Proposed solution
Part 1: Git LFS for large binaries
Migrate assets/downloads/*.pdf, assets/downloads/*.epub to Git LFS via .gitattributes. This is non-destructive: existing forks stay intact, history is not rewritten, and LFS pointers replace the blobs going forward. CI just needs git lfs pull added where the files are actually needed.
For corpus.json and bundle.js: add to .gitignore and generate them in CI. Neither file should be hand-edited, so there is no reason to track them.
Expected outcome: fresh clone drops from ~2 GB to under 200 MB.
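The tracking changes proposed in Part 1 could be sketched roughly as follows. The helper names `lfs_attribute_lines` and `is_lfs_pointer` are hypothetical; the attribute line is the form that `git lfs track` writes, and the pointer prefix comes from the Git LFS pointer format. The patterns and paths are the ones named in this issue.

```python
from typing import Iterable, List

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1\n"

# Patterns and paths taken from this issue.
LFS_PATTERNS = ["assets/downloads/*.pdf", "assets/downloads/*.epub"]
GENERATED_FILES = ["interviews/vault/corpus.json", "tools/scripts/socratiQ/bundle.js"]


def lfs_attribute_lines(patterns: Iterable[str]) -> List[str]:
    """Build .gitattributes lines in the form `git lfs track` writes."""
    return [f"{p} filter=lfs diff=lfs merge=lfs -text" for p in patterns]


def is_lfs_pointer(data: bytes) -> bool:
    """True if the bytes start like an LFS pointer rather than real content.

    A CI step could use this to decide whether `git lfs pull` is still
    needed before a job reads the PDF or EPUB.
    """
    return data.startswith(LFS_POINTER_PREFIX)


if __name__ == "__main__":
    print("# .gitattributes")
    print("\n".join(lfs_attribute_lines(LFS_PATTERNS)))
    print("# .gitignore")
    print("\n".join(GENERATED_FILES))
```

Two caveats apply to this sketch. .gitignore only affects untracked files, so the two generated files would also need `git rm --cached` to stop being tracked. Blobs already committed also remain in history unless it is rewritten, so how much a full clone shrinks depends on how the clone is made (for example shallow or partial).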
Part 2: Root-level CONTRIBUTING.md
A single file at the repo root that gives contributors a map:
This file does not replace tinytorch/CONTRIBUTING.md. It sits one level above it and routes people to the right place.
What I can do
I can implement both parts: the LFS migration with updated CI steps, and the root CONTRIBUTING.md. Both are ready to go as separate PRs whenever you want them.
I have been contributing to this repo over the past few weeks across TinyTorch, the labs, and the test suite. I would love to take on a maintainer role for this repo if you are open to it. Happy to discuss what that looks like.
@Shashank-Tripathi-07 commented on GitHub (Apr 18, 2026):
@profvjreddi, I need your help on this since it is a repo-wide change. I would like to join as a maintainer/collaborator, as it would let me contribute at the systems level rather than just through PRs. I want to level up my work on this project. Kindly consider this 😄