cs249r_book/.gitattributes
Vijay Janapa Reddi 4968397bbc chore(repo): add .gitattributes with going-forward Git LFS tracking
Track *.epub, *.pdf, *.mp3/wav/m4a/mp4/mov/webm, *.wasm via Git LFS so
future additions stop compounding the clone-size problem. Existing
history is NOT migrated by this change — see issues #1393 and #1175 for
the planned Phase 2 (`git lfs migrate import`, force-push, team re-clone)
which requires VJ approval and coordinated rollout.

Mixed-size patterns (*.png, *.jpg, *.gif) are deliberately omitted: the
repo has thousands of small icon PNGs alongside 1-5 MB cover art / kit
photos and a blanket pattern would LFS-track the small ones too. Leaving
those for VJ to scope by path.

Relates to #1393, #1175.
2026-04-30 18:42:55 -04:00

# =============================================================================
# .gitattributes — going-forward Git LFS tracking and text/binary handling
# =============================================================================
#
# IMPORTANT: this file affects ONLY future `git add` operations. Existing
# blobs in history are NOT migrated by these patterns. A separate, coordinated
# `git lfs migrate import` (Phase 2) is required to actually relocate the
# ~2 GB of binaries already in `.git`. See PR #(this PR) and issues #1393,
# #1175 for the migration plan.
#
# `.gitignore` is evaluated before LFS tracking: an ignored file is never
# staged, so these patterns never see it. A file that `.gitignore` explicitly
# exempts (e.g., the callout-icon PDFs excluded from the global *.pdf ignore
# rule) is staged normally and WILL be LFS-tracked if it matches a pattern
# below.
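#
# A minimal sketch for checking what these rules do to a given path. The PDF
# path is a hypothetical placeholder, not a file known to exist in this repo:
#   git check-attr -a -- docs/example.pdf   # attributes that apply to the path
#   git lfs ls-files                        # files currently stored as LFS pointers
#   git lfs migrate info --everything       # read-only report of large file types in history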
# -----------------------------------------------------------------------------
# Distribution / publish artefacts (large, infrequently changing, binary)
# -----------------------------------------------------------------------------
# EPUB: zero currently tracked in HEAD; ~952 MB across 15 historical versions
# in `assets/downloads/Machine-Learning-Systems.epub`. Mark for LFS so any
# future re-add does not bloat .git.
*.epub filter=lfs diff=lfs merge=lfs -text
# PDF: covers TinyTorch-Guide.pdf, 00_tinytorch.pdf, distribution PDFs.
# Note: `.gitignore` excludes most PDFs by default but explicitly allows
# callout-icon PDFs, mlsysim docs, paper figures, etc. Those exempted PDFs
# WILL be LFS-tracked under this pattern when newly added — that's the
# intended behaviour: small icon PDFs are still small as LFS pointers, and
# they are infrequently changed.
*.pdf filter=lfs diff=lfs merge=lfs -text
# -----------------------------------------------------------------------------
# Audio / video (always binary, delta-compresses poorly)
# -----------------------------------------------------------------------------
# Two MP3 podcasts are currently tracked (~16 MB combined). Multiple sites
# (book quarto, socratiQ, kits) may add more in future.
*.mp3 filter=lfs diff=lfs merge=lfs -text
*.wav filter=lfs diff=lfs merge=lfs -text
*.m4a filter=lfs diff=lfs merge=lfs -text
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.mov filter=lfs diff=lfs merge=lfs -text
*.webm filter=lfs diff=lfs merge=lfs -text
# -----------------------------------------------------------------------------
# Bundled JS / WASM artefacts (if ever tracked)
# -----------------------------------------------------------------------------
# These are typically build outputs and SHOULD be ignored via .gitignore
# rather than tracked (see `book/quarto/tools/scripts/socratiQ/bundle.js`,
# the historical `scripts/ai_menu/dist/bundle.js`, and the Next.js
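# `staffml/_next/static/chunks/*.js` blobs). However, if a bundle ever does
# need to be tracked (e.g., a vendored externally-published artefact), add a
# path-scoped rule that treats it as binary so we don't burn diff cycles.
# Only *.wasm is matched by the pattern below; no blanket *.js rule is set.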
*.wasm filter=lfs diff=lfs merge=lfs -text
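# Sketch only: a vendored bundle could be marked binary with a path-scoped
# rule using Git's built-in `binary` macro (equivalent to -diff -merge -text).
# The path is a hypothetical placeholder, so the line stays commented out:
# vendor/example/bundle.js binary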
# -----------------------------------------------------------------------------
# NOT added to LFS (uncertainty / mixed-size patterns) — defer to VJ
# -----------------------------------------------------------------------------
# *.png — repo mixes 1-5 MB cover art / kit photos with thousands of small
# icon PNGs. A blanket pattern would LFS-track the small ones too.
# Recommend either path-scoped patterns
# (e.g. `book/quarto/assets/images/covers/**/*.png filter=lfs ...`)
# or moving big PNGs to a single canonical location first.
# *.jpg / *.jpeg / *.gif — same mixed-size issue. The single biggest GIF is
# `book/quarto/contents/vol1/introduction/images/gif/_alphafold.gif`
# at 3 MB; most others are small.
# *.json — `corpus.json`, `corpus-summary.json`, `search.json` are big but
# they are build artefacts and already in `.gitignore`. JSON in
# general should NOT be LFS-tracked (it's text and diffs well).
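# Sketch only, pending VJ's scoping decision: a path-scoped rule would reuse
# the attribute set from the LFS patterns above. The covers path is the
# example given in the *.png note, not a confirmed location, so the line
# stays commented out:
# book/quarto/assets/images/covers/**/*.png filter=lfs diff=lfs merge=lfs -text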
# -----------------------------------------------------------------------------
# Text-handling normalization
# -----------------------------------------------------------------------------
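# Tell git to auto-normalize line endings on text files. Caveat: when several
# patterns match a path, a later line overrides an earlier one per attribute,
# so this catch-all resets the `-text` set by the binary patterns above to
# `text=auto`. Git's content detection then decides whether those files are
# binary; move this rule above the binary patterns if `-text` must win.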
* text=auto eol=lf
# Shell scripts and Makefiles must keep LF on Windows checkouts.
*.sh text eol=lf
Makefile text eol=lf
# Windows-native batch files must be checked out with CRLF line endings.
*.bat text eol=crlf
*.cmd text eol=crlf
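#
# A minimal sketch for checking the effect of the text rules (run from the
# repo root; `--renormalize` assumes Git 2.16 or newer):
#   git ls-files --eol         # index/worktree line endings and attributes per file
#   git add --renormalize .    # re-apply normalization to already-tracked files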