# =============================================================================
# .gitattributes: going-forward Git LFS tracking and text/binary handling
# =============================================================================
#
# IMPORTANT: this file affects ONLY future `git add` operations. Existing
# blobs in history are NOT migrated by these patterns. A separate, coordinated
# `git lfs migrate import` (Phase 2) is required to actually relocate the
# ~2 GB of binaries already in `.git`. See PR #(this PR) and issues #1393,
# #1175 for the migration plan.
#
# `.gitignore` takes precedence over LFS tracking: an ignored file is not
# staged by a normal `git add`, so these patterns never apply to it. Files
# that `.gitignore` explicitly exempts (e.g., the callout-icon PDFs excluded
# from the global *.pdf ignore rule) are LFS-tracked once they are staged.

# -----------------------------------------------------------------------------
# Text-handling normalization
# -----------------------------------------------------------------------------
# Tell git to auto-normalize line endings on text files. This catch-all must
# come BEFORE the binary patterns below: when several patterns match a path,
# a later line overrides an earlier one per attribute, so the `-text` on the
# LFS patterns only takes effect if it follows `* text=auto`.
* text=auto eol=lf

# Shell scripts and Makefiles must keep LF, even on Windows checkouts.
*.sh     text eol=lf
Makefile text eol=lf

# Windows-native batch files must be checked out with CRLF.
*.bat text eol=crlf
*.cmd text eol=crlf

# -----------------------------------------------------------------------------
# Distribution / publish artefacts (large, infrequently changing, binary)
# -----------------------------------------------------------------------------
# EPUB: zero currently tracked in HEAD; ~952 MB across 15 historical versions
# in `assets/downloads/Machine-Learning-Systems.epub`. Mark for LFS so any
# future re-add does not bloat .git.
*.epub filter=lfs diff=lfs merge=lfs -text

# PDF: covers TinyTorch-Guide.pdf, 00_tinytorch.pdf, distribution PDFs.
# Note: `.gitignore` excludes most PDFs by default but explicitly allows
# callout-icon PDFs, mlsysim docs, paper figures, etc. Those exempted PDFs
# WILL be LFS-tracked under this pattern when newly added. That is the
# intended behaviour: small icon PDFs are still small as LFS pointers, and
# they change infrequently.
*.pdf filter=lfs diff=lfs merge=lfs -text

# -----------------------------------------------------------------------------
# Audio / video (always binary, delta-compresses poorly)
# -----------------------------------------------------------------------------
# Two MP3 podcasts are currently tracked (~16 MB combined). Multiple sites
# (book quarto, socratiQ, kits) may add more in future.
*.mp3  filter=lfs diff=lfs merge=lfs -text
*.wav  filter=lfs diff=lfs merge=lfs -text
*.m4a  filter=lfs diff=lfs merge=lfs -text
*.mp4  filter=lfs diff=lfs merge=lfs -text
*.mov  filter=lfs diff=lfs merge=lfs -text
*.webm filter=lfs diff=lfs merge=lfs -text

# -----------------------------------------------------------------------------
# Bundled JS / WASM artefacts (if ever tracked)
# -----------------------------------------------------------------------------
# These are typically build outputs and SHOULD be ignored via .gitignore
# rather than tracked (see `book/quarto/tools/scripts/socratiQ/bundle.js`,
# the historical `scripts/ai_menu/dist/bundle.js`, and the Next.js
# `staffml/_next/static/chunks/*.js` blobs). However, if a bundle ever does
# need to be tracked (e.g., a vendored externally-published artefact),
# treat it as binary so we don't burn diff cycles. Only `*.wasm` gets a
# blanket pattern here.
*.wasm filter=lfs diff=lfs merge=lfs -text

# -----------------------------------------------------------------------------
# NOT added to LFS (uncertainty / mixed-size patterns): defer to VJ
# -----------------------------------------------------------------------------
# *.png: repo mixes 1-5 MB cover art / kit photos with thousands of small
#        icon PNGs. A blanket pattern would LFS-track the small ones too.
#        Recommend either path-scoped patterns
#        (e.g. `book/quarto/assets/images/covers/**/*.png filter=lfs ...`)
#        or consolidating big PNGs into a single canonical location first.
# *.jpg / *.jpeg / *.gif: same mixed-size issue. The single biggest GIF is
#        `book/quarto/contents/vol1/introduction/images/gif/_alphafold.gif`
#        at 3 MB; most others are small.
# *.json: `corpus.json`, `corpus-summary.json`, `search.json` are big but
#        they are build artefacts and already in `.gitignore`. JSON in
#        general should NOT be LFS-tracked (it's text and diffs well).
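
# -----------------------------------------------------------------------------
# Checking which rules apply (illustrative sketch, not part of the rule set)
# -----------------------------------------------------------------------------
# A quick way to see how the patterns in this file resolve for a given path
# is `git check-attr`. The two paths below are hypothetical placeholders;
# substitute real ones from the working tree:
#
#   git check-attr filter text eol -- docs/example.pdf scripts/build.sh
#
# Assuming the rules above, the PDF should report `filter: lfs`, and the
# shell script should report `text: set` and `eol: lf`. Running the same
# command with `-a` instead of attribute names lists every attribute set
# on each path.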