---
title: Git LFS API
description: Large File Storage protocol for efficient handling of large files
icon: i-carbon-data-blob
---

# Git LFS API

The Git LFS (Large File Storage) protocol handles large files efficiently by uploading and downloading them directly to and from S3.

---

## Overview

**When to use LFS:**

- Files ≥ the LFS threshold (configurable per repo, default 5MB)
- Files matching LFS suffix rules (`.safetensors`, `.bin`, `.gguf`, etc.)
- 32 server-wide default suffixes always use LFS

**Benefits:**

- Direct S3 uploads (no server proxy)
- Content deduplication (same file = same storage)
- Multipart uploads for files ≥ 100MB
- Parallel part uploads for faster transfers

---

## Batch API

### Upload/Download Batch Request

**Pattern:** `POST /{repo_type}s/{namespace}/{name}.git/info/lfs/objects/batch`

**Alternative:** `POST /{namespace}/{name}.git/info/lfs/objects/batch`

**Authentication:**

- Optional for the `download` operation
- Required for the `upload` operation

**Request Body:**

```json
{
  "operation": "upload",
  "transfers": ["basic"],
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912
    }
  ],
  "hash_algo": "sha256",
  "is_browser": false
}
```

**Fields:**

- `operation`: `"upload"` or `"download"`
- `transfers`: Array of transfer types (only `"basic"` is supported)
- `objects`: Array of file objects, each with an OID (SHA-256) and size
- `hash_algo`: Hash algorithm (default: `"sha256"`)
- `is_browser`: Set to `true` for browser uploads (the presigned URL then includes Content-Type)

---

### Response Format

#### Single-Part Upload (< 100MB)

**Response:**

```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 52428800,
      "authenticated": true,
      "actions": {
        "upload": {
          "href": "https://s3.amazonaws.com/bucket/lfs/ab/c1/abc123...?X-Amz-...",
          "expires_at": "2025-01-20T12:00:00Z"
        },
        "verify": {
          "href": "/api/namespace/repo.git/info/lfs/verify",
          "expires_at": "2025-01-20T12:00:00Z"
        }
      }
    }
  ]
}
```

**Upload Process:**

1. `PUT` the file content to `actions.upload.href`
2.
   `POST` to `actions.verify.href` to confirm the upload (this `href` is a server-relative path)

---

#### Multipart Upload (≥ 100MB)

**Response:**

```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 524288000,
      "authenticated": true,
      "actions": {
        "upload": {
          "href": "unused_for_multipart",
          "expires_at": "2025-01-20T12:00:00Z",
          "header": {
            "chunk_size": "52428800",
            "upload_id": "s3_upload_id_xxx",
            "1": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=1&...",
            "2": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=2&...",
            "3": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=3&...",
            "...": "..."
          }
        },
        "verify": {
          "href": "/api/namespace/repo.git/info/lfs/verify",
          "expires_at": "2025-01-20T12:00:00Z"
        }
      }
    }
  ]
}
```

**Multipart Upload Process:**

1. Split the file into chunks of `header.chunk_size` bytes
2. `PUT` each chunk to the presigned URL stored under its part number (`header["1"]`, `header["2"]`, and so on); parts can be uploaded in parallel
3. Collect the `ETag` response header from each part upload
4. `POST` the ETags to the multipart complete endpoint (`/api/{namespace}/{name}.git/info/lfs/complete/{upload_id}`)
5. `POST` to `actions.verify.href` to confirm the upload

**Chunk Size:**

- Default: 50MB (configurable via `KOHAKU_HUB_LFS_MULTIPART_CHUNK_SIZE_BYTES`)
- Minimum: 5MB (S3 requirement, except for the last part)
- Maximum parts: 10,000 (S3 limit)

---

#### Download Response

**Response:**

```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912,
      "authenticated": true,
      "actions": {
        "download": {
          "href": "https://s3.amazonaws.com/bucket/lfs/ab/c1/abc123...?X-Amz-...",
          "expires_at": "2025-01-20T12:00:00Z"
        }
      }
    }
  ]
}
```

**Download Process:**

1. `GET` from `actions.download.href`
2.
   The file is downloaded directly from S3

---

#### Existing File (Deduplication)

**Response:**

```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912,
      "authenticated": true
    }
  ]
}
```

**No `actions` field means the file already exists; skip the upload.**

---

#### Not Found Error

**Response:**

```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912,
      "authenticated": true,
      "error": {
        "code": 404,
        "message": "Object not found in storage"
      }
    }
  ]
}
```

---

## Multipart Complete

### Complete Multipart Upload

**Pattern:** `POST /api/{namespace}/{name}.git/info/lfs/complete/{upload_id}`

**Alternative:** `POST /api/{namespace}/{name}.git/info/lfs/complete`

**Authentication:** Public (no auth check)

**Purpose:** Signal S3 to assemble the uploaded parts into the final object

**Request Body:**

```json
{
  "oid": "abc123def456...",
  "size": 524288000,
  "upload_id": "s3_upload_id_xxx",
  "parts": [
    { "PartNumber": 1, "ETag": "etag_from_part_1_upload" },
    { "partNumber": 2, "etag": "etag_from_part_2_upload" },
    { "PartNumber": 3, "ETag": "etag_from_part_3_upload" }
  ]
}
```

**Field Notes:**

- Part numbers may be sent as `PartNumber` or `partNumber`
- ETags may be sent as `ETag` or `etag`
- ETags come from the `ETag` response header of each part upload

**Response:**

```json
{
  "success": true,
  "message": "Multipart upload completed",
  "size": 524288000,
  "etag": "final_s3_etag"
}
```

**Status Codes:**

- `200 OK` - Success
- `400 Bad Request` - Missing fields, size mismatch, or invalid parts
- `500 Internal Server Error` - S3 completion failed

---

## Verify

### Verify Upload

**Pattern:** `POST /api/{namespace}/{name}.git/info/lfs/verify`

**Authentication:** Public (no auth check)

**Purpose:** Verify that the file was uploaded correctly and exists in storage

**Request Body (Single-Part):**

```json
{
  "oid": "abc123def456...",
  "size": 52428800
}
```

**Request Body (Multipart):**

```json
{
  "oid": "abc123def456...",
  "size": 524288000,
  "upload_id": "s3_upload_id_xxx",
  "parts": [
    {"PartNumber": 1, "ETag": "etag1"},
    {"PartNumber": 2, "ETag": "etag2"}
  ]
}
```

**Verification Steps:**

1. Check that the file exists in S3 at `lfs/{oid[:2]}/{oid[2:4]}/{oid}`
2. Verify that the size matches
3. For multipart uploads: complete the upload if that has not already been done

**Response:**

```json
{
  "success": true,
  "message": "Object verified",
  "oid": "abc123def456...",
  "size": 52428800
}
```

**Status Codes:**

- `200 OK` - Verified successfully
- `400 Bad Request` - Size mismatch
- `404 Not Found` - Object not found in storage
- `500 Internal Server Error` - Verification failed

---

## LFS Threshold & Rules

### Repository-Specific Settings

Files use LFS if they meet **either** condition:

1. **Size:** `file_size ≥ lfs_threshold_bytes`
2. **Suffix:** File extension matches `lfs_suffix_rules`

**Configuration Levels:**

- **Server default:** `KOHAKU_HUB_LFS_THRESHOLD_BYTES=5000000` (5MB)
- **Repository override:** Per-repo custom threshold and suffix rules
- **Server suffix defaults:** 32 built-in suffixes always use LFS

**Server Default Suffixes (Always LFS):**

- ML Models: `.safetensors`, `.bin`, `.pt`, `.pth`, `.ckpt`, `.onnx`, `.pb`, `.h5`, `.tflite`, `.gguf`, `.ggml`, `.msgpack`
- Archives: `.zip`, `.tar`, `.gz`, `.bz2`, `.xz`, `.7z`, `.rar`
- Data: `.npy`, `.npz`, `.arrow`, `.parquet`
- Media: `.mp4`, `.avi`, `.mkv`, `.mov`, `.wav`, `.mp3`, `.flac`
- Images: `.tiff`, `.tif`

**Example:**

- `model.safetensors` (100KB) → Uses LFS (suffix rule)
- `config.json` (1KB) → Regular file (below threshold, no suffix match)
- `data.bin` (10MB) → Uses LFS (suffix rule + size)
- `large_file.txt` (20MB) → Uses LFS (size only)

### Get Repository LFS Settings

**Pattern:** `GET /api/{repo_type}s/{namespace}/{name}/settings/lfs`

**Authentication:** Required (repo owner or admin)

**Response:**

```json
{
  "lfs_threshold_bytes": 10000000,
  "lfs_keep_versions": 10,
  "lfs_suffix_rules": [".safetensors", ".custom"],
  "lfs_threshold_bytes_effective": 10000000,
  "lfs_threshold_bytes_source": "repository",
  "lfs_keep_versions_effective": 10,
  "lfs_keep_versions_source": "repository",
  "lfs_suffix_rules_effective": [".safetensors", ".bin", "...", ".custom"],
  "lfs_suffix_rules_source": "merged",
  "server_defaults": {
    "lfs_threshold_bytes": 5000000,
    "lfs_keep_versions": 5,
    "lfs_suffix_rules_default": [".safetensors", ".bin", "..."]
  }
}
```

### Update Repository LFS Settings

**Pattern:** `PUT /api/{repo_type}s/{namespace}/{name}/settings`

**Request Body:**

```json
{
  "lfs_threshold_bytes": 10000000,
  "lfs_keep_versions": 10,
  "lfs_suffix_rules": [".safetensors", ".custom"]
}
```

**Notes:**

- A `null` value inherits the server default
- `lfs_suffix_rules` adds to (does not replace) the server defaults
- `lfs_keep_versions` controls garbage collection

---

## Storage & Deduplication

### LFS Object Storage

**S3 Path:** `s3://{bucket}/lfs/{sha256[:2]}/{sha256[2:4]}/{sha256}`

**Example:**

- OID: `abc123def456...`
- Path: `lfs/ab/c1/abc123def456...`

**Deduplication:**

- Same content = same SHA-256 = same S3 object
- Multiple repos can reference the same LFS object
- Storage space is saved automatically

### Garbage Collection

**When objects are deleted:**

- A file is deleted from the repository
- A file is replaced with a new version
- The repository is deleted

Version retention follows the `lfs_keep_versions` setting.

**LFS Keep Versions:**

- Default: 5 versions per file path
- Configurable per repository
- Older versions are deleted automatically on new uploads
- Manual GC is available via the admin API

---

## Client Examples

### Upload with huggingface_hub

```python
from huggingface_hub import HfApi

api = HfApi(endpoint="http://localhost:28080")

# Upload a large file (LFS is detected automatically)
api.upload_file(
    path_or_fileobj="model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="username/my-model",
    repo_type="model",
    token="your_token",
)
```

### Manual LFS Upload (Multipart)

```python
import hashlib
from urllib.parse import urljoin

import requests

BASE_URL = "http://localhost:28080"
TOKEN = "your_token"
FILE_PATH = "large_file.bin"

# 1. Calculate SHA256 and size, reading in blocks so the file is never fully in memory
hasher = hashlib.sha256()
file_size = 0
with open(FILE_PATH, "rb") as f:
    for block in iter(lambda: f.read(1024 * 1024), b""):
        hasher.update(block)
        file_size += len(block)
sha256 = hasher.hexdigest()

# 2. Request batch
batch_req = {
    "operation": "upload",
    "transfers": ["basic"],
    "objects": [{"oid": sha256, "size": file_size}],
    "hash_algo": "sha256",
}
batch_resp = requests.post(
    f"{BASE_URL}/username/repo.git/info/lfs/objects/batch",
    json=batch_req,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
batch_resp.raise_for_status()
obj = batch_resp.json()["objects"][0]

if "error" in obj:
    raise RuntimeError(obj["error"]["message"])

if "actions" not in obj:
    # No actions: the object already exists (deduplicated), so nothing to upload
    print("Object already stored, skipping upload")
else:
    actions = obj["actions"]
    # verify.href is server-relative (e.g. /api/...), so resolve it against BASE_URL
    verify_url = urljoin(BASE_URL, actions["verify"]["href"])
    header = actions["upload"].get("header", {})

    # 3. Check if multipart
    if "chunk_size" in header:
        chunk_size = int(header["chunk_size"])
        upload_id = header["upload_id"]

        # Upload parts (sequentially here; they can also be sent in parallel)
        parts = []
        with open(FILE_PATH, "rb") as f:
            part_num = 1
            while chunk := f.read(chunk_size):
                resp = requests.put(header[str(part_num)], data=chunk)
                resp.raise_for_status()
                etag = resp.headers["ETag"].strip('"')
                parts.append({"PartNumber": part_num, "ETag": etag})
                part_num += 1

        payload = {"oid": sha256, "size": file_size, "upload_id": upload_id, "parts": parts}

        # Complete multipart
        requests.post(
            f"{BASE_URL}/api/username/repo.git/info/lfs/complete/{upload_id}",
            json=payload,
        ).raise_for_status()

        # Verify
        requests.post(verify_url, json=payload).raise_for_status()
    else:
        # Single-part upload (requests streams the open file)
        with open(FILE_PATH, "rb") as f:
            requests.put(actions["upload"]["href"], data=f).raise_for_status()

        # Verify
        requests.post(verify_url, json={"oid": sha256, "size": file_size}).raise_for_status()
```

---

## Error Handling

**413 Payload Too Large:**

```json
{
  "error": "Storage quota exceeded",
  "message": "You have used 9.5 GB of your 10 GB quota"
}
```

**404 Object Not Found:**

```json
{
  "objects": [
    {
      "oid": "abc123...",
      "size": 12345,
      "error": {
        "code": 404,
        "message": "Object not found in storage"
      }
    }
  ]
}
```

**400 Invalid Request:**

```json
{
  "error": "Size mismatch: expected 524288000, got 524287999"
}
```

---

## Performance Tips

**For uploaders:**

- Use multipart for files ≥ 100MB
- Upload
  parts in parallel (up to 10 concurrent)
- Increase the chunk size for faster uploads (max 100MB)
- Retry failed parts instead of restarting the entire upload

**For downloaders:**

- Use HTTP range requests for partial downloads
- Resume interrupted downloads
- Download multiple files in parallel
- Cache downloaded LFS objects locally

**Configuration:**

```bash
# Increase multipart threshold (default 100MB)
KOHAKU_HUB_LFS_MULTIPART_THRESHOLD_BYTES=209715200  # 200MB

# Increase chunk size (default 50MB)
KOHAKU_HUB_LFS_MULTIPART_CHUNK_SIZE_BYTES=104857600  # 100MB
```

---

## Next Steps

- [Git Protocol API](./git-protocol.md) - Git Smart HTTP
- [File Upload API](./file-upload.md) - Direct commits
- [Admin API](./admin.md) - LFS management
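As a compact recap of the rules described on this page, the Python sketch below encodes the LFS selection rule (size **or** suffix), the content-addressed storage key layout, and the multipart sizing limits. It is a hypothetical client-side helper for illustration, not KohakuHub's server code: the function names are invented, and the constants are the documented defaults, whereas real values come from the server and repository settings.

```python
from pathlib import PurePosixPath

# Documented defaults, assumed here; real values come from server/repo settings
LFS_THRESHOLD_BYTES = 5_000_000              # KOHAKU_HUB_LFS_THRESHOLD_BYTES
MULTIPART_THRESHOLD_BYTES = 100 * 1024 ** 2  # 100MB multipart threshold
S3_MIN_PART_BYTES = 5 * 1024 ** 2            # S3 minimum for every part except the last
S3_MAX_PARTS = 10_000                        # S3 part-count limit


def uses_lfs(path: str, size: int, threshold: int, suffixes: set[str]) -> bool:
    """Hypothetical helper: LFS applies if EITHER the size or the suffix rule matches."""
    return size >= threshold or PurePosixPath(path).suffix in suffixes


def lfs_storage_key(oid: str) -> str:
    """Content-addressed key lfs/{oid[:2]}/{oid[2:4]}/{oid}: same content, same key."""
    return f"lfs/{oid[:2]}/{oid[2:4]}/{oid}"


def needs_multipart(size: int, threshold: int = MULTIPART_THRESHOLD_BYTES) -> bool:
    """Objects at or above the multipart threshold are uploaded in parts."""
    return size >= threshold


def plan_parts(size: int, chunk_size: int) -> int:
    """Return the part count (integer ceiling), rejecting sizes that break S3 limits."""
    if chunk_size < S3_MIN_PART_BYTES:
        raise ValueError("chunk_size is below the 5MB S3 minimum part size")
    parts = (size + chunk_size - 1) // chunk_size
    if parts > S3_MAX_PARTS:
        raise ValueError("more than 10,000 parts; increase chunk_size")
    return parts


if __name__ == "__main__":
    suffixes = {".safetensors", ".bin", ".gguf"}  # a subset of the server defaults
    print(uses_lfs("model.safetensors", 100_000, LFS_THRESHOLD_BYTES, suffixes))  # True
    print(uses_lfs("config.json", 1_000, LFS_THRESHOLD_BYTES, suffixes))  # False
    print(lfs_storage_key("abc123def456"))  # lfs/ab/c1/abc123def456
    print(needs_multipart(524_288_000), plan_parts(524_288_000, 52_428_800))  # True 10
```

`plan_parts` is also a quick sanity check for a custom `KOHAKU_HUB_LFS_MULTIPART_CHUNK_SIZE_BYTES`: at the default 52,428,800-byte chunk size, the 10,000-part limit caps a single object at 524,288,000,000 bytes (about 524GB).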