---
title: Git LFS API
description: Large File Storage protocol for efficient handling of large files
icon: i-carbon-data-blob
---
# Git LFS API

Git LFS (Large File Storage) protocol for handling large files efficiently with direct S3 uploads/downloads.
## Overview

When to use LFS:

- Files ≥ the LFS threshold (configurable per repo, default 5MB)
- Files matching LFS suffix rules (`.safetensors`, `.bin`, `.gguf`, etc.); 32 server-wide default suffixes always use LFS
Benefits:
- Direct S3 uploads (no server proxy)
- Content deduplication (same file = same storage)
- Multipart uploads for files >100MB
- Parallel part uploads for faster transfers
## Batch API

### Upload/Download Batch Request

Pattern: `POST /{repo_type}s/{namespace}/{name}.git/info/lfs/objects/batch`

Alternative: `POST /{namespace}/{name}.git/info/lfs/objects/batch`
Authentication:

- Optional for the `download` operation
- Required for the `upload` operation
Request Body:

```json
{
  "operation": "upload",
  "transfers": ["basic"],
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912
    }
  ],
  "hash_algo": "sha256",
  "is_browser": false
}
```
Fields:

- `operation`: `"upload"` or `"download"`
- `transfers`: Array of transfer types (only `"basic"` is supported)
- `objects`: Array of file objects with OID (SHA256) and size
- `hash_algo`: Hash algorithm (default: `"sha256"`)
- `is_browser`: Set to `true` for browser uploads (includes the Content-Type in the presigned URL)
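A minimal sketch of building and sending a batch request with `requests`. The host, repo path, and token are placeholders, and the streaming SHA256 helper is illustrative, not part of the API:

```python
import hashlib
import os

import requests

def compute_oid(path: str) -> str:
    """Stream the file through SHA256 so large files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

path = "model.safetensors"
batch = {
    "operation": "upload",
    "transfers": ["basic"],
    "objects": [{"oid": compute_oid(path), "size": os.path.getsize(path)}],
    "hash_algo": "sha256",
}
resp = requests.post(
    "http://localhost:28080/username/repo.git/info/lfs/objects/batch",
    json=batch,
    headers={"Authorization": "Bearer your_token"},
)
resp.raise_for_status()
```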
## Response Format

### Single-Part Upload (< 100MB)
Response:

```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 52428800,
      "authenticated": true,
      "actions": {
        "upload": {
          "href": "https://s3.amazonaws.com/bucket/lfs/ab/c1/abc123...?X-Amz-...",
          "expires_at": "2025-01-20T12:00:00Z"
        },
        "verify": {
          "href": "/api/namespace/repo.git/info/lfs/verify",
          "expires_at": "2025-01-20T12:00:00Z"
        }
      }
    }
  ]
}
```
Upload Process:

1. `PUT` to `actions.upload.href` with the file content
2. `POST` to `actions.verify.href` to confirm the upload
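A sketch of those two steps, assuming `obj` is one entry from the batch response and `oid`/`size` are the values sent in the batch request. Note the `verify` href in the example above is relative, so it is joined onto the server's base URL:

```python
from urllib.parse import urljoin

import requests

BASE = "http://localhost:28080"  # placeholder server URL

# Step 1: PUT the raw file bytes to the presigned S3 URL.
with open("model.safetensors", "rb") as f:
    requests.put(obj["actions"]["upload"]["href"], data=f).raise_for_status()

# Step 2: POST to the verify endpoint to confirm the upload.
requests.post(
    urljoin(BASE, obj["actions"]["verify"]["href"]),
    json={"oid": oid, "size": size},
).raise_for_status()
```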
### Multipart Upload (≥ 100MB)

Response:

```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 524288000,
      "authenticated": true,
      "actions": {
        "upload": {
          "href": "unused_for_multipart",
          "expires_at": "2025-01-20T12:00:00Z",
          "header": {
            "chunk_size": "52428800",
            "upload_id": "s3_upload_id_xxx",
            "1": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=1&...",
            "2": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=2&...",
            "3": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=3&...",
            "...": "..."
          }
        },
        "verify": {
          "href": "/api/namespace/repo.git/info/lfs/verify",
          "expires_at": "2025-01-20T12:00:00Z"
        }
      }
    }
  ]
}
```
Multipart Upload Process:

1. Split the file into chunks (size from the `chunk_size` header)
2. `PUT` each chunk to its `header.{part_number}` URL, in parallel if desired
3. Collect the ETag from each part upload
4. `POST` to the `/lfs/complete` endpoint with the ETags
5. `POST` to `actions.verify.href` to confirm

A complete runnable walkthrough of this flow appears under Client Examples below.
Chunk Size:

- Default: 50MB (configurable via `KOHAKU_HUB_LFS_MULTIPART_CHUNK_SIZE_BYTES`)
- Minimum: 5MB (S3 requirement, except for the last part)
- Maximum parts: 10,000 (S3 limit)
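The part count follows directly from the file size and chunk size, so a client can sanity-check the 10,000-part limit before starting. A small sketch (pure arithmetic; the helper name is illustrative):

```python
import math

def plan_parts(file_size: int, chunk_size: int) -> int:
    """Return the number of parts a multipart upload will need."""
    parts = math.ceil(file_size / chunk_size)
    if parts > 10_000:  # S3's hard limit on part count
        raise ValueError("chunk_size too small: would exceed 10,000 parts")
    return parts

# A 500MB file with 50MB chunks needs exactly 10 parts.
assert plan_parts(524_288_000, 52_428_800) == 10
```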
### Download Response

Response:

```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912,
      "authenticated": true,
      "actions": {
        "download": {
          "href": "https://s3.amazonaws.com/bucket/lfs/ab/c1/abc123...?X-Amz-...",
          "expires_at": "2025-01-20T12:00:00Z"
        }
      }
    }
  ]
}
```
Download Process:

1. `GET` from `actions.download.href`
2. The file is served directly from S3
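A sketch of a streaming download, assuming `obj` is one entry from a `"download"` batch response:

```python
import requests

# Stream the presigned S3 URL to disk so large files never load fully into memory.
with requests.get(obj["actions"]["download"]["href"], stream=True) as r:
    r.raise_for_status()
    with open("model.safetensors", "wb") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```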
### Existing File (Deduplication)

Response:

```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912,
      "authenticated": true
    }
  ]
}
```
No `actions` field means the file already exists in storage, so the client can skip the upload.
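In client code this becomes a simple membership check, assuming `batch_resp` is the parsed batch response:

```python
# Objects the server already has come back without an "actions" field.
for obj in batch_resp["objects"]:
    if "actions" not in obj:
        print(f"{obj['oid'][:12]}... already stored, skipping upload")
        continue
    # ...otherwise run the upload steps shown above...
```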
### Not Found Error

Response:

```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912,
      "authenticated": true,
      "error": {
        "code": 404,
        "message": "Object not found in storage"
      }
    }
  ]
}
```
## Multipart Complete

### Complete Multipart Upload

Pattern: `POST /api/{namespace}/{name}.git/info/lfs/complete/{upload_id}`

Alternative: `POST /api/{namespace}/{name}.git/info/lfs/complete`

Authentication: Public (no auth check)

Purpose: Signal S3 to assemble the uploaded parts into the final object
Request Body:

```json
{
  "oid": "abc123def456...",
  "size": 524288000,
  "upload_id": "s3_upload_id_xxx",
  "parts": [
    {
      "PartNumber": 1,
      "ETag": "etag_from_part_1_upload"
    },
    {
      "partNumber": 2,
      "etag": "etag_from_part_2_upload"
    },
    {
      "PartNumber": 3,
      "ETag": "etag_from_part_3_upload"
    }
  ]
}
```
Field Notes:

- `PartNumber` or `partNumber` (case-insensitive)
- `ETag` or `etag` (case-insensitive)
- ETags are obtained from the part upload responses
Response:

```json
{
  "success": true,
  "message": "Multipart upload completed",
  "size": 524288000,
  "etag": "final_s3_etag"
}
```
Status Codes:

- `200 OK` - Success
- `400 Bad Request` - Missing fields, size mismatch, or invalid parts
- `500 Internal Server Error` - S3 completion failed
## Verify

### Verify Upload

Pattern: `POST /api/{namespace}/{name}.git/info/lfs/verify`

Authentication: Public (no auth check)

Purpose: Verify the file was uploaded correctly and exists in storage
Request Body (Single-Part):

```json
{
  "oid": "abc123def456...",
  "size": 52428800
}
```
Request Body (Multipart):

```json
{
  "oid": "abc123def456...",
  "size": 524288000,
  "upload_id": "s3_upload_id_xxx",
  "parts": [
    {"PartNumber": 1, "ETag": "etag1"},
    {"PartNumber": 2, "ETag": "etag2"}
  ]
}
```
Verification Steps:

1. Check the file exists in S3 at `lfs/{oid[:2]}/{oid[2:4]}/{oid}`
2. Verify the size matches
3. For multipart: complete the upload if not already done
Response:

```json
{
  "success": true,
  "message": "Object verified",
  "oid": "abc123def456...",
  "size": 52428800
}
```
Status Codes:

- `200 OK` - Verified successfully
- `400 Bad Request` - Size mismatch
- `404 Not Found` - Object not found in storage
- `500 Internal Server Error` - Verification failed
## LFS Threshold & Rules

### Repository-Specific Settings

Files use LFS if they meet either condition:

- Size: `file_size ≥ lfs_threshold_bytes`
- Suffix: file extension matches `lfs_suffix_rules`
Configuration Levels:

- Server default: `KOHAKU_HUB_LFS_THRESHOLD_BYTES=5000000` (5MB)
- Repository override: per-repo custom threshold and suffix rules
- Server suffix defaults: 32 built-in suffixes always use LFS
Server Default Suffixes (Always LFS):

- ML Models: `.safetensors`, `.bin`, `.pt`, `.pth`, `.ckpt`, `.onnx`, `.pb`, `.h5`, `.tflite`, `.gguf`, `.ggml`, `.msgpack`
- Archives: `.zip`, `.tar`, `.gz`, `.bz2`, `.xz`, `.7z`, `.rar`
- Data: `.npy`, `.npz`, `.arrow`, `.parquet`
- Media: `.mp4`, `.avi`, `.mkv`, `.mov`, `.wav`, `.mp3`, `.flac`
- Images: `.tiff`, `.tif`
Example:

- `model.safetensors` (100KB) → Uses LFS (suffix rule)
- `config.json` (1KB) → Regular (< threshold, no suffix match)
- `data.bin` (10MB) → Uses LFS (suffix rule + size)
- `large_file.txt` (20MB) → Uses LFS (size only)
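The decision reduces to a two-clause check. A sketch of that logic (the function name and the trimmed suffix set are illustrative, not the server's actual implementation):

```python
from pathlib import Path

DEFAULT_THRESHOLD = 5_000_000  # server default (KOHAKU_HUB_LFS_THRESHOLD_BYTES)
LFS_SUFFIXES = {".safetensors", ".bin", ".gguf"}  # a subset of the 32 defaults

def uses_lfs(path: str, size: int,
             threshold: int = DEFAULT_THRESHOLD,
             suffixes: set = LFS_SUFFIXES) -> bool:
    """A file goes through LFS if either the size or the suffix rule matches."""
    return size >= threshold or Path(path).suffix.lower() in suffixes

assert uses_lfs("model.safetensors", 100_000)    # suffix rule
assert not uses_lfs("config.json", 1_000)        # neither rule
assert uses_lfs("large_file.txt", 20_000_000)    # size only
```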
### Get Repository LFS Settings

Pattern: `GET /api/{repo_type}s/{namespace}/{name}/settings/lfs`

Authentication: Required (repo owner or admin)
Response:

```json
{
  "lfs_threshold_bytes": 10000000,
  "lfs_keep_versions": 10,
  "lfs_suffix_rules": [".safetensors", ".custom"],
  "lfs_threshold_bytes_effective": 10000000,
  "lfs_threshold_bytes_source": "repository",
  "lfs_keep_versions_effective": 10,
  "lfs_keep_versions_source": "repository",
  "lfs_suffix_rules_effective": [".safetensors", ".bin", "...", ".custom"],
  "lfs_suffix_rules_source": "merged",
  "server_defaults": {
    "lfs_threshold_bytes": 5000000,
    "lfs_keep_versions": 5,
    "lfs_suffix_rules_default": [".safetensors", ".bin", "..."]
  }
}
```
### Update Repository LFS Settings

Pattern: `PUT /api/{repo_type}s/{namespace}/{name}/settings`
Request Body:

```json
{
  "lfs_threshold_bytes": 10000000,
  "lfs_keep_versions": 10,
  "lfs_suffix_rules": [".safetensors", ".custom"]
}
```
Notes:

- A `null` value = inherit the server default
- `lfs_suffix_rules` adds to (not replaces) the server defaults
- `lfs_keep_versions` controls garbage collection
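A sketch of an update call with `requests`; the host, repo path, and token are placeholders:

```python
import requests

# Raise this repo's threshold to 10MB and add a custom suffix rule.
resp = requests.put(
    "http://localhost:28080/api/models/username/my-model/settings",
    json={
        "lfs_threshold_bytes": 10_000_000,
        "lfs_suffix_rules": [".custom"],  # merged with the 32 server defaults
    },
    headers={"Authorization": "Bearer your_token"},
)
resp.raise_for_status()
```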
## Storage & Deduplication

### LFS Object Storage

S3 Path: `s3://{bucket}/lfs/{sha256[:2]}/{sha256[2:4]}/{sha256}`

Example:

- OID: `abc123def456...`
- Path: `lfs/ab/c1/abc123def456...`
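The two-level fan-out is easy to reproduce client-side, e.g. when inspecting the bucket directly (the helper name is illustrative):

```python
def lfs_key(oid: str) -> str:
    """Map a SHA256 OID to its S3 object key (fan-out by the first two hex pairs)."""
    return f"lfs/{oid[:2]}/{oid[2:4]}/{oid}"

assert lfs_key("abc123def456") == "lfs/ab/c1/abc123def456"
```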
Deduplication:
- Same content = same SHA256 = same S3 object
- Multiple repos can reference same LFS object
- Saves storage space automatically
### Garbage Collection

When objects are deleted:

- Files deleted from the repository
- A file replaced with a new version
- Repository deleted
- Based on the `lfs_keep_versions` setting
LFS Keep Versions:
- Default: 5 versions per file path
- Configurable per repository
- Older versions auto-deleted on new uploads
- Manual GC via admin API
## Client Examples

### Upload with huggingface_hub

```python
from huggingface_hub import HfApi

api = HfApi(endpoint="http://localhost:28080")

# Upload a large file (the server auto-detects LFS by threshold/suffix rules)
api.upload_file(
    path_or_fileobj="model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="username/my-model",
    repo_type="model",
    token="your_token",
)
```
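For the download direction, `hf_hub_download` from the same library should work against the custom endpoint too; a sketch, with the repo and filename as placeholders:

```python
from huggingface_hub import hf_hub_download

# Resolves the file through the hub API; LFS content is fetched from S3 transparently.
local_path = hf_hub_download(
    repo_id="username/my-model",
    filename="model.safetensors",
    repo_type="model",
    endpoint="http://localhost:28080",
    token="your_token",
)
print(local_path)
```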
### Manual LFS Upload (Multipart)

```python
import hashlib
import os
from urllib.parse import urljoin

import requests

BASE = "http://localhost:28080"

# 1. Calculate SHA256 (streamed, so the whole file never sits in memory)
hasher = hashlib.sha256()
with open("large_file.bin", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        hasher.update(chunk)
sha256 = hasher.hexdigest()
file_size = os.path.getsize("large_file.bin")

# 2. Request batch
batch_req = {
    "operation": "upload",
    "transfers": ["basic"],
    "objects": [{"oid": sha256, "size": file_size}],
    "hash_algo": "sha256",
}
batch_resp = requests.post(
    f"{BASE}/username/repo.git/info/lfs/objects/batch",
    json=batch_req,
    headers={"Authorization": "Bearer your_token"},
).json()

obj = batch_resp["objects"][0]
# The verify href may be relative, so join it onto the server base URL.
verify_url = urljoin(BASE, obj["actions"]["verify"]["href"])

# 3. Check if multipart
if "chunk_size" in obj["actions"]["upload"].get("header", {}):
    # Multipart upload
    header = obj["actions"]["upload"]["header"]
    chunk_size = int(header["chunk_size"])
    upload_id = header["upload_id"]

    # Upload parts sequentially (see Performance Tips for a parallel variant)
    parts = []
    with open("large_file.bin", "rb") as f:
        part_num = 1
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            resp = requests.put(header[str(part_num)], data=chunk)
            resp.raise_for_status()
            etag = resp.headers["ETag"].strip('"')
            parts.append({"PartNumber": part_num, "ETag": etag})
            part_num += 1

    # Complete multipart
    requests.post(
        f"{BASE}/api/username/repo.git/info/lfs/complete/{upload_id}",
        json={"oid": sha256, "size": file_size, "upload_id": upload_id, "parts": parts},
    ).raise_for_status()

    # Verify
    requests.post(
        verify_url,
        json={"oid": sha256, "size": file_size, "upload_id": upload_id, "parts": parts},
    ).raise_for_status()
else:
    # Single-part upload
    with open("large_file.bin", "rb") as f:
        requests.put(obj["actions"]["upload"]["href"], data=f).raise_for_status()

    # Verify
    requests.post(verify_url, json={"oid": sha256, "size": file_size}).raise_for_status()
```
## Error Handling

413 Payload Too Large:

```json
{
  "error": "Storage quota exceeded",
  "message": "You have used 9.5 GB of your 10 GB quota"
}
```

404 Object Not Found (reported per object inside the batch response):

```json
{
  "objects": [
    {
      "oid": "abc123...",
      "size": 12345,
      "error": {
        "code": 404,
        "message": "Object not found in storage"
      }
    }
  ]
}
```

400 Invalid Request:

```json
{
  "error": "Size mismatch: expected 524288000, got 524287999"
}
```
## Performance Tips

For uploaders:

- Use multipart for files >100MB (the server switches automatically at the multipart threshold)
- Upload parts in parallel, up to 10 concurrent (see the sketch below)
- Increase the chunk size for faster uploads (max 100MB)
- Retry failed parts individually (don't restart the entire upload)
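A sketch of the parallel-parts idea with a thread pool, reusing `header` and `chunk_size` from the multipart batch response in the manual example above. For brevity this variant buffers all chunks in memory; a production client would bound that:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

def upload_part(part_num: int, url: str, data: bytes) -> dict:
    """PUT one chunk and return its descriptor for the /lfs/complete call."""
    resp = requests.put(url, data=data)
    resp.raise_for_status()
    return {"PartNumber": part_num, "ETag": resp.headers["ETag"].strip('"')}

# Read all chunks up front (simple, but holds the whole file in memory).
with open("large_file.bin", "rb") as f:
    chunks = list(iter(lambda: f.read(chunk_size), b""))

with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [
        pool.submit(upload_part, i + 1, header[str(i + 1)], chunk)
        for i, chunk in enumerate(chunks)
    ]
    parts = [fut.result() for fut in futures]  # in part order for /lfs/complete
```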
For downloaders:

- Use HTTP range requests for partial downloads
- Resume interrupted downloads (see the sketch below)
- Download multiple files in parallel
- Cache downloaded LFS objects locally
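One way to resume against a presigned S3 URL with a `Range` header (the function name is illustrative; if the presigned URL has expired, request a fresh one via the batch API first):

```python
import os

import requests

def resume_download(url: str, dest: str) -> None:
    """Resume a partial download by appending from the current file size."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    with requests.get(url, headers=headers, stream=True) as r:
        r.raise_for_status()  # expect 206 Partial Content when resuming
        with open(dest, "ab") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
```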
Configuration:

```bash
# Increase multipart threshold (default 100MB)
KOHAKU_HUB_LFS_MULTIPART_THRESHOLD_BYTES=209715200  # 200MB

# Increase chunk size (default 50MB)
KOHAKU_HUB_LFS_MULTIPART_CHUNK_SIZE_BYTES=104857600  # 100MB
```
## Next Steps

- Git Protocol API - Git Smart HTTP
- File Upload API - Direct commits
- Admin API - LFS management