---
title: Git LFS API
description: Large File Storage protocol for efficient handling of large files
icon: i-carbon-data-blob
---

# Git LFS API

Git LFS (Large File Storage) protocol for handling large files efficiently with direct S3 uploads/downloads.

---

## Overview

**When to use LFS:**
- Files ≥ LFS threshold (configurable per repo, default 5MB)
- Files matching LFS suffix rules (`.safetensors`, `.bin`, `.gguf`, etc.)
- 32 server-wide default suffixes always use LFS

**Benefits:**
- Direct S3 uploads (no server proxy)
- Content deduplication (same file = same storage)
- Multipart uploads for files >100MB
- Parallel part uploads for faster transfers

---

## Batch API

### Upload/Download Batch Request

**Pattern:** `POST /{repo_type}s/{namespace}/{name}.git/info/lfs/objects/batch`

**Alternative:** `POST /{namespace}/{name}.git/info/lfs/objects/batch`

**Authentication:**
- Optional for `download` operation
- Required for `upload` operation

**Request Body:**
```json
{
  "operation": "upload",
  "transfers": ["basic"],
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912
    }
  ],
  "hash_algo": "sha256",
  "is_browser": false
}
```

**Fields:**
- `operation`: `"upload"` or `"download"`
- `transfers`: Array of transfer types (only `"basic"` supported)
- `objects`: Array of file objects with OID (SHA256) and size
- `hash_algo`: Hash algorithm (default: `"sha256"`)
- `is_browser`: Set to `true` for browser uploads (includes Content-Type in presigned URL)

---

### Response Format

#### Single-Part Upload (< 100MB)

**Response:**
```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 52428800,
      "authenticated": true,
      "actions": {
        "upload": {
          "href": "https://s3.amazonaws.com/bucket/lfs/ab/c1/abc123...?X-Amz-...",
          "expires_at": "2025-01-20T12:00:00Z"
        },
        "verify": {
          "href": "/api/namespace/repo.git/info/lfs/verify",
          "expires_at": "2025-01-20T12:00:00Z"
        }
      }
    }
  ]
}
```

**Upload Process:**
1. `PUT` to `actions.upload.href` with file content
2. `POST` to `actions.verify.href` to confirm upload

---

#### Multipart Upload (≥ 100MB)

**Response:**
```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 524288000,
      "authenticated": true,
      "actions": {
        "upload": {
          "href": "unused_for_multipart",
          "expires_at": "2025-01-20T12:00:00Z",
          "header": {
            "chunk_size": "52428800",
            "upload_id": "s3_upload_id_xxx",
            "1": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=1&...",
            "2": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=2&...",
            "3": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=3&...",
            "...": "..."
          }
        },
        "verify": {
          "href": "/api/namespace/repo.git/info/lfs/verify",
          "expires_at": "2025-01-20T12:00:00Z"
        }
      }
    }
  ]
}
```

**Multipart Upload Process:**
1. Split file into chunks (size from `chunk_size` header)
2. `PUT` each chunk to `header.{part_number}` URL in parallel
3. Collect ETags from each part upload
4. `POST` to `/lfs/complete` endpoint with ETags
5. `POST` to `actions.verify.href` to confirm

**Chunk Size:**
- Default: 50MB (configurable via `KOHAKU_HUB_LFS_MULTIPART_CHUNK_SIZE_BYTES`)
- Minimum: 5MB (S3 requirement, except last part)
- Maximum parts: 10,000 (S3 limit)

---

#### Download Response

**Response:**
```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912,
      "authenticated": true,
      "actions": {
        "download": {
          "href": "https://s3.amazonaws.com/bucket/lfs/ab/c1/abc123...?X-Amz-...",
          "expires_at": "2025-01-20T12:00:00Z"
        }
      }
    }
  ]
}
```

**Download Process:**
1. `GET` from `actions.download.href`
2. File downloaded directly from S3

---

#### Existing File (Deduplication)

**Response:**
```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912,
      "authenticated": true
    }
  ]
}
```

**No `actions` field = file already exists, skip upload**

---

#### Not Found Error

**Response:**
```json
{
  "transfer": "basic",
  "hash_algo": "sha256",
  "objects": [
    {
      "oid": "abc123def456...",
      "size": 536870912,
      "authenticated": true,
      "error": {
        "code": 404,
        "message": "Object not found in storage"
      }
    }
  ]
}
```

---

## Multipart Complete

### Complete Multipart Upload

**Pattern:** `POST /api/{namespace}/{name}.git/info/lfs/complete/{upload_id}`

**Alternative:** `POST /api/{namespace}/{name}.git/info/lfs/complete`

**Authentication:** Public (no auth check)

**Purpose:** Signal S3 to assemble uploaded parts into final object

**Request Body:**
```json
{
  "oid": "abc123def456...",
  "size": 524288000,
  "upload_id": "s3_upload_id_xxx",
  "parts": [
    {
      "PartNumber": 1,
      "ETag": "etag_from_part_1_upload"
    },
    {
      "partNumber": 2,
      "etag": "etag_from_part_2_upload"
    },
    {
      "PartNumber": 3,
      "ETag": "etag_from_part_3_upload"
    }
  ]
}
```

**Field Notes:**
- `PartNumber` or `partNumber` (case-insensitive)
- `ETag` or `etag` (case-insensitive)
- ETags obtained from part upload responses

**Response:**
```json
{
  "success": true,
  "message": "Multipart upload completed",
  "size": 524288000,
  "etag": "final_s3_etag"
}
```

**Status Codes:**
- `200 OK` - Success
- `400 Bad Request` - Missing fields, size mismatch, or invalid parts
- `500 Internal Server Error` - S3 completion failed

---

## Verify

### Verify Upload

**Pattern:** `POST /api/{namespace}/{name}.git/info/lfs/verify`

**Authentication:** Public (no auth check)

**Purpose:** Verify file was uploaded correctly and exists in storage

**Request Body (Single-Part):**
```json
{
  "oid": "abc123def456...",
  "size": 52428800
}
```

**Request Body (Multipart):**
```json
{
  "oid": "abc123def456...",
  "size": 524288000,
  "upload_id": "s3_upload_id_xxx",
  "parts": [
    {"PartNumber": 1, "ETag": "etag1"},
    {"PartNumber": 2, "ETag": "etag2"}
  ]
}
```

**Verification Steps:**
1. Check file exists in S3 at `lfs/{oid[:2]}/{oid[2:4]}/{oid}`
2. Verify size matches
3. For multipart: Complete upload if not already done

**Response:**
```json
{
  "success": true,
  "message": "Object verified",
  "oid": "abc123def456...",
  "size": 52428800
}
```

**Status Codes:**
- `200 OK` - Verified successfully
- `400 Bad Request` - Size mismatch
- `404 Not Found` - Object not found in storage
- `500 Internal Server Error` - Verification failed

---

## LFS Threshold & Rules

### Repository-Specific Settings

Files use LFS if they meet **either** condition:
1. **Size:** `file_size ≥ lfs_threshold_bytes`
2. **Suffix:** File extension matches `lfs_suffix_rules`

**Configuration Levels:**
- **Server default:** `KOHAKU_HUB_LFS_THRESHOLD_BYTES=5000000` (5MB)
- **Repository override:** Per-repo custom threshold and suffix rules
- **Server suffix defaults:** 32 built-in suffixes always use LFS

**Server Default Suffixes (Always LFS):**
- ML Models: `.safetensors`, `.bin`, `.pt`, `.pth`, `.ckpt`, `.onnx`, `.pb`, `.h5`, `.tflite`, `.gguf`, `.ggml`, `.msgpack`
- Archives: `.zip`, `.tar`, `.gz`, `.bz2`, `.xz`, `.7z`, `.rar`
- Data: `.npy`, `.npz`, `.arrow`, `.parquet`
- Media: `.mp4`, `.avi`, `.mkv`, `.mov`, `.wav`, `.mp3`, `.flac`
- Images: `.tiff`, `.tif`

**Example:**
- `model.safetensors` (100KB) → Uses LFS (suffix rule)
- `config.json` (1KB) → Regular (< threshold, no suffix match)
- `data.bin` (10MB) → Uses LFS (suffix rule + size)
- `large_file.txt` (20MB) → Uses LFS (size only)

### Get Repository LFS Settings

**Pattern:** `GET /api/{repo_type}s/{namespace}/{name}/settings/lfs`

**Authentication:** Required (repo owner or admin)

**Response:**
```json
{
  "lfs_threshold_bytes": 10000000,
  "lfs_keep_versions": 10,
  "lfs_suffix_rules": [".safetensors", ".custom"],
  "lfs_threshold_bytes_effective": 10000000,
  "lfs_threshold_bytes_source": "repository",
  "lfs_keep_versions_effective": 10,
  "lfs_keep_versions_source": "repository",
  "lfs_suffix_rules_effective": [".safetensors", ".bin", "...", ".custom"],
  "lfs_suffix_rules_source": "merged",
  "server_defaults": {
    "lfs_threshold_bytes": 5000000,
    "lfs_keep_versions": 5,
    "lfs_suffix_rules_default": [".safetensors", ".bin", "..."]
  }
}
```

### Update Repository LFS Settings

**Pattern:** `PUT /api/{repo_type}s/{namespace}/{name}/settings`

**Request Body:**
```json
{
  "lfs_threshold_bytes": 10000000,
  "lfs_keep_versions": 10,
  "lfs_suffix_rules": [".safetensors", ".custom"]
}
```

**Notes:**
- `null` value = inherit server default
- `lfs_suffix_rules` adds to (not replaces) server defaults
- `lfs_keep_versions` controls garbage collection

---

## Storage & Deduplication

### LFS Object Storage

**S3 Path:** `s3://{bucket}/lfs/{sha256[:2]}/{sha256[2:4]}/{sha256}`

**Example:**
- OID: `abc123def456...`
- Path: `lfs/ab/c1/abc123def456...`

**Deduplication:**
- Same content = same SHA256 = same S3 object
- Multiple repos can reference same LFS object
- Saves storage space automatically

### Garbage Collection

**When objects are deleted:**
- Files deleted from repository
- File replaced with new version
- Repository deleted
- Based on `lfs_keep_versions` setting

**LFS Keep Versions:**
- Default: 5 versions per file path
- Configurable per repository
- Older versions auto-deleted on new uploads
- Manual GC via admin API

---

## Client Examples

### Upload with huggingface_hub

```python
from huggingface_hub import HfApi

api = HfApi(endpoint="http://localhost:28080")

# Upload large file (auto-detects LFS)
api.upload_file(
    path_or_fileobj="model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="username/my-model",
    repo_type="model",
    token="your_token"
)
```

### Manual LFS Upload (Multipart)

```python
import requests
import hashlib

# 1. Calculate SHA256
with open("large_file.bin", "rb") as f:
    sha256 = hashlib.sha256(f.read()).hexdigest()
    file_size = f.tell()

# 2. Request batch
batch_req = {
    "operation": "upload",
    "transfers": ["basic"],
    "objects": [{"oid": sha256, "size": file_size}],
    "hash_algo": "sha256"
}

batch_resp = requests.post(
    "http://localhost:28080/username/repo.git/info/lfs/objects/batch",
    json=batch_req,
    headers={"Authorization": "Bearer your_token"}
).json()

obj = batch_resp["objects"][0]

# 3. Check if multipart
if "chunk_size" in obj["actions"]["upload"].get("header", {}):
    # Multipart upload
    header = obj["actions"]["upload"]["header"]
    chunk_size = int(header["chunk_size"])
    upload_id = header["upload_id"]

    # Upload parts
    parts = []
    with open("large_file.bin", "rb") as f:
        part_num = 1
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break

            # Upload part
            part_url = header[str(part_num)]
            resp = requests.put(part_url, data=chunk)
            etag = resp.headers["ETag"].strip('"')
            parts.append({"PartNumber": part_num, "ETag": etag})
            part_num += 1

    # Complete multipart
    complete_resp = requests.post(
        f"http://localhost:28080/api/username/repo.git/info/lfs/complete/{upload_id}",
        json={"oid": sha256, "size": file_size, "upload_id": upload_id, "parts": parts}
    )

    # Verify
    verify_resp = requests.post(
        obj["actions"]["verify"]["href"],
        json={"oid": sha256, "size": file_size, "upload_id": upload_id, "parts": parts}
    )
else:
    # Single-part upload
    with open("large_file.bin", "rb") as f:
        requests.put(obj["actions"]["upload"]["href"], data=f)

    # Verify
    requests.post(
        obj["actions"]["verify"]["href"],
        json={"oid": sha256, "size": file_size}
    )
```

---

## Error Handling

**413 Payload Too Large:**
```json
{
  "error": "Storage quota exceeded",
  "message": "You have used 9.5 GB of your 10 GB quota"
}
```

**404 Object Not Found:**
```json
{
  "objects": [
    {
      "oid": "abc123...",
      "size": 12345,
      "error": {
        "code": 404,
        "message": "Object not found in storage"
      }
    }
  ]
}
```

**400 Invalid Request:**
```json
{
  "error": "Size mismatch: expected 524288000, got 524287999"
}
```

---

## Performance Tips

**For uploaders:**
- Use multipart for files >100MB
- Upload parts in parallel (up to 10 concurrent)
- Increase chunk size for faster uploads (max 100MB)
- Retry failed parts (don't restart entire upload)

**For downloaders:**
- Use HTTP range requests for partial downloads
- Resume interrupted downloads
- Parallel downloads for multiple files
- Cache downloaded LFS objects locally

**Configuration:**
```bash
# Increase multipart threshold (default 100MB)
KOHAKU_HUB_LFS_MULTIPART_THRESHOLD_BYTES=209715200  # 200MB

# Increase chunk size (default 50MB)
KOHAKU_HUB_LFS_MULTIPART_CHUNK_SIZE_BYTES=104857600  # 100MB
```

---

## Next Steps

- [Git Protocol API](./git-protocol.md) - Git Smart HTTP
- [File Upload API](./file-upload.md) - Direct commits
- [Admin API](./admin.md) - LFS management