mirror of
https://github.com/KohakuBlueleaf/KohakuHub.git
synced 2026-05-24 20:22:55 -05:00
595 lines
13 KiB
Markdown
595 lines
13 KiB
Markdown
---
|
|
title: Git LFS API
|
|
description: Large File Storage protocol for efficient handling of large files
|
|
icon: i-carbon-data-blob
|
|
---
|
|
|
|
# Git LFS API
|
|
|
|
Git LFS (Large File Storage) protocol for handling large files efficiently with direct S3 uploads/downloads.
|
|
|
|
---
|
|
|
|
## Overview
|
|
|
|
**When to use LFS:**
|
|
- Files ≥ LFS threshold (configurable per repo, default 5MB)
|
|
- Files matching LFS suffix rules (`.safetensors`, `.bin`, `.gguf`, etc.)
|
|
- 32 server-wide default suffixes always use LFS
|
|
|
|
**Benefits:**
|
|
- Direct S3 uploads (no server proxy)
|
|
- Content deduplication (same file = same storage)
|
|
- Multipart uploads for files >100MB
|
|
- Parallel part uploads for faster transfers
|
|
|
|
---
|
|
|
|
## Batch API
|
|
|
|
### Upload/Download Batch Request
|
|
|
|
**Pattern:** `POST /{repo_type}s/{namespace}/{name}.git/info/lfs/objects/batch`
|
|
|
|
**Alternative:** `POST /{namespace}/{name}.git/info/lfs/objects/batch`
|
|
|
|
**Authentication:**
|
|
- Optional for `download` operation
|
|
- Required for `upload` operation
|
|
|
|
**Request Body:**
|
|
```json
|
|
{
|
|
"operation": "upload",
|
|
"transfers": ["basic"],
|
|
"objects": [
|
|
{
|
|
"oid": "abc123def456...",
|
|
"size": 536870912
|
|
}
|
|
],
|
|
"hash_algo": "sha256",
|
|
"is_browser": false
|
|
}
|
|
```
|
|
|
|
**Fields:**
|
|
- `operation`: `"upload"` or `"download"`
|
|
- `transfers`: Array of transfer types (only `"basic"` supported)
|
|
- `objects`: Array of file objects with OID (SHA256) and size
|
|
- `hash_algo`: Hash algorithm (default: `"sha256"`)
|
|
- `is_browser`: Set to `true` for browser uploads (includes Content-Type in presigned URL)
|
|
|
|
---
|
|
|
|
### Response Format
|
|
|
|
#### Single-Part Upload (< 100MB)
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"transfer": "basic",
|
|
"hash_algo": "sha256",
|
|
"objects": [
|
|
{
|
|
"oid": "abc123def456...",
|
|
"size": 52428800,
|
|
"authenticated": true,
|
|
"actions": {
|
|
"upload": {
|
|
"href": "https://s3.amazonaws.com/bucket/lfs/ab/c1/abc123...?X-Amz-...",
|
|
"expires_at": "2025-01-20T12:00:00Z"
|
|
},
|
|
"verify": {
|
|
"href": "/api/namespace/repo.git/info/lfs/verify",
|
|
"expires_at": "2025-01-20T12:00:00Z"
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
**Upload Process:**
|
|
1. `PUT` to `actions.upload.href` with file content
|
|
2. `POST` to `actions.verify.href` to confirm upload
|
|
|
|
---
|
|
|
|
#### Multipart Upload (≥ 100MB)
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"transfer": "basic",
|
|
"hash_algo": "sha256",
|
|
"objects": [
|
|
{
|
|
"oid": "abc123def456...",
|
|
"size": 524288000,
|
|
"authenticated": true,
|
|
"actions": {
|
|
"upload": {
|
|
"href": "unused_for_multipart",
|
|
"expires_at": "2025-01-20T12:00:00Z",
|
|
"header": {
|
|
"chunk_size": "52428800",
|
|
"upload_id": "s3_upload_id_xxx",
|
|
"1": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=1&...",
|
|
"2": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=2&...",
|
|
"3": "https://s3.amazonaws.com/.../uploadId=xxx&partNumber=3&...",
|
|
"...": "..."
|
|
}
|
|
},
|
|
"verify": {
|
|
"href": "/api/namespace/repo.git/info/lfs/verify",
|
|
"expires_at": "2025-01-20T12:00:00Z"
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
**Multipart Upload Process:**
|
|
1. Split file into chunks (size from `chunk_size` header)
|
|
2. `PUT` each chunk to `header.{part_number}` URL in parallel
|
|
3. Collect ETags from each part upload
|
|
4. `POST` to `/lfs/complete` endpoint with ETags
|
|
5. `POST` to `actions.verify.href` to confirm
|
|
|
|
**Chunk Size:**
|
|
- Default: 50MB (configurable via `KOHAKU_HUB_LFS_MULTIPART_CHUNK_SIZE_BYTES`)
|
|
- Minimum: 5MB (S3 requirement, except last part)
|
|
- Maximum parts: 10,000 (S3 limit)
|
|
|
|
---
|
|
|
|
#### Download Response
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"transfer": "basic",
|
|
"hash_algo": "sha256",
|
|
"objects": [
|
|
{
|
|
"oid": "abc123def456...",
|
|
"size": 536870912,
|
|
"authenticated": true,
|
|
"actions": {
|
|
"download": {
|
|
"href": "https://s3.amazonaws.com/bucket/lfs/ab/c1/abc123...?X-Amz-...",
|
|
"expires_at": "2025-01-20T12:00:00Z"
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
**Download Process:**
|
|
1. `GET` from `actions.download.href`
|
|
2. File downloaded directly from S3
|
|
|
|
---
|
|
|
|
#### Existing File (Deduplication)
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"transfer": "basic",
|
|
"hash_algo": "sha256",
|
|
"objects": [
|
|
{
|
|
"oid": "abc123def456...",
|
|
"size": 536870912,
|
|
"authenticated": true
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
**No `actions` field = file already exists, skip upload**
|
|
|
|
---
|
|
|
|
#### Not Found Error
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"transfer": "basic",
|
|
"hash_algo": "sha256",
|
|
"objects": [
|
|
{
|
|
"oid": "abc123def456...",
|
|
"size": 536870912,
|
|
"authenticated": true,
|
|
"error": {
|
|
"code": 404,
|
|
"message": "Object not found in storage"
|
|
}
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Multipart Complete
|
|
|
|
### Complete Multipart Upload
|
|
|
|
**Pattern:** `POST /api/{namespace}/{name}.git/info/lfs/complete/{upload_id}`
|
|
|
|
**Alternative:** `POST /api/{namespace}/{name}.git/info/lfs/complete`
|
|
|
|
**Authentication:** Public (no auth check)
|
|
|
|
**Purpose:** Signal S3 to assemble uploaded parts into final object
|
|
|
|
**Request Body:**
|
|
```json
|
|
{
|
|
"oid": "abc123def456...",
|
|
"size": 524288000,
|
|
"upload_id": "s3_upload_id_xxx",
|
|
"parts": [
|
|
{
|
|
"PartNumber": 1,
|
|
"ETag": "etag_from_part_1_upload"
|
|
},
|
|
{
|
|
"partNumber": 2,
|
|
"etag": "etag_from_part_2_upload"
|
|
},
|
|
{
|
|
"PartNumber": 3,
|
|
"ETag": "etag_from_part_3_upload"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
**Field Notes:**
|
|
- `PartNumber` or `partNumber` (case-insensitive)
|
|
- `ETag` or `etag` (case-insensitive)
|
|
- ETags obtained from part upload responses
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"success": true,
|
|
"message": "Multipart upload completed",
|
|
"size": 524288000,
|
|
"etag": "final_s3_etag"
|
|
}
|
|
```
|
|
|
|
**Status Codes:**
|
|
- `200 OK` - Success
|
|
- `400 Bad Request` - Missing fields, size mismatch, or invalid parts
|
|
- `500 Internal Server Error` - S3 completion failed
|
|
|
|
---
|
|
|
|
## Verify
|
|
|
|
### Verify Upload
|
|
|
|
**Pattern:** `POST /api/{namespace}/{name}.git/info/lfs/verify`
|
|
|
|
**Authentication:** Public (no auth check)
|
|
|
|
**Purpose:** Verify file was uploaded correctly and exists in storage
|
|
|
|
**Request Body (Single-Part):**
|
|
```json
|
|
{
|
|
"oid": "abc123def456...",
|
|
"size": 52428800
|
|
}
|
|
```
|
|
|
|
**Request Body (Multipart):**
|
|
```json
|
|
{
|
|
"oid": "abc123def456...",
|
|
"size": 524288000,
|
|
"upload_id": "s3_upload_id_xxx",
|
|
"parts": [
|
|
{"PartNumber": 1, "ETag": "etag1"},
|
|
{"PartNumber": 2, "ETag": "etag2"}
|
|
]
|
|
}
|
|
```
|
|
|
|
**Verification Steps:**
|
|
1. Check file exists in S3 at `lfs/{oid[:2]}/{oid[2:4]}/{oid}`
|
|
2. Verify size matches
|
|
3. For multipart: Complete upload if not already done
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"success": true,
|
|
"message": "Object verified",
|
|
"oid": "abc123def456...",
|
|
"size": 52428800
|
|
}
|
|
```
|
|
|
|
**Status Codes:**
|
|
- `200 OK` - Verified successfully
|
|
- `400 Bad Request` - Size mismatch
|
|
- `404 Not Found` - Object not found in storage
|
|
- `500 Internal Server Error` - Verification failed
|
|
|
|
---
|
|
|
|
## LFS Threshold & Rules
|
|
|
|
### Repository-Specific Settings
|
|
|
|
Files use LFS if they meet **either** condition:
|
|
1. **Size:** `file_size ≥ lfs_threshold_bytes`
|
|
2. **Suffix:** File extension matches `lfs_suffix_rules`
|
|
|
|
**Configuration Levels:**
|
|
- **Server default:** `KOHAKU_HUB_LFS_THRESHOLD_BYTES=5000000` (5MB)
|
|
- **Repository override:** Per-repo custom threshold and suffix rules
|
|
- **Server suffix defaults:** 32 built-in suffixes always use LFS
|
|
|
|
**Server Default Suffixes (Always LFS):**
|
|
- ML Models: `.safetensors`, `.bin`, `.pt`, `.pth`, `.ckpt`, `.onnx`, `.pb`, `.h5`, `.tflite`, `.gguf`, `.ggml`, `.msgpack`
|
|
- Archives: `.zip`, `.tar`, `.gz`, `.bz2`, `.xz`, `.7z`, `.rar`
|
|
- Data: `.npy`, `.npz`, `.arrow`, `.parquet`
|
|
- Media: `.mp4`, `.avi`, `.mkv`, `.mov`, `.wav`, `.mp3`, `.flac`
|
|
- Images: `.tiff`, `.tif`
|
|
|
|
**Example:**
|
|
- `model.safetensors` (100KB) → Uses LFS (suffix rule)
|
|
- `config.json` (1KB) → Regular (< threshold, no suffix match)
|
|
- `data.bin` (10MB) → Uses LFS (suffix rule + size)
|
|
- `large_file.txt` (20MB) → Uses LFS (size only)
|
|
|
|
### Get Repository LFS Settings
|
|
|
|
**Pattern:** `GET /api/{repo_type}s/{namespace}/{name}/settings/lfs`
|
|
|
|
**Authentication:** Required (repo owner or admin)
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"lfs_threshold_bytes": 10000000,
|
|
"lfs_keep_versions": 10,
|
|
"lfs_suffix_rules": [".safetensors", ".custom"],
|
|
"lfs_threshold_bytes_effective": 10000000,
|
|
"lfs_threshold_bytes_source": "repository",
|
|
"lfs_keep_versions_effective": 10,
|
|
"lfs_keep_versions_source": "repository",
|
|
"lfs_suffix_rules_effective": [".safetensors", ".bin", "...", ".custom"],
|
|
"lfs_suffix_rules_source": "merged",
|
|
"server_defaults": {
|
|
"lfs_threshold_bytes": 5000000,
|
|
"lfs_keep_versions": 5,
|
|
"lfs_suffix_rules_default": [".safetensors", ".bin", "..."]
|
|
}
|
|
}
|
|
```
|
|
|
|
### Update Repository LFS Settings
|
|
|
|
**Pattern:** `PUT /api/{repo_type}s/{namespace}/{name}/settings`
|
|
|
|
**Request Body:**
|
|
```json
|
|
{
|
|
"lfs_threshold_bytes": 10000000,
|
|
"lfs_keep_versions": 10,
|
|
"lfs_suffix_rules": [".safetensors", ".custom"]
|
|
}
|
|
```
|
|
|
|
**Notes:**
|
|
- `null` value = inherit server default
|
|
- `lfs_suffix_rules` adds to (not replaces) server defaults
|
|
- `lfs_keep_versions` controls garbage collection
|
|
|
|
---
|
|
|
|
## Storage & Deduplication
|
|
|
|
### LFS Object Storage
|
|
|
|
**S3 Path:** `s3://{bucket}/lfs/{sha256[:2]}/{sha256[2:4]}/{sha256}`
|
|
|
|
**Example:**
|
|
- OID: `abc123def456...`
|
|
- Path: `lfs/ab/c1/abc123def456...`
|
|
|
|
**Deduplication:**
|
|
- Same content = same SHA256 = same S3 object
|
|
- Multiple repos can reference same LFS object
|
|
- Saves storage space automatically
|
|
|
|
### Garbage Collection
|
|
|
|
**When objects are deleted:**
|
|
- Files deleted from repository
|
|
- File replaced with new version
|
|
- Repository deleted
|
|
- Based on `lfs_keep_versions` setting
|
|
|
|
**LFS Keep Versions:**
|
|
- Default: 5 versions per file path
|
|
- Configurable per repository
|
|
- Older versions auto-deleted on new uploads
|
|
- Manual GC via admin API
|
|
|
|
---
|
|
|
|
## Client Examples
|
|
|
|
### Upload with huggingface_hub
|
|
|
|
```python
|
|
from huggingface_hub import HfApi
|
|
|
|
api = HfApi(endpoint="http://localhost:28080")
|
|
|
|
# Upload large file (auto-detects LFS)
|
|
api.upload_file(
|
|
path_or_fileobj="model.safetensors",
|
|
path_in_repo="model.safetensors",
|
|
repo_id="username/my-model",
|
|
repo_type="model",
|
|
token="your_token"
|
|
)
|
|
```
|
|
|
|
### Manual LFS Upload (Multipart)
|
|
|
|
```python
|
|
import requests
|
|
import hashlib
|
|
|
|
# 1. Calculate SHA256
|
|
with open("large_file.bin", "rb") as f:
|
|
sha256 = hashlib.sha256(f.read()).hexdigest()
|
|
file_size = f.tell()
|
|
|
|
# 2. Request batch
|
|
batch_req = {
|
|
"operation": "upload",
|
|
"transfers": ["basic"],
|
|
"objects": [{"oid": sha256, "size": file_size}],
|
|
"hash_algo": "sha256"
|
|
}
|
|
|
|
batch_resp = requests.post(
|
|
"http://localhost:28080/username/repo.git/info/lfs/objects/batch",
|
|
json=batch_req,
|
|
headers={"Authorization": "Bearer your_token"}
|
|
).json()
|
|
|
|
obj = batch_resp["objects"][0]
|
|
|
|
# 3. Check if multipart
|
|
if "chunk_size" in obj["actions"]["upload"].get("header", {}):
|
|
# Multipart upload
|
|
header = obj["actions"]["upload"]["header"]
|
|
chunk_size = int(header["chunk_size"])
|
|
upload_id = header["upload_id"]
|
|
|
|
# Upload parts
|
|
parts = []
|
|
with open("large_file.bin", "rb") as f:
|
|
part_num = 1
|
|
while True:
|
|
chunk = f.read(chunk_size)
|
|
if not chunk:
|
|
break
|
|
|
|
# Upload part
|
|
part_url = header[str(part_num)]
|
|
resp = requests.put(part_url, data=chunk)
|
|
etag = resp.headers["ETag"].strip('"')
|
|
parts.append({"PartNumber": part_num, "ETag": etag})
|
|
part_num += 1
|
|
|
|
# Complete multipart
|
|
complete_resp = requests.post(
|
|
f"http://localhost:28080/api/username/repo.git/info/lfs/complete/{upload_id}",
|
|
json={"oid": sha256, "size": file_size, "upload_id": upload_id, "parts": parts}
|
|
)
|
|
|
|
# Verify
|
|
verify_resp = requests.post(
|
|
obj["actions"]["verify"]["href"],
|
|
json={"oid": sha256, "size": file_size, "upload_id": upload_id, "parts": parts}
|
|
)
|
|
else:
|
|
# Single-part upload
|
|
with open("large_file.bin", "rb") as f:
|
|
requests.put(obj["actions"]["upload"]["href"], data=f)
|
|
|
|
# Verify
|
|
requests.post(
|
|
obj["actions"]["verify"]["href"],
|
|
json={"oid": sha256, "size": file_size}
|
|
)
|
|
```
|
|
|
|
---
|
|
|
|
## Error Handling
|
|
|
|
**413 Payload Too Large:**
|
|
```json
|
|
{
|
|
"error": "Storage quota exceeded",
|
|
"message": "You have used 9.5 GB of your 10 GB quota"
|
|
}
|
|
```
|
|
|
|
**404 Object Not Found:**
|
|
```json
|
|
{
|
|
"objects": [
|
|
{
|
|
"oid": "abc123...",
|
|
"size": 12345,
|
|
"error": {
|
|
"code": 404,
|
|
"message": "Object not found in storage"
|
|
}
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
**400 Invalid Request:**
|
|
```json
|
|
{
|
|
"error": "Size mismatch: expected 524288000, got 524287999"
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Performance Tips
|
|
|
|
**For uploaders:**
|
|
- Use multipart for files >100MB
|
|
- Upload parts in parallel (up to 10 concurrent)
|
|
- Increase chunk size for faster uploads (max 100MB)
|
|
- Retry failed parts (don't restart entire upload)
|
|
|
|
**For downloaders:**
|
|
- Use HTTP range requests for partial downloads
|
|
- Resume interrupted downloads
|
|
- Parallel downloads for multiple files
|
|
- Cache downloaded LFS objects locally
|
|
|
|
**Configuration:**
|
|
```bash
|
|
# Increase multipart threshold (default 100MB)
|
|
KOHAKU_HUB_LFS_MULTIPART_THRESHOLD_BYTES=209715200 # 200MB
|
|
|
|
# Increase chunk size (default 50MB)
|
|
KOHAKU_HUB_LFS_MULTIPART_CHUNK_SIZE_BYTES=104857600 # 100MB
|
|
```
|
|
|
|
---
|
|
|
|
## Next Steps
|
|
|
|
- [Git Protocol API](./git-protocol.md) - Git Smart HTTP
|
|
- [File Upload API](./file-upload.md) - Direct commits
|
|
- [Admin API](./admin.md) - LFS management
|