Cannot get file SHA over API when LFS is in use #14632

Closed
opened 2025-11-02 11:18:23 -06:00 by GiteaMirror · 5 comments

Originally created by @hramrach on GitHub (Jun 21, 2025).

Description

The contents API returns the SHA of the LFS pointer file, not the SHA of the file itself, and gives no indication that the file is stored in LFS.

{'Access-Control-Expose-Headers': 'Content-Disposition', 'Cache-Control': 'private, max-age=300', 'Content-Disposition': 'inline; filename="config.tar.bz2"; filename*=UTF-8\'\'config.tar.bz2', 'Content-Length': '131', 'Content-Type': 'text/plain; charset=utf-8', 'Etag': '"d948b9b700b0a9248e131b58f6544aebbda4e3c553602a4747330aa7b4391b4b"', 'Last-Modified': 'Fri, 20 Jun 2025 18:38:24 GMT', 'No-Gzip-Compression': '1', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-Gitea-Object-Type': 'file', 'Date': 'Sat, 21 Jun 2025 11:08:03 GMT'}
version https://git-lfs.github.com/spec/v1
oid sha256:9db9f10cacfc0121f762bba4dbe6572a51a09924245ec358a1b66abd9fb68888
size 306922

[{'name': 'config.tar.bz2', 'path': 'config.tar.bz2', 'sha': 'd948b9b700b0a9248e131b58f6544aebbda4e3c553602a4747330aa7b4391b4b', 'last_commit_sha': 'cecfa1db758fa52082558a5b0c05541a63bf46549441a625666822d6291c54ba', 'type': 'file', 'size': 131, 'encoding': None, 'content': None, 'target': None, 'url': 'https://src.opensuse.org/api/v1/repos/michals/kernel-source/contents/config.tar.bz2?ref=home%2Fmichals%2Fkernel-git', 'html_url': 'https://src.opensuse.org/michals/kernel-source/src/branch/home/michals/kernel-git/config.tar.bz2', 'git_url': 'https://src.opensuse.org/api/v1/repos/michals/kernel-source/git/blobs/d948b9b700b0a9248e131b58f6544aebbda4e3c553602a4747330aa7b4391b4b', 'download_url': 'https://src.opensuse.org/michals/kernel-source/raw/branch/home/michals/kernel-git/config.tar.bz2', 'submodule_git_url': None, '_links': {'self': 'https://src.opensuse.org/api/v1/repos/michals/kernel-source/contents/config.tar.bz2?ref=home%2Fmichals%2Fkernel-git', 'git': 'https://src.opensuse.org/api/v1/repos/michals/kernel-source/git/blobs/d948b9b700b0a9248e131b58f6544aebbda4e3c553602a4747330aa7b4391b4b', 'html': 'https://src.opensuse.org/michals/kernel-source/src/branch/home/michals/kernel-git/config.tar.bz2'}}]
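For context, the `sha` above is a Git object ID: Git hashes the header `blob <size>`, a NUL byte, and then the blob's bytes. Here the blob is the 131-byte pointer, so both `sha` and `size` describe the pointer rather than the archive. A minimal Python sketch, assuming the repository uses Git's SHA-256 object format (as the 64-hex-digit IDs suggest) and that each pointer line ends with a single newline; `git_blob_oid` is an illustrative helper name:

```python
import hashlib


def git_blob_oid(data: bytes, algo: str = "sha256") -> str:
    """Return the Git object ID of a blob whose content is `data`.

    Git hashes "blob <size>", a NUL byte, then the raw bytes. `algo` is
    "sha1" for classic repositories and "sha256" for SHA-256 ones.
    """
    header = f"blob {len(data)}\0".encode("ascii")
    return hashlib.new(algo, header + data).hexdigest()


# The pointer body from the raw response above, reconstructed with one
# trailing newline per line (an assumption about the exact bytes).
pointer = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:9db9f10cacfc0121f762bba4dbe6572a51a09924245ec358a1b66abd9fb68888\n"
    b"size 306922\n"
)

print(len(pointer))           # 131, the "size" the contents API reports
print(git_blob_oid(pointer))  # compare with the "sha" field above
```

The 131-byte length matches both `Content-Length` on the raw response and `size` in the contents listing, which is consistent with the API describing the pointer.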

The file can be retrieved through the media API, so Gitea does know that the file is stored in LFS:

{'Accept-Ranges': 'bytes', 'Access-Control-Expose-Headers': 'Content-Disposition', 'Cache-Control': 'private, max-age=300', 'Content-Disposition': 'inline; filename="config.tar.bz2"; filename*=UTF-8\'\'config.tar.bz2', 'Content-Length': '306922', 'Content-Type': 'application/octet-stream', 'Etag': '"9db9f10cacfc0121f762bba4dbe6572a51a09924245ec358a1b66abd9fb68888"', 'Last-Modified': 'Fri, 20 Jun 2025 18:38:24 GMT', 'No-Gzip-Compression': '1', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-Gitea-Object-Type': 'file', 'Date': 'Sat, 21 Jun 2025 11:08:02 GMT'}

but then the whole file is sent. There seems to be no way to get the actual file SHA without retrieving the whole file. The contents API is already very time-consuming, yet it does not provide the file SHA, and a HEAD request on the media URL does not return the SHA header.
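The `Etag` on the media response equals the `oid` in the pointer: a Git LFS object ID is the plain SHA-256 of the file content, with no `blob` header. Without that ID from the API, a client has to stream the whole file and hash it, roughly as in this Python sketch. The stream can be any binary file-like object (for example an HTTP response body); `lfs_oid_of_stream` is an illustrative name and the chunk size is an arbitrary choice:

```python
import hashlib
import io
from typing import BinaryIO, Tuple


def lfs_oid_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> Tuple[str, int]:
    """Return (SHA-256 hex digest, byte count) of everything read from `stream`.

    This is how Git LFS names an object: SHA-256 over the raw content only.
    Reading in chunks keeps memory use flat even for large files.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        total += len(chunk)
    return digest.hexdigest(), total


# Usage with an in-memory stream; a downloaded media body works the same way.
oid, size = lfs_oid_of_stream(io.BytesIO(b"abc"), chunk_size=2)
print(oid, size)
```

Comparing the result with the pointer's `oid` and `size` confirms the content, but it costs a full transfer per file, which is the overhead this report objects to.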

https://demo.gitea.com/hramrach/kernel-source/src/branch/home/hramrach/kernel-git
https://src.opensuse.org/michals/kernel-source/src/branch/home/michals/kernel-git/

Gitea Version

1.23.8

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

unknown

Operating System

Linux

How are you running Gitea?

All of the above

Database

None

GiteaMirror added the topic/api and type/bug labels 2025-11-02 11:18:23 -06:00

@wxiaoguang commented on GitHub (Jun 23, 2025):

The "contents" API was written to follow GitHub's behavior, and it has many problems.

So I proposed a new "contents-ext" API: Refactor repo contents API and add "contents-ext" API #34822

Then we can do something like /repos/{owner}/{repo}/contents-ext/{filepath}?includes=file_content,lfs_meta,lfs_content, etc.

Could you elaborate on what you need? For example: the use cases, and what you'd like to see in the API response (with some examples).

@hramrach commented on GitHub (Jun 23, 2025):

The use case for me is to sync a directory into a git repository without downloading the repository's previous content.

@hramrach commented on GitHub (Jun 23, 2025):

An example of what I would want to see is a flag saying that the file is in fact stored in LFS, so that I reliably know to download the raw content and interpret it as an LFS pointer (which happens to work for a sha256 repository).
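The workaround described here (fetch the small raw blob and check whether it is an LFS pointer) might look like the following Python sketch. It follows the three-line pointer layout shown in the raw response earlier in this report; `parse_lfs_pointer` is a hypothetical helper, and the check is only a heuristic, since an ordinary file that happens to look like a pointer would be misdetected:

```python
import re
from typing import Optional, Tuple

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Optional[Tuple[str, int]]:
    """Return (oid, size) if `text` looks like a Git LFS pointer, else None."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep:  # every pointer line has the form "<key> <value>"
            return None
        fields[key] = value
    if fields.get("version") != LFS_SPEC:
        return None
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or not re.fullmatch(r"[0-9a-f]{64}", digest):
        return None
    size = fields.get("size", "")
    if not re.fullmatch(r"[0-9]+", size):
        return None
    return digest, int(size)


raw = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:9db9f10cacfc0121f762bba4dbe6572a51a09924245ec358a1b66abd9fb68888\n"
    "size 306922\n"
)
print(parse_lfs_pointer(raw))
print(parse_lfs_pointer("plain text\n"))  # None
```

An explicit flag in the API response would remove the guesswork that this content sniffing relies on.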

Even easier to use would be to provide the actual media SHA rather than the raw file SHA.

In general I want to get the SHA of the actual content, as opposed to the SHA of the LFS link which is currently provided.

Given that uploading a file should transparently store it in LFS, it is broken that listing the content is not transparent to LFS.
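For the directory-sync use case, the SHA of the actual content is what lets a client skip unchanged files without downloading them. A Python sketch of that per-file decision, assuming the server reports the blob `sha` for every file and, for LFS files, the LFS object ID (the hypothetical `remote_lfs_oid` argument); `needs_upload` is an illustrative name, and the `algo` default assumes a SHA-256 repository like the one in this report:

```python
import hashlib
from typing import Optional


def git_blob_oid(data: bytes, algo: str = "sha256") -> str:
    """Git object ID of a blob: hash of "blob <size>", a NUL byte, then the content."""
    return hashlib.new(algo, f"blob {len(data)}\0".encode("ascii") + data).hexdigest()


def needs_upload(local: bytes,
                 remote_sha: Optional[str],
                 remote_lfs_oid: Optional[str] = None,
                 algo: str = "sha256") -> bool:
    """Decide whether a local file differs from its remote counterpart."""
    if remote_sha is None:
        # Nothing exists at this path on the server yet.
        return True
    if remote_lfs_oid is not None:
        # LFS file: compare against the SHA-256 of the real content.
        return hashlib.sha256(local).hexdigest() != remote_lfs_oid
    # Regular file: the contents API "sha" is the blob's Git object ID.
    return git_blob_oid(local, algo) != remote_sha
```

Without the LFS object ID, an LFS file falls through to the last branch, where the local archive is compared against the pointer's blob ID and always reported as changed.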

@wxiaoguang commented on GitHub (Jun 23, 2025):

See Refactor repo contents API and add "contents-ext" API #34822:

Request /repos/{owner}/{repo}/contents-ext/{filepath}?includes=lfs_metadata, and the response will include lfs_oid and lfs_size if the file is a valid LFS pointer file.
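Based on the request shape quoted above, a client could build the URL and read the two fields as in this Python sketch. The `/api/v1` prefix is taken from the contents URLs earlier in this report; passing `ref` to contents-ext and the exact place of the file entry inside the JSON response are assumptions, so the hypothetical `lfs_metadata` helper simply takes the entry as a dict:

```python
from typing import List, Optional, Tuple
from urllib.parse import quote, urlencode


def contents_ext_url(base: str, owner: str, repo: str, filepath: str,
                     includes: List[str], ref: Optional[str] = None) -> str:
    """Build a contents-ext request URL (path shape as described for PR #34822)."""
    query = {"includes": ",".join(includes)}
    if ref is not None:
        query["ref"] = ref  # assumption: contents-ext accepts ref like the contents API
    path = "/api/v1/repos/{}/{}/contents-ext/{}".format(
        quote(owner, safe=""), quote(repo, safe=""), quote(filepath))
    # urlencode percent-encodes "," and "/" inside values, e.g. branch names.
    return base.rstrip("/") + path + "?" + urlencode(query)


def lfs_metadata(entry: dict) -> Optional[Tuple[str, int]]:
    """Return (lfs_oid, lfs_size) from one file entry, or None if it is not an LFS pointer."""
    oid = entry.get("lfs_oid")
    if not oid:
        return None
    return oid, int(entry["lfs_size"])


url = contents_ext_url("https://src.opensuse.org", "michals", "kernel-source",
                       "config.tar.bz2", ["lfs_metadata"], ref="home/michals/kernel-git")
print(url)
```

Keeping URL construction and field extraction separate means the extraction helper still applies if the response envelope turns out to differ from what is assumed here.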

@hramrach commented on GitHub (Jun 23, 2025):

Thanks, that sounds like it will be useful for this use case.

Reference: github-starred/gitea#14632