archive endpoint should provide commit object #8181

Open
opened 2025-11-02 07:56:12 -06:00 by GiteaMirror · 7 comments

Originally created by @milahu on GitHub (Nov 27, 2021).

Feature Description

why

make it easier to verify archive downloads by their commit hash

clients prefer the archive endpoint for better performance
but currently, it offers no way to validate the download by commit hash

git is a content-addressable filesystem
and versions are usually declared by their commit hash, not by their tree hash

reconstructing the commit object from the api endpoint is surprisingly painful (see my [verify_github_api.py](https://github.com/milahu/python-fuse-githubfs/blob/bf782670d136ff8a763b147ba615ff55f5f63b8a/githubfs/verify_github_api.py#L119))
at least in the github commit api, the timezones are missing (committer time + author time)
so in the worst case, we must make 24*24 guesses to work around github's lossy commit api
(i simply assume that the gitea commit api has the same bug ...)
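
to illustrate the guessing, here is a minimal sketch in Go. it is not the code from verify_github_api.py: all names (`commitFields`, `buildCommit`, `guessOffsets`) are made up for this example, and it assumes an unsigned commit with at most one parent and no extra headers. it tries whole-hour offsets from -12:00 to +14:00 for both timestamps, so offsets like +0530 would need a finer step.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// commitFields is a hypothetical container for what a JSON commit API returns:
// timestamps as Unix seconds, with the original timezone offsets lost.
type commitFields struct {
	Tree, Parent           string // hex object ids; Parent may be empty
	Author, Committer      string // "Name <email>"
	AuthorTime, CommitTime int64
	Message                string
}

// formatOffset renders minutes east of UTC in git's "+HHMM" / "-HHMM" form.
func formatOffset(minutes int) string {
	sign := '+'
	if minutes < 0 {
		sign = '-'
		minutes = -minutes
	}
	return fmt.Sprintf("%c%02d%02d", sign, minutes/60, minutes%60)
}

// buildCommit serializes the fields into the body of a git commit object.
func buildCommit(c commitFields, authorOff, commitOff int) []byte {
	s := "tree " + c.Tree + "\n"
	if c.Parent != "" {
		s += "parent " + c.Parent + "\n"
	}
	s += fmt.Sprintf("author %s %d %s\n", c.Author, c.AuthorTime, formatOffset(authorOff))
	s += fmt.Sprintf("committer %s %d %s\n", c.Committer, c.CommitTime, formatOffset(commitOff))
	return []byte(s + "\n" + c.Message)
}

// commitHash is SHA-1 over "commit <len>\0" followed by the body.
func commitHash(body []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "commit %d\x00", len(body))
	h.Write(body)
	return hex.EncodeToString(h.Sum(nil))
}

// guessOffsets brute-forces both offsets until the hash matches want.
func guessOffsets(c commitFields, want string) (authorOff, commitOff int, ok bool) {
	for a := -12 * 60; a <= 14*60; a += 60 {
		for b := -12 * 60; b <= 14*60; b += 60 {
			if commitHash(buildCommit(c, a, b)) == want {
				return a, b, true
			}
		}
	}
	return 0, 0, false
}

func main() {
	c := commitFields{
		Tree:       "88fba54a6ef84f7af70837885f0ab9db4ac6b073",
		Author:     "Alice <alice@example.com>",
		Committer:  "Alice <alice@example.com>",
		AuthorTime: 1637994785,
		CommitTime: 1637994785,
		Message:    "example\n",
	}
	// Stand-in for the pinned commit hash; the author data here is invented.
	want := commitHash(buildCommit(c, 60, -300))
	a, b, ok := guessOffsets(c, want)
	fmt.Println(formatOffset(a), formatOffset(b), ok)
}
```

with the exact commit object delivered by the server, none of this guessing would be needed.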

also, the api endpoint is (usually) rate-limited, whereas the archive endpoint is not

by including the commit object as an http response header, the client can get all data in one request

actual

curl --head https://gitea.com/gitea/awesome-gitea/archive/master.tar.gz
HTTP/1.1 200 OK
Set-Cookie: i_like_gitea=116b71d872546531; Path=/; HttpOnly; SameSite=Lax
Set-Cookie: _csrf=b1XrXHUJgvWdJZJexxFBQFFPWS46MTYzNzk5NDc4NTU5Mjk3OTAwOA; Path=/; Expires=Sun, 28 Nov 2021 06:33:05 GMT; HttpOnly; SameSite=Lax
X-Frame-Options: SAMEORIGIN
Date: Sat, 27 Nov 2021 06:33:05 GMT

expected

note the --header 'x-commit-object: 1' request header
and the X-Commit-Object: ... response header

curl --head --header 'x-commit-object: 1' https://gitea.com/gitea/awesome-gitea/archive/master.tar.gz
HTTP/1.1 200 OK
Set-Cookie: i_like_gitea=ed0c8886a6fb8c04; Path=/; HttpOnly; SameSite=Lax
Set-Cookie: _csrf=Hjlcjj5HJe1nH4Bwj4vnNoujUCs6MTYzNzk5NDkzNzY4MTgzMTA2OA; Path=/; Expires=Sun, 28 Nov 2021 06:35:37 GMT; HttpOnly; SameSite=Lax
X-Frame-Options: SAMEORIGIN
Date: Sat, 27 Nov 2021 06:35:37 GMT
X-Commit-Object: dHJlZSA4OGZiYTU0YTZlZjg0....Bub3JlcGx5LmdpdGVhLmlvPgo=

the X-Commit-Object response header is generated by

git clone https://gitea.com/gitea/awesome-gitea.git
cd awesome-gitea
git cat-file -p 4024c4771ff042cfb7971eaae8e3b9117945f491 | base64 -w0
dHJlZSA4OGZiYTU0YTZlZjg0....Bub3JlcGx5LmdpdGVhLmlvPgo=

edit: json -> base64, to ensure lossless encoding of binary data

verify the archive

when we have only the archive, we can compute the tree hash
in this case 88fba54a6ef84f7af70837885f0ab9db4ac6b073

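to compute the commit hash, we also need the exact commit object: the commit hash is the sha1 of that object (with git's object header), and the object in turn names the tree hash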
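
a minimal client-side sketch in Go of that check, assuming the proposed `X-Commit-Object` header carries the base64-encoded commit body as described above. the function names are made up for illustration, and computing the tree hash from the unpacked archive is left out.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
)

// gitObjectHash computes git's object id: SHA-1 over "<kind> <len>\0<body>".
func gitObjectHash(kind string, body []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "%s %d\x00", kind, len(body))
	h.Write(body)
	return hex.EncodeToString(h.Sum(nil))
}

// encodeCommitHeader mirrors what the server would send (base64 of the body).
func encodeCommitHeader(body []byte) string {
	return base64.StdEncoding.EncodeToString(body)
}

// verifyCommitHeader decodes the header value, checks that it hashes to the
// commit id we pinned, and returns the tree id named by the commit object.
// The caller still has to compare that tree id with the archive's tree hash.
func verifyCommitHeader(headerValue, wantCommit string) (string, error) {
	body, err := base64.StdEncoding.DecodeString(headerValue)
	if err != nil {
		return "", fmt.Errorf("decode header: %w", err)
	}
	if got := gitObjectHash("commit", body); got != wantCommit {
		return "", fmt.Errorf("commit hash mismatch: got %s, want %s", got, wantCommit)
	}
	firstLine := body
	if i := bytes.IndexByte(body, '\n'); i >= 0 {
		firstLine = body[:i]
	}
	if !bytes.HasPrefix(firstLine, []byte("tree ")) {
		return "", errors.New("commit object does not start with a tree line")
	}
	return string(firstLine[len("tree "):]), nil
}

func main() {
	// Invented commit body for the demo; only the tree id is from this issue.
	body := []byte("tree 88fba54a6ef84f7af70837885f0ab9db4ac6b073\n" +
		"author Alice <alice@example.com> 1637994785 +0100\n" +
		"committer Alice <alice@example.com> 1637994785 +0100\n\nexample\n")
	pinned := gitObjectHash("commit", body)
	tree, err := verifyCommitHeader(encodeCommitHeader(body), pinned)
	fmt.Println(tree, err)
}
```

the client pins only the commit hash; the tree hash it needs for checking the archive content comes out of the verified commit object.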

related

* https://github.com/presslabs/gitfs/issues/378
* [Nix sha256 is bug not feature. solution: a global /cas filesystem](https://discourse.nixos.org/t/nix-sha256-is-bug-not-feature-solution-a-global-cas-filesystem/15791)
GiteaMirror added the type/proposal label 2025-11-02 07:56:12 -06:00

@lunny commented on GitHub (Dec 6, 2021):

@milahu Could you send a PR to resolve that?

@milahu commented on GitHub (Dec 6, 2021):

here is a start https://github.com/milahu/gitea/tree/send-commit-object-in-http-response-header

but im stuck ...

how do i get from gitea/routers/web/repo/repo.go

func addCommitObjectResponseHeader(ctx *context.Context, archiver *repo_model.RepoArchiver) {
	if ctx.Req.Header.Get("X-Commit-Object") != "1" {
		return
	}
	repo := ctx.Repo.Repository
	// FIXME type *models.Repository has no field or method GetCommitObject
	bytes, err := repo.GetCommitObject(archiver.CommitID)
	if err != nil {
		// TODO can we ignore this? CommitID should be valid here
		ctx.Resp.Header().Set("X-Commit-Object", "")
		ctx.Resp.Header().Set("X-Commit-Object-Error", err.Error())
		return
	}
	str := base64.StdEncoding.EncodeToString(bytes)
	ctx.Resp.Header().Set("X-Commit-Object", str)
}

to gitea/modules/git/repo_commit.go

func (repo *Repository) GetCommitObject(id SHA1) ([]byte, error) {
	// we need a bit-exact copy of the commit object for hash calculation
	// -p = pretty-print; for commit objects this prints the raw object body
	stdout, err := NewCommandContext(repo.Ctx, "cat-file", "-p", id.String()).RunInDirBytes(repo.Path)
	if err != nil {
		return nil, err
	}
	return stdout, nil
}

i guess i dont need archiver since all data is in ctx?

https://github.com/go-gitea/gitea/blob/dcdb4873c8d77a444526fad5b1c8e705fdfe149d/models/repo/archiver.go#L29-L36
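
for reference, a framework-free sketch of the same idea using only the Go standard library. this is not gitea code: `loadCommitObject`, `commitObjectHeaders` and `archiveHandler` are hypothetical names, the archive body is omitted, and the loader is injected so the handler can be exercised without a git repo.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os/exec"
	"regexp"
)

var hexID = regexp.MustCompile(`^[0-9a-f]{40}$`)

// loadCommitObject shells out to `git cat-file commit <id>`, which prints the
// raw commit body. The id is validated first so it cannot be read as an option.
func loadCommitObject(repoPath, commitID string) ([]byte, error) {
	if !hexID.MatchString(commitID) {
		return nil, errors.New("invalid commit id")
	}
	cmd := exec.Command("git", "cat-file", "commit", commitID)
	cmd.Dir = repoPath
	return cmd.Output()
}

// commitObjectHeaders returns the extra response headers for an archive
// request. It returns nil unless the client opted in with the value "1".
func commitObjectHeaders(requested string, raw []byte, loadErr error) map[string]string {
	if requested != "1" {
		return nil
	}
	if loadErr != nil {
		return map[string]string{"X-Commit-Object-Error": loadErr.Error()}
	}
	return map[string]string{"X-Commit-Object": base64.StdEncoding.EncodeToString(raw)}
}

// archiveHandler wires the pieces together for one fixed commit id.
func archiveHandler(commitID string, load func(id string) ([]byte, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Commit-Object") == "1" {
			raw, err := load(commitID)
			for k, v := range commitObjectHeaders("1", raw, err) {
				w.Header().Set(k, v)
			}
		}
		w.WriteHeader(http.StatusOK) // archive body omitted in this sketch
	}
}

func main() {
	// Stub loader standing in for loadCommitObject(repoPath, id).
	stub := func(string) ([]byte, error) { return []byte("hi\n"), nil }
	req := httptest.NewRequest(http.MethodHead, "/archive/master.tar.gz", nil)
	req.Header.Set("X-Commit-Object", "1")
	rec := httptest.NewRecorder()
	archiveHandler("4024c4771ff042cfb7971eaae8e3b9117945f491", stub)(rec, req)
	fmt.Println(rec.Header().Get("X-Commit-Object"))
}
```

unlike the draft above, the error case sets only `X-Commit-Object-Error` (as a string), so a client can tell a failed lookup from an empty object.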

@lunny commented on GitHub (Dec 6, 2021):

ctx.Repo.GitRepo?

@milahu commented on GitHub (Dec 6, 2021):

mkay. what would be a good place for a test?

gitea/routers/web/repo/repo_test.go = unit test
gitea/integrations/repo_download_test.go = integration test

in unit test, can i mock a git repo?

@lunny commented on GitHub (Dec 6, 2021):

I think you can write an integration test, so you can access real git repos. There is no ready-made git repo available in unit tests.

@zeripath commented on GitHub (Dec 6, 2021):

The commit object can be quite large. Why can't we just send the commit sha instead?

@milahu commented on GitHub (Dec 7, 2021):

> The commit object can be quite large.

so? bandwidth is cheap, the archive is usually 100x larger

> Why can't we just send the commit sha instead?

trust no one : ) i want to verify the archive by its commit hash

Reference: github-starred/gitea#8181