Tag syncing fails silently due to buffer overflow #13435

Open
opened 2025-11-02 10:42:14 -06:00 by GiteaMirror · 2 comments

Originally created by @matera-bs on GitHub (Aug 28, 2024).

Description

While mirroring some legacy repositories hosted on Bitbucket, I came across an issue in the process that syncs the git repository tags to the database.

When the contents (i.e. the commit message) of a tag are very large (this particular repository has some tags whose messages are larger than 100 KB), the internal buffer of the Scanner used to parse the 'git for-each-ref' output overflows. Sadly, there is no message in the log that gives a clue as to what is happening. I'm not particularly experienced with Go, but it seems that the struct returned by function NewParser (parse.go:30) always returns a nil error no matter what happens.

Gitea Version

1.22.1

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

2.46.0

Operating System

Ubuntu

How are you running Gitea?

I ran into the issue in production (Docker inside k8s). Nonetheless, I was able to reproduce it inside Visual Studio Code.

Database

PostgreSQL

GiteaMirror added the type/bug label 2025-11-02 10:42:14 -06:00

@lunny commented on GitHub (Sep 17, 2024):

I think this may be because the Note column of the release table is limited to 16K varchars.


@bsofiato commented on GitHub (Sep 18, 2024):

> I think this may be because the Note column of the release table is limited to 16K varchars.

Not quite @lunny :(

At least on PostgreSQL, the type of the Note column is text (see the attached screenshots: the first shows the xorm mapping of the release entity, the second shows the generated database schema).

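[image: xorm mapping of the release entity] https://github.com/user-attachments/assets/b2b53a1c-b0b4-4268-8b20-59a201a2e723

[image: generated database schema] https://github.com/user-attachments/assets/46cc3479-a9c9-4908-962b-cbea2193e903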

I was able to process the offending tags by adding the following code to the parser.go file (https://github.com/go-gitea/gitea/blob/main/modules/git/foreachref/parser.go). However, it feels like this only sweeps the real problem under the rug (a tag whose message is longer than 1 MB will fail regardless). Moreover, it would increase the memory footprint when syncing the tags :(

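[image: code added to parser.go] https://github.com/user-attachments/assets/620cea1b-d7c2-43ef-9caf-e1fed91bedca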

P.S. According to the docs (https://pkg.go.dev/bufio#Scanner.Buffer), we could create a smaller default buffer and allow it to grow up to a certain size. If you think it is worthwhile, I can create a PR to allow the buffer to grow to a larger size.

P.S. @matera-bs is my work account, which is why I am the one answering this issue.


Reference: github-starred/gitea#13435