Synchronizing repository branches fails #12070

Closed
opened 2025-11-02 09:56:47 -06:00 by GiteaMirror · 5 comments
Owner

Originally created by @inbjo on GitHub (Nov 23, 2023).

Description

After upgrading to version 1.21.0, I get an error when accessing the repository page.

SyncRepoBranches, Error 1366 (HY000): Incorrect string value: '\xBD\xE2\xBE\xF673...' for column 'commit_message' at row 1

I converted all database tables to utf8mb4 with the gitea CLI command doctor convert, but the problem remains. I have checked the database and configuration files to make sure they are OK.

2023/11/22 15:05:34 ...repository/branch.go:25:SyncRepoBranches() [D] SyncRepoBranches: in Repo[84:ASM-Server/asm_03_regist]
2023/11/22 15:05:34 ...repository/branch.go:44:SyncRepoBranchesWithRepo() [T] SyncRepoBranches[.....]
2023/11/22 15:05:34 ...ules/context/repo.go:681:RepoAssignment() [E] SyncRepoBranches: Error 1366 (HY000): Incorrect string value: '\xBD\xE2\xBE\xF673...' for column 'commit_message' at row 1

Looking through the logs, I found that writing the branch information to the database fails because the commit message contains garbled characters. I cloned the repository to check the last commit messages and found that the last commit message of one branch was garbled: it was Chinese text committed by a colleague on a Linux system configured for GBK encoding, and it also appeared garbled when I checked it with the git GUI tool.

Maybe replacing or removing non-utf8 character content would be a good idea.

Gitea Version

1.21.0

Git Version

2.38.1

Operating System

Centos 7.9

How are you running Gitea?

Running the official release version through systemd

Database

MySQL/MariaDB

GiteaMirror added the type/bug label 2025-11-02 09:56:47 -06:00
Author
Owner

@lunny commented on GitHub (Nov 23, 2023):

Use gitea doctor convert to convert UTF8 -> UTF8mb4 and don't forget to change your app.ini.

Author
Owner

@wxiaoguang commented on GitHub (Nov 23, 2023):

> Use gitea doctor convert to convert UTF8 -> UTF8mb4 and don't forget to change your app.ini.

It's not related. The problem is that the user's commit message is GBK-encoded. There is no workaround at the moment, unless the user creates a new commit with a proper commit message.

Author
Owner

@lunny commented on GitHub (Nov 23, 2023):

Wow, you are right. When syncing branches into the database, we also insert the latest commit message. We need to detect non-UTF-8 messages and convert them to UTF-8.

Author
Owner

@inbjo commented on GitHub (Nov 24, 2023):

> Wow, you are right. When syncing branches into database, we also insert the latest commit message. We need to detect and change it into UTF8.

The problem occurred when I upgraded from version 1.20.5 to 1.21.0. I then synchronized the repository branches through the admin panel, and accessing the repository showed:

SyncRepoBranches, Error 1366 (HY000): Incorrect string value: '\xBD\xE2\xBE\xF673...' for column 'commit_message' at row 1

At this point I realized that the database tables were probably not utf8mb4, so I ran doctor convert to fix them. It reported that the conversion was successful, and I also changed the connection encoding in the database configuration.

But the error persisted. Finally, I cloned the repository and checked every branch by hand, and found one branch whose last commit message was garbled (GBK encoding). I made a new commit on top of it, and after that everything worked normally.

Later I uploaded the repository to try.gitea.io and could not reproduce the problem there. It does reproduce when upgrading to 1.21.0 if the last commit message of a branch contains non-UTF-8 characters. Newly created repositories are not affected.

I recommend removing non-UTF-8 characters when synchronizing branches, so that the synchronization does not fail.

Author
Owner

@darrinsmart commented on GitHub (Dec 4, 2023):

I think I've found the same thing using the PostgreSQL database backend. We had a commit message that contained a stray 0xC2 byte, and that commit happens to also be a branch head. Since upgrading from 1.19 to 1.21, attempting to view that repo gives a 500 error and logs:

2023/12/04 22:51:31 ...dules/git/command.go:345:Run() [D] slow git.Command.Run: /usr/bin/git -c protocol.version=2 -c credential.helper= -c filter.lfs.required= -c filter.lfs.smudge= -c filter.lfs.clean= cat-file --batch (1.172484051s)
2023/12/04 22:51:31 ...dules/git/command.go:345:Run() [D] slow git.Command.Run: /usr/bin/git -c protocol.version=2 -c credential.helper= -c filter.lfs.required= -c filter.lfs.smudge= -c filter.lfs.clean= cat-file --batch-check (1.174623221s)
2023/12/04 22:51:31 ...ules/context/repo.go:682:RepoAssignment() [E] SyncRepoBranches: pq: invalid byte sequence for encoding "UTF8": 0xc2
2023/12/04 22:51:31 .../context_response.go:68:HTML() [D] Template: status/500
2023/12/04 22:51:31 ...dules/git/command.go:345:Run() [D] slow git.Command.Run: /usr/bin/git -c protocol.version=2 -c credential.helper= -c filter.lfs.required= -c filter.lfs.smudge= -c filter.lfs.clean= cat-file --batch (1.178875462s)
2023/12/04 22:51:31 ...dules/git/command.go:345:Run() [D] slow git.Command.Run: /usr/bin/git -c protocol.version=2 -c credential.helper= -c filter.lfs.required= -c filter.lfs.smudge= -c filter.lfs.clean= cat-file --batch-check (1.17910161s)

The PostgreSQL server logs:

2023-12-04 22:51:31.631 GMT [3446884] ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xc2
2023-12-04 22:51:31.631 GMT [3446884] LOCATION:  report_invalid_encoding, mbutils.c:1597
2023-12-04 22:51:31.631 GMT [3446884] STATEMENT:  INSERT INTO "branch" ("repo_id","name","commit_id","commit_message","pusher_id","is_deleted","deleted_by_id","deleted_unix","commit_time","created_unix","updated_unix") VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11) RETURNING "id"
Reference: github-starred/gitea#12070