Labels Not Displaying Emoji After Database Import #9509

Open
opened 2025-11-02 08:41:12 -06:00 by GiteaMirror · 12 comments
Owner

Originally created by @Deadmano on GitHub (Sep 4, 2022).

Description

After migrating servers and importing the Gitea database, existing labels have not retained their emoji and instead show `?` in place of Unicode emoji. I have checked the database to ensure the collation is correct and that it is using `utf8mb4_unicode_ci`.

This only affects existing emoji in labels; if I create a new set of labels, they both display and store correctly.

How can this issue be resolved? How can I have all the existing emoji display correctly again? Is there a task that can be run?

Screenshots

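![Gitea_Labels](https://user-images.githubusercontent.com/7104251/188307785-38b5d5a7-282a-4803-b82a-e7f6614621a4.PNG)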

Gitea Version

1.17.1

Can you reproduce the bug on the Gitea demo site?

No

Operating System

Windows 10

Browser Version

Firefox 102

GiteaMirror added the topic/ui, type/bug, issue/workaround labels 2025-11-02 08:41:12 -06:00

@Deadmano commented on GitHub (Sep 4, 2022):

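And this is how they should look, which newly created labels currently do, but not the existing ones that were imported.
![Gitea_Labels_Normal](https://user-images.githubusercontent.com/7104251/188309194-60d8610e-663d-4efc-a55a-f9a06887022c.PNG)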

@Deadmano commented on GitHub (Sep 4, 2022):

@zeripath / @lunny based on #6992 is it possible that either of you may know what is going on here?

@zeripath commented on GitHub (Sep 4, 2022):

Which database are you running?

What collation is it in?

I suspect that you're running mssql in a non-unicode collation.

It has to be a unicode collation.

@Deadmano commented on GitHub (Sep 4, 2022):

It's MySQL 8.0. The database is `utf8mb4` with collation `utf8mb4_unicode_ci`. I've not had any issues using emoji before; it is only when transferring to a new machine and importing the SQL database backup that this happens. I also checked that it was imported with the correct collation.

Happy to check anything else that may resolve this issue, as there are quite a few labels that would need correcting.

@zeripath commented on GitHub (Sep 4, 2022):

I guess you need to find out whether these characters are `?` in the DB or whether the problem is between the DB and Gitea.

I suspect that they were incorrectly imported into your DB and are now `?` in the DB.

If they're correct in the DB, then take a look at the charset setting in the database section of your `app.ini` and make sure it's `utf8mb4`.
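
One way to run the first check is to look at the raw bytes that are stored, rather than at how a client renders them. The sketch below is only an illustration: it assumes the labels live in a table named `label` with `id` and `name` columns, and `label_bytes_query` is a hypothetical helper name. In the `HEX()` output, most emoji that were stored intact show up as four bytes beginning `F09F`, while a character lost during import is a single `3F`, the ASCII code for `?`.

```shell
# Hypothetical helper: prints a query that lists each label next to its raw
# stored bytes. The table and column names are assumptions; check your schema.
label_bytes_query() {
  cat <<'SQL'
SELECT id, name, HEX(name) AS name_hex FROM label ORDER BY id;
SQL
}

# Connection options are omitted here, as elsewhere in this thread:
#   label_bytes_query | mysql --default-character-set=utf8mb4 --database=gitea
```

If the hex already contains `3F` where an emoji used to be, the data was lost before Gitea ever read it, and no client or `app.ini` setting will bring it back.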

@Deadmano commented on GitHub (Sep 5, 2022):

@zeripath they do indeed display as `?` characters in the database... So that means something must have gone wrong in the import. The charset is definitely `utf8mb4` in the `app.ini`.

This is the command I used to restore the database, minus of course all the connection bits:
`mysql --default-character-set=utf8mb4 --database=gitea < "gitea.sql"`

Do you perhaps see anything that I may have missed? I know that I initially had to update the charset/collation to get Gitea to work properly with Unicode, but that change seems to have stuck, since I am able to create labels with Unicode and use them just fine.

@lunny commented on GitHub (Sep 5, 2022):

> #6992

So they are not `?` in the old database? Each table may also have a different collation from the database.
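
A per-column view of character sets and collations can be read from MySQL's `information_schema.columns` table. The following is a minimal sketch rather than an official Gitea procedure; `column_charset_query` is a hypothetical helper name, and the schema name `gitea` is taken from the commands earlier in this thread.

```shell
# Hypothetical helper: prints a query listing the character set and collation
# of every text column in the given schema ("gitea" in this thread).
column_charset_query() {
  printf "SELECT table_name, column_name, character_set_name, collation_name\n"
  printf "FROM information_schema.columns\n"
  printf "WHERE table_schema = '%s' AND character_set_name IS NOT NULL\n" "$1"
  printf "ORDER BY table_name, column_name;\n"
}

# Connection options are omitted here, as elsewhere in this thread:
#   column_charset_query gitea | mysql --default-character-set=utf8mb4
```

Any column still reporting `utf8` (or `utf8mb3`) instead of `utf8mb4` cannot hold four-byte characters, regardless of the database default.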

@Deadmano commented on GitHub (Sep 5, 2022):

@lunny they were not, no. But upon importing them they were. I know Gitea doesn't handle `utf8mb4` by default, but I do believe this is now outside the scope of Gitea, and I'll have to figure out where the import process went wrong. This may be somewhat useful for future reference, as a direct import is not quite possible despite setting the correct default collation. I may need to look into re-creating the structure ahead of time, at least for the labels, before importing?

@Deadmano commented on GitHub (Sep 5, 2022):

Just to update, @lunny and @zeripath: it turns out Gitea's dump using the command `./gitea dump -c /path/to/app.ini` does not preserve the collation, and all Unicode characters end up being transformed into `?`. So I guess this issue would be: `Add Unicode Support To Database Backups`. I wish I had known that beforehand, before wiping the previous install. It seems there is no way to recover from this.

@Deadmano commented on GitHub (Sep 5, 2022):

In case anyone else stumbles upon this in the interim, I managed to work around this for future use by utilising `mysqldump`.

`mysqldump --user="" --password="" --default-character-set=utf8mb4 gitea --result-file=".\gitea.sql"`

This gave a proper output preserving the Unicode characters.
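
Before deleting an old installation, it may be worth confirming that a dump file still contains multi-byte characters. A rough check, assuming the dump is written as UTF-8 and the labels use emoji from the U+1F000 range (whose UTF-8 encoding begins with the bytes `F0 9F`), is to count the lines that contain that byte pair. `count_emoji_lines` is a hypothetical helper name, not part of Gitea or MySQL.

```shell
# Hypothetical helper: counts the lines of a dump file that contain at least
# one UTF-8 sequence starting with the bytes F0 9F (octal 360 237).
count_emoji_lines() {
  lead=$(printf '\360\237')
  # LC_ALL=C makes grep compare raw bytes; -a treats the file as text.
  LC_ALL=C grep -a -c "$lead" "$1"
}

# Example:
#   count_emoji_lines gitea.sql
```

A count of `0` for a database known to contain emoji suggests the characters were already replaced when the dump was written. Note that `grep -c` still prints `0` in that case but exits with a non-zero status.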

@zeripath commented on GitHub (Sep 5, 2022):

This is strange because `gitea dump` does nothing different from Gitea itself, so if Gitea could read the characters, the dump should have been able to as well.

From what version of Gitea did you make your dump?

@Deadmano commented on GitHub (Sep 5, 2022):

It was on 1.17's RC, recently updated to `1.17.1`. Gitea could only read the characters because I had manually changed the database collation and default character set to `utf8mb4`, with the `utf8mb4_unicode_ci` collation on the relevant columns, as Gitea's default setup only used `utf8` without extended Unicode support.
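
The thread does not show the exact statements used for this manual change. As a rough sketch only, MySQL's `ALTER DATABASE` and `ALTER TABLE ... CONVERT TO CHARACTER SET` statements are one way to make it. The helper name `utf8mb4_convert_sql` and the choice of tables are assumptions, and a verified backup should exist before running anything like this.

```shell
# Hypothetical helper: prints statements that switch a schema's default and the
# listed tables to utf8mb4. Usage: utf8mb4_convert_sql SCHEMA TABLE...
utf8mb4_convert_sql() {
  schema=$1
  shift
  printf 'ALTER DATABASE `%s` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\n' "$schema"
  for table in "$@"; do
    printf 'ALTER TABLE `%s`.`%s` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\n' "$schema" "$table"
  done
}

# Review the output first, then pipe it into mysql (connection options omitted):
#   utf8mb4_convert_sql gitea label | mysql --default-character-set=utf8mb4
```

Printing the statements instead of executing them directly leaves room to inspect them before they touch a live database.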

I can confirm that the backup dump done via `gitea dump` does indeed strip the Unicode characters and leave `?` in their place.

I'm not sure how `gitea dump` works exactly, but if there were a way to set a default character set, like I had to do for MySQL above using `mysqldump`, that would be a much better option. Right now the backup is pretty much pointless, since restoring it will display `?` in place of Unicode characters such as emoji.

Reference: github-starred/gitea#9509