Emojis are converted to question-marks in repository description #1151

Closed
opened 2025-11-02 03:50:11 -06:00 by GiteaMirror · 18 comments
Owner

Originally created by @jonasfranz on GitHub (Oct 15, 2017).

  • Gitea version (or commit ref): f3833b7
  • Operating system: GNU/Linux
  • Database (use [x]):
    • MySQL
  • Can you reproduce the bug at https://try.gitea.io:
    • No
  • Log gist: Not relevant

Description

If you add emojis to your repository description, like "📱App for iOS", the result is "????App for iOS".

Emojis in the description could be useful, as seen at the ownCloud GitHub project (https://github.com/owncloud).

This might be caused by the MySQL database in use.
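
A minimal sketch of why MySQL is a plausible suspect: MySQL's legacy `utf8` charset stores at most 3 bytes per character, while most emojis need 4 bytes in UTF-8, which is consistent with one emoji turning into four question marks. The helper names `maxUTF8Width` and `needsUTF8MB4` are hypothetical and only illustrate the check; they are not part of Gitea.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxUTF8Width returns the byte length of the widest UTF-8 encoded rune in s.
func maxUTF8Width(s string) int {
	widest := 0
	for _, r := range s {
		if n := utf8.RuneLen(r); n > widest {
			widest = n
		}
	}
	return widest
}

// needsUTF8MB4 reports whether s contains a character wider than 3 bytes,
// i.e. one that MySQL's 3-byte utf8 charset cannot store.
func needsUTF8MB4(s string) bool {
	return maxUTF8Width(s) > 3
}

func main() {
	for _, desc := range []string{"App for iOS", "📱App for iOS"} {
		fmt.Printf("%q: widest rune = %d bytes, needs utf8mb4 = %v\n",
			desc, maxUTF8Width(desc), needsUTF8MB4(desc))
	}
}
```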

GiteaMirror added the issue/confirmed, topic/ui, type/enhancement labels 2025-11-02 03:50:11 -06:00

@lunny commented on GitHub (Dec 5, 2017):

So maybe the MySQL database should use utf8mb4?

@kolaente commented on GitHub (Mar 24, 2018):

I had a similar issue, but in my case the description was completely deleted when I added an emoji to the repo description (v1.4-rc-2). It seems to work fine on master with SQLite, though.

@lunny commented on GitHub (Dec 9, 2018):

~This should be fixed by https://github.com/go-gitea/gitea/pull/5168, please feel free to reopen it.~

@lunny commented on GitHub (Dec 9, 2018):

If you type :smile:, it works for the repo description, but if you paste an emoji from your clipboard, it fails.

@immanuelfodor commented on GitHub (Dec 31, 2018):

I can confirm this on v1.6.2: any emoji copy-pasted from e.g. https://emojipedia.org becomes ????. Only the manually typed :emojicode: works fine. This issue is also present in e.g. org descriptions, where copy-pasted emojis become question marks. Screenshot from an issue comment:

screenshot_20181231_175649 (https://user-images.githubusercontent.com/21174107/50564488-87837080-0d25-11e9-8801-65f7cba2608b.png)

@lunny commented on GitHub (Jan 3, 2019):

So should we convert copy-pasted emojis to :emojicode: before saving?
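
A minimal sketch of that conversion idea, not Gitea's implementation: literal emoji characters are replaced with ASCII shortcodes before the text is stored, so a 3-byte `utf8` column can hold the result. The lookup table here is deliberately tiny and hypothetical, the function name `toShortcodes` is made up for illustration, and `:iphone:` is assumed to follow the usual GitHub-style shortcode naming; a real table would have to cover the full emoji set.

```go
package main

import (
	"fmt"
	"strings"
)

// emojiToCode maps a few literal emojis to shortcodes.
// Hypothetical, incomplete table used only for illustration.
var emojiToCode = strings.NewReplacer(
	"😀", ":grinning:",
	"😄", ":smile:",
	"📱", ":iphone:",
)

// toShortcodes replaces every known emoji in s with its shortcode,
// leaving all other text untouched.
func toShortcodes(s string) string {
	return emojiToCode.Replace(s)
}

func main() {
	fmt.Println(toShortcodes("📱App for iOS")) // prints ":iphone:App for iOS"
}
```

The trade-off of this approach is that any emoji missing from the table would still reach the database as a 4-byte character.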

@immanuelfodor commented on GitHub (Jan 3, 2019):

Great idea, it should work without utf8mb4 then (simple utf8 databases/tables).

@stale[bot] commented on GitHub (Mar 4, 2019):

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

@immanuelfodor commented on GitHub (Mar 4, 2019):

Any news on whether somebody is planning to implement the suggested conversion that could solve the original issue? :)

@helmut72 commented on GitHub (Mar 10, 2019):

Or just add MySQL utf8mb4 support.

@immanuelfodor commented on GitHub (May 24, 2019):

Wow, thank you, @lunny! Will there be a migration guide before the next release on how to upgrade an existing database? Or does this depend on someone in the community publishing one? I think I did this once for a Nextcloud install by following these steps: https://docs.nextcloud.com/server/16/admin_manual/configuration_database/mysql_4byte_support.html Should these steps work in theory for Gitea as well? (After replacing the DB name, of course.)

@lunny commented on GitHub (May 24, 2019):

@immanuelfodor Converting a utf8 database to utf8mb4 is possible. I found an article about how to do it, see https://mathiasbynens.be/notes/mysql-utf8mb4
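
As a rough sketch of what such a manual conversion involves, the program below only generates the `ALTER` statements for a database and a list of tables; it does not connect to anything. The function names and the table list are hypothetical, and `utf8mb4_general_ci` is an assumed collation choice. Back up the database before running statements like these, and note that a later comment in this thread mentions a `gitea convert` command for the same purpose.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps a MySQL identifier in backticks, doubling embedded backticks.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// convertStatements returns the SQL needed to switch a database and the
// given tables to utf8mb4. The collation is an assumption for illustration.
func convertStatements(db string, tables []string) []string {
	stmts := []string{
		fmt.Sprintf("ALTER DATABASE %s CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;", quoteIdent(db)),
	}
	for _, t := range tables {
		stmts = append(stmts,
			fmt.Sprintf("ALTER TABLE %s CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;", quoteIdent(t)))
	}
	return stmts
}

func main() {
	// Illustrative table names; a real run would list every table in the schema.
	for _, s := range convertStatements("gitea", []string{"issue", "repository"}) {
		fmt.Println(s)
	}
}
```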

@immanuelfodor commented on GitHub (Jul 31, 2019):

The new PRs #7144 and #6992 took care of the conversion with the new gitea convert command successfully, but I still get four question marks in comments when commenting with an emoji. All my tables are Barracuda, utf8mb4, row format dynamic, etc. Gitea was newly built and restarted, and I used a new login session.

```bash
gitea -v
# Gitea version 1.9.0 built with GNU Make 4.1, go1.12.7 : bindata
mysql -V
# mysql  Ver 15.1 Distrib 10.1.40-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
cat /etc/os-release | grep -i pretty_name
# PRETTY_NAME="Ubuntu 18.04.2 LTS"
```

@immanuelfodor commented on GitHub (Jul 31, 2019):

I'm not sure if this is a DB issue because the CLI shows ???? as if it was saved to the DB this way.

```sql
MariaDB [gitea]> select id, name, content from issue where id=2;
+----+-----------------+---------------------+
| id | name            | content             |
+----+-----------------+---------------------+
|  2 | Testing utf8mb4 | :grinning: 

???? |
+----+-----------------+---------------------+
1 row in set (0.00 sec)

MariaDB [information_schema]> select column_name, character_set_name, collation_name from columns where table_schema = "gitea" and table_name = "issue" and character_set_name is not null;
+-------------+--------------------+--------------------+
| column_name | character_set_name | collation_name     |
+-------------+--------------------+--------------------+
| name        | utf8mb4            | utf8mb4_general_ci |
| content     | utf8mb4            | utf8mb4_general_ci |
| ref         | utf8mb4            | utf8mb4_general_ci |
+-------------+--------------------+--------------------+
3 rows in set (0.00 sec)

select table_name, table_collation, engine, row_format, create_options from tables where table_schema = "gitea" and table_name = "issue";
+------------+--------------------+--------+------------+--------------------+
| table_name | table_collation    | engine | row_format | create_options     |
+------------+--------------------+--------+------------+--------------------+
| issue      | utf8mb4_general_ci | InnoDB | Dynamic    | row_format=DYNAMIC |
+------------+--------------------+--------+------------+--------------------+
1 row in set (0.00 sec)

MariaDB [information_schema]> select * from innodb_sys_tables where name = "gitea/issue";
+----------+-------------+------+--------+-------+-------------+------------+---------------+
| TABLE_ID | NAME        | FLAG | N_COLS | SPACE | FILE_FORMAT | ROW_FORMAT | ZIP_PAGE_SIZE |
+----------+-------------+------+--------+-------+-------------+------------+---------------+
|     1005 | gitea/issue |   33 |     20 |   991 | Barracuda   | Dynamic    |             0 |
+----------+-------------+------+--------+-------+-------------+------------+---------------+
1 row in set (0.00 sec)

MariaDB [information_schema]> show variables like 'innodb_file_%';
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| innodb_file_format       | Barracuda |
| innodb_file_format_check | ON        |
| innodb_file_format_max   | Barracuda |
| innodb_file_per_table    | ON        |
+--------------------------+-----------+
4 rows in set (0.00 sec)

MariaDB [information_schema]> show variables like 'innodb_large_%';
+---------------------+-------+
| Variable_name       | Value |
+---------------------+-------+
| innodb_large_prefix | ON    |
+---------------------+-------+
1 row in set (0.00 sec)
```

@lunny commented on GitHub (Aug 1, 2019):

@immanuelfodor Could you paste the content here so that I can test it locally?

@immanuelfodor commented on GitHub (Aug 1, 2019):

Just two grinning faces: the first line uses :grinning:, the second is the same face copied from Emojipedia (copy button): https://emojipedia.org/grinning-face/
Funny thing is that in the meantime I received an email from Gitea, and it contains the emoji fine on the second line. Maybe the email is sent before the multibyte character is converted?
Another idea is the DB connection. In PHP, you would need to run SET NAMES utf8mb4 before anything else; I don't know if that is true for Go as well, or whether Gitea does it: https://stackoverflow.com/questions/16893035/using-utf8mb4-with-php-and-mysql

On the same MariaDB server, a Nextcloud and a TT-RSS database are stored, too, and both handle emojis fine with utf8mb4.

@lunny commented on GitHub (Aug 1, 2019):

@immanuelfodor You should change charset in app.ini to utf8mb4. Go to https://docs.gitea.io/en-us/config-cheat-sheet/ and search for CHARSET. I think your problem may be that you haven't set it.
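
A sketch of why this setting can matter even when every table is already utf8mb4: the character set is also chosen per connection, and with the go-sql-driver/mysql driver it is passed as the `charset` parameter of the DSN. The code below only builds such a DSN string. It assumes the configured charset ends up in the DSN this way; the function name `buildDSN` and the credentials are hypothetical, and Gitea's actual connection code is not shown in this thread.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN assembles a go-sql-driver/mysql style DSN of the form
// user:password@tcp(host:port)/dbname?param=value.
// The charset parameter selects the connection character set.
func buildDSN(user, password, hostPort, dbName, charset string) string {
	params := url.Values{}
	params.Set("charset", charset)
	params.Set("parseTime", "true")
	// Encode sorts parameters by key, so the output is deterministic.
	return fmt.Sprintf("%s:%s@tcp(%s)/%s?%s", user, password, hostPort, dbName, params.Encode())
}

func main() {
	// Placeholder credentials for illustration only.
	fmt.Println(buildDSN("gitea", "secret", "127.0.0.1:3306", "gitea", "utf8mb4"))
}
```

If the connection is opened with `charset=utf8` instead, 4-byte characters may be mangled on the way in, regardless of how the tables are defined.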

@immanuelfodor commented on GitHub (Aug 1, 2019):

Aaand YES! I looked through my app.ini before, but I did not have the charset option there, it must be newer than my file. Added it, restarted Gitea, new comment, and it works! Thank you very much.

Reference: github-starred/gitea#1151