500 error when PR comment #5215

Closed
opened 2025-11-02 06:18:09 -06:00 by GiteaMirror · 3 comments

Originally created by @yecpster on GitHub (Apr 9, 2020).

  • Gitea version (or commit ref): 1.11.3
  • Git version: 2.7.4
  • Operating system: Ubuntu 16.04.6 LTS
  • Database: MySQL
  • Can you reproduce the bug at https://try.gitea.io: No
  • Log gist:
    .../repo/pull_review.go:48:CreateCodeComment() [E] CreateCodeComment: Error 1366: Incorrect string value: '\xC5\xE4\xD6\xC3\xC8\xD5...' for column 'patch' at row 1

Description

I have set CHARSET = utf8mb4 in app.ini, but the issue persists.
My table collation is utf8mb4_general_ci.
This issue occurs only when I comment on a specific file; when I post the same comment on other files, no error occurs.


GiteaMirror added the issue/duplicate label 2025-11-02 06:18:09 -06:00

@guillep2k commented on GitHub (Apr 10, 2020):

The problem may have multiple causes.

  1. It's probably not the database but the source file you're commenting on. Can you check the character encoding (code page) of that particular file, e.g. UTF-8, BIG5, etc.? If it's UTF-8, then that's not the problem.

  2. If it's the database, changing the CHARSET setting after the database was created will have no effect, because the tables already exist. Also, the collation settings only affect... collation (i.e. how strings are compared and sorted, as in ORDER BY). You can search the issues here; I'm sure some of them deal with migrating the character set in MySQL.


@yecpster commented on GitHub (Apr 10, 2020):

Thanks guillep2k, after changing the source file to UTF-8, it works now. But that means my Gitea can only review files encoded in UTF-8, right? Does Gitea have any plans to support files in other encodings? Or perhaps it could display a friendly error message telling us the file must be encoded in UTF-8.


@guillep2k commented on GitHub (Apr 10, 2020):

You should use UTF-8 everywhere in 2020. 😉

Now, really, I know what it is like to work with a code base that is 25+ years old. If your non-UTF-8 files are mainly in one charset (e.g. iso-8859-5), you could set ANSI_CHARSET in your app.ini:

[repository]
ANSI_CHARSET = iso-8859-5

That should make Gitea assume that anything that's not UTF-8 is iso-8859-5. Gitea assumes the whole file is in one encoding and does its best to guess which one it is, but it can sometimes get confused. Files that have parts in different encodings (e.g. after a patch) are especially problematic: since git doesn't care about encoding, it's perfectly possible to have 5 lines in UTF-8, 6 in iso-8859-5, 421 neutral (ASCII) and 36 in BIG5. 😵
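The whole-file fallback described above can be illustrated with a short Go sketch that uses only the standard library. Input that is already valid UTF-8 is kept; otherwise the whole input is treated as the single fallback charset, here iso-8859-5. This is not Gitea's implementation, and `toUTF8` and `decodeISO88595` are hypothetical names. The ISO-8859-5 table is hand-rolled only to avoid a dependency; a real program would more likely use `charmap.ISO8859_5` from `golang.org/x/text/encoding/charmap`. The last call in `main` shows why mixed files cause confusion. A single stray ISO-8859-5 byte makes the whole input invalid UTF-8, so the genuinely UTF-8 `é` gets misread as `УЉ`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// decodeISO88595 converts ISO-8859-5 bytes to a Go (UTF-8) string.
// Bytes 0x00-0xA0 map to the same code point; 0xA1-0xFF map to Cyrillic
// letters at byte+0x360, except for three non-letter positions.
func decodeISO88595(b []byte) string {
	var sb strings.Builder
	for _, c := range b {
		switch {
		case c <= 0xA0:
			sb.WriteRune(rune(c))
		case c == 0xAD:
			sb.WriteRune('\u00AD') // soft hyphen
		case c == 0xF0:
			sb.WriteRune('\u2116') // numero sign
		case c == 0xFD:
			sb.WriteRune('\u00A7') // section sign
		default:
			sb.WriteRune(rune(c) + 0x360)
		}
	}
	return sb.String()
}

// toUTF8 keeps input that is already valid UTF-8 and otherwise assumes the
// whole input is in the one configured fallback charset (here ISO-8859-5).
func toUTF8(b []byte) string {
	if utf8.Valid(b) {
		return string(b)
	}
	return decodeISO88595(b)
}

func main() {
	fmt.Println(toUTF8([]byte("héllo")))                            // already UTF-8: kept as-is
	fmt.Println(toUTF8([]byte{0xBF, 0xE0, 0xD8, 0xD2, 0xD5, 0xE2})) // ISO-8859-5 for "Привет"
	fmt.Println(toUTF8([]byte("é\n\xBF")))                           // mixed: the UTF-8 line is misread
}
```

Decoding per line instead of per file would handle the mixed case better, but it still cannot tell apart two legacy charsets such as iso-8859-5 and BIG5.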

Hope this helps.

Note: I'm closing this issue as a duplicate (there are several similar ones).


Reference: github-starred/gitea#5215