After migration Russian letters are incorrectly encoded #2454

Closed
opened 2025-11-02 04:36:50 -06:00 by GiteaMirror · 14 comments

Originally created by @iddm on GitHub (Oct 25, 2018).

I have recently migrated from binary Gitea to Docker Gitea. I made a dump and imported it into the database inside Docker, and now my issues look like this:

Screenshot: https://i.imgur.com/h451lCE.png

I also get a 500 Internal Server Error very often, and when I look for the problem in the logs I see this:

ERROR: invalid character '\n' in string literal

I have no idea what I have done wrong. Could anyone help me, please?

My docker-compose.yml:

version: "2"

services:
  db:
    image: mariadb:latest
    restart: always
    environment:
      - MYSQL_ROOT_PASSWORD=gitea
      - MYSQL_USER=gitea
      - MYSQL_PASSWORD=gitea
      - MYSQL_DATABASE=gitea
    volumes:
      - ./mysql:/var/lib/mysql

  server:
    image: gitea/gitea:latest
    restart: always
    environment:
      - USER_UID=1006
      - USER_GID=1006
      - USER=gitea
    volumes:
      - ./gitea:/data
    ports:
      - "3000:3000"
      - "222:22"
    depends_on:
      - db
GiteaMirror added the type/question label 2025-11-02 04:36:50 -06:00

@zeripath commented on GitHub (Oct 25, 2018):

I think your MariaDB has not been set up to use UTF-8; see https://github.com/docker-library/docs/issues/613

Basically you need docker-compose.yml to read:

version: "2"

services:
  db:
    image: mariadb:latest
    command: ['--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
    restart: always
    environment:
      - MYSQL_ROOT_PASSWORD=gitea
      - MYSQL_USER=gitea
      - MYSQL_PASSWORD=gitea
      - MYSQL_DATABASE=gitea
    volumes:
      - ./mysql:/var/lib/mysql

  server:
    image: gitea/gitea:latest
    restart: always
    environment:
      - USER_UID=1006
      - USER_GID=1006
      - USER=gitea
    volumes:
      - ./gitea:/data
    ports:
      - "3000:3000"
      - "222:22"
    depends_on:
      - db

@iddm commented on GitHub (Oct 26, 2018):

I don't quite understand when I should set this command: before importing the database dump from the old Gitea, or at any time, in which case it should help immediately even after the dump has been imported?


@zeripath commented on GitHub (Oct 26, 2018):

I'm not a MariaDB expert, but I suspect you need it at least when you're importing the dump, and probably also whenever the database is running.


@zeripath commented on GitHub (Oct 26, 2018):

Your data appears to be double UTF-8 encoded - if the above doesn't work, it might be worth taking a look at your dump to check whether it has been double encoded there. If that's the case, then there's likely a bug in the dumping. It should be possible to undo the double encoding with the recode command-line program.
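
As a rough illustration of what checking for and undoing the double encoding involves, here is a minimal Python sketch. It assumes the bytes were misread as ISO-8859-1 (Latin-1) somewhere along the way; if the misreading used Windows-1252 instead, some bytes map differently and this exact reversal will not apply. The function names are hypothetical and are not part of Gitea, MariaDB, or recode.

```python
import re

# Signature of a Cyrillic letter that was UTF-8 encoded twice via Latin-1:
# the lead bytes 0xD0/0xD1 become C3 90 / C3 91, and the continuation byte
# (0x80-0xBF) becomes C2 followed by the same byte.
DOUBLE_CYRILLIC = re.compile(rb"\xc3[\x90\x91]\xc2[\x80-\xbf]")


def looks_double_encoded(data: bytes) -> bool:
    """Heuristic: True if the bytes contain a double-encoded Cyrillic letter."""
    return DOUBLE_CYRILLIC.search(data) is not None


def undo_double_encoding(data: bytes) -> bytes:
    """Reverse one extra UTF-8 pass (assumes the misreading was Latin-1).

    Raises UnicodeError if the data contains characters that could not have
    come from such a misreading, e.g. text that was encoded correctly.
    """
    return data.decode("utf-8").encode("latin-1")


broken = b"\xc3\x90\xc2\x9f"  # the letter "П" after being encoded twice
if looks_double_encoded(broken):
    fixed = undo_double_encoding(broken)
    print(fixed.decode("utf-8"))
```

Running the check on a copy of the dump file's bytes before importing would show whether the damage is already present in the dump or is introduced later, during the import.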


@iddm commented on GitHub (Oct 26, 2018):

> Your data appears to be double utf8 encoded - if the above doesn't work, it might be worth taking a look at your dump to check if it's been double encoded there. If that's the case then there's likely a bug in the dumping. It should be possible to dedouble encode it with the recode command program.

Thank you for your answer! I have no idea how to re-encode it back, though, so I'm going to google it. And how did you find out that the data was encoded twice?


@zeripath commented on GitHub (Oct 26, 2018):

The D-bar characters (Ð) in your screenshot told me that something, somewhere, was interpreting UTF-8 high bytes as separate characters rather than as parts of a single encoded character.

There are two ways that can happen. Either the database is unaware that it holds UTF-8 data, so it emits single bytes as characters, and the receiving program, taking each one for a character, re-encodes them as UTF-8 (hence you see glyphs that match the high bytes). Or the data was put into the database as already UTF-8-encoded bytes, which the database treats as individual characters and re-encodes.

These things are difficult to see nowadays because most tools handle UTF-8 properly. You really need to check the byte stream at each point.
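
To make the byte-level check concrete, here is a small Python sketch that reproduces the first scenario and prints the bytes at each stage. The Latin-1 misreading and the helper names are assumptions for illustration; the charset actually involved in the migration is not known from this thread.

```python
def hex_dump(data: bytes) -> str:
    """Render bytes as space-separated hex pairs for eyeballing."""
    return " ".join(f"{b:02x}" for b in data)


def double_encode(text: str) -> bytes:
    """Simulate the failure: UTF-8 bytes misread as Latin-1, then re-encoded."""
    utf8_bytes = text.encode("utf-8")       # correct encoding
    misread = utf8_bytes.decode("latin-1")  # each byte taken as a character
    return misread.encode("utf-8")          # encoded a second time


sample = "Привет"
print(hex_dump(sample.encode("utf-8")))  # two bytes per Cyrillic letter
print(hex_dump(double_encode(sample)))   # four bytes per Cyrillic letter
```

If the hex output at some stage shows c3 90 or c3 91 where d0 or d1 would be expected, the extra encoding pass most likely happened before that stage.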

You should take a look at the wiki page for UTF-8 to learn how it works. File encoding is a surprisingly difficult and fiddly topic in general, and it's good to learn about it, especially if your native language is not written in plain old low-byte ASCII Latin.


@iddm commented on GitHub (Oct 27, 2018):

Okay, I have done what you asked me to do and I still get the same result. Could you recommend anything else?


@iddm commented on GitHub (Oct 28, 2018):

Okay, this is still an unanswered question. I have worked around it for myself: I ignored the dump, installed a fresh instance, and migrated all 42 of my repositories from the old instance manually. Of course this is a painful way, but unfortunately I was not able to find a better one.


@zeripath commented on GitHub (Oct 28, 2018):

Ugh. That's obviously not an ideal situation. Sorry to hear that.

If you're still interested in finding out how to fix this, could you give me some more information?

  1. Did you use the gitea dump command-line command to dump the database, or did you dump from MariaDB directly?
  2. What were the settings of the MariaDB server?
  3. If you try dumping your Docker DB and reimporting it into another Docker DB, does that still foul up the encoding?

@iddm commented on GitHub (Oct 28, 2018):

Sad day: the old VDS instance that hosted my old Gitea has just been deleted, so I can't tell you exactly which version of MariaDB was there, but I remember checking it when I migrated, so it should be the same on the new VDS. I don't recall any special settings; I just installed it via something like apt-get install mariadb and that's all. I made the dump via the gitea-bin commands, as described in the documentation (https://docs.gitea.io/en-us/backup-and-restore/). Everything was restored correctly except for this encoding issue; everything else was fine afaik.

And perhaps you forgot my problem: I migrated from gitea-bin on the old VDS instance to gitea-docker on the new VDS instance. I had not used gitea-docker before the migration :)


@zeripath commented on GitHub (Oct 28, 2018):

I hadn't forgotten about the change to Docker; I was just checking whether dumping works in your new setup. If it doesn't, there's a problem with Gitea's dumping in general, rather than something specific to your setup.

Basically, you've just been bitten by a backup-and-restore problem, so you should check now that your backups work and, if they don't, fix that before you need to restore again in future. This is one of the benefits of Docker: spinning up duplicate instances should be relatively cheap.


@iddm commented on GitHub (Nov 7, 2018):

It is no longer an issue for me. I have done the work manually, by cloning all the repositories back into a new instance with a fresh Gitea, without importing the old dumps, so I probably can't provide any more information on this.


@lafriks commented on GitHub (Nov 7, 2018):

@vityafx can the issue be closed then?


@iddm commented on GitHub (Nov 7, 2018):

Yes, but only because of that. :)

Reference: github-starred/gitea#2454