[Proposal] Use ICU message for i18n & l10n #10580

Closed
opened 2025-11-02 09:11:49 -06:00 by GiteaMirror · 7 comments

Originally created by @wxiaoguang on GitHub (Apr 1, 2023).

To avoid reinventing the wheel, it is better to use the ICU message format for i18n/l10n.

Steps:

  1. Fix the buggy INI package
  2. Clean up all translation strings
  3. Introduce an ICU message parser
  4. Convert legacy plural-related strings to ICU format
  5. Translate on Crowdin: https://support.crowdin.com/icu-message-syntax/
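To make steps 3 and 4 a bit more concrete, here is a minimal sketch of what an ICU-style plural string could look like and how it might be evaluated. It is not a real ICU parser and not Gitea code: `pluralBranches` and `formatEnglish` are hypothetical names, only a flat `{name, plural, key {text} ...}` argument is handled (no nesting, no escaping, no `=0` selectors), and the English rule is hard-coded.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pluralBranches extracts the branches of one flat ICU-style plural argument,
// e.g. "{n, plural, one {# pull request} other {# pull requests}}".
// Hypothetical helper: it does not handle nesting, escaping or "=N" selectors.
func pluralBranches(arg string) (map[string]string, error) {
	body := strings.TrimSpace(arg)
	if len(body) < 2 || body[0] != '{' || body[len(body)-1] != '}' {
		return nil, fmt.Errorf("not a braced argument: %q", arg)
	}
	// Split into: argument name, the keyword "plural", and the branch list.
	parts := strings.SplitN(body[1:len(body)-1], ",", 3)
	if len(parts) != 3 || strings.TrimSpace(parts[1]) != "plural" {
		return nil, fmt.Errorf("not a plural argument: %q", arg)
	}
	branches := map[string]string{}
	rest := strings.TrimSpace(parts[2])
	for rest != "" {
		open := strings.Index(rest, "{")
		end := strings.Index(rest, "}")
		if open < 0 || end < open {
			return nil, fmt.Errorf("malformed branch near %q", rest)
		}
		branches[strings.TrimSpace(rest[:open])] = rest[open+1 : end]
		rest = strings.TrimSpace(rest[end+1:])
	}
	if _, ok := branches["other"]; !ok {
		return nil, fmt.Errorf("missing required \"other\" branch: %q", arg)
	}
	return branches, nil
}

// formatEnglish picks a branch with the English cardinal rule
// (1 selects "one", everything else selects "other") and replaces "#" with n.
func formatEnglish(arg string, n int) (string, error) {
	branches, err := pluralBranches(arg)
	if err != nil {
		return "", err
	}
	text, ok := branches["other"], true
	if n == 1 {
		if one, found := branches["one"]; found {
			text = one
		}
	}
	_ = ok
	return strings.ReplaceAll(text, "#", strconv.Itoa(n)), nil
}

func main() {
	msg := "{n, plural, one {# pull request} other {# pull requests}}"
	for _, n := range []int{1, 5} {
		out, err := formatEnglish(msg, n)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

A real implementation would select the category from CLDR data for each locale instead of a hard-coded rule; the point here is only the shape of the string that translators would see on Crowdin.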

The description below is outdated: the old idea was to use a custom message format (a simple syntax similar to ICU messages, but one that Crowdin does not support, so Crowdin cannot help check for mistakes).

The official package's design seems clear and would resolve Gitea's i18n/l10n problems fundamentally.

https://pkg.go.dev/golang.org/x/text/message

https://pkg.go.dev/golang.org/x/text/feature/plural

https://github.com/unicode-org/cldr/blob/main/common/supplemental/ordinals.xml

https://github.com/unicode-org/cldr/blob/main/common/supplemental/plurals.xml

I think a translator-friendly syntax is very important: there are already a lot of broken translations, and if we make the system more complex, there will be more errors.

The syntax should also be designed for the frontend (JS/Vue).

As the first step, we should refactor the locale package to make it stable; see the problems: https://github.com/go-gitea/gitea/blob/main/modules/translation/i18n/i18n_test.go

A brief idea about how to maintain the translation strings:

```
<!-- 1: other -->  {%d $[text]}

<!-- 2: one,other --> {%d $[text,texts]}

<!-- 3: zero,one,other --> {%d $zero[0,1,o]}
<!-- 3: one,two,other --> {%d $two[1,2,o]}
<!-- 3: one,few,other --> {%d $few[1,f,o]}
<!-- 3: one,many,other --> {%d $many[1,m,o]}

<!-- 4: one,two,few,other --> {%d $two-few[1,2,f,o]}
<!-- 4: one,two,many,other --> {%d $two-many[1,2,m,o]}
<!-- 4: one,few,many,other --> {%d $few-many[1,f,m,o]}

<!-- 5: one,two,few,many,other --> {%d $[1,2,f,m,o]}

<!-- 6: zero,one,two,few,many,other --> {%d $[0,1,2,f,m,o]}
```
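The table above can be read as a lookup from (prefix, number of forms) to a list of plural categories. A minimal sketch of that lookup follows; `formKey`, `categoryTable` and `categoriesFor` are hypothetical names for illustration, not part of any existing package.

```go
package main

import "fmt"

// formKey identifies one variant of the proposed "$prefix[...]" syntax by its
// prefix (possibly empty) and the number of comma-separated forms.
type formKey struct {
	prefix string
	count  int
}

// categoryTable mirrors the table in the proposal: which plural categories
// the listed forms stand for, in order.
var categoryTable = map[formKey][]string{
	{"", 1}:         {"other"},
	{"", 2}:         {"one", "other"},
	{"zero", 3}:     {"zero", "one", "other"},
	{"two", 3}:      {"one", "two", "other"},
	{"few", 3}:      {"one", "few", "other"},
	{"many", 3}:     {"one", "many", "other"},
	{"two-few", 4}:  {"one", "two", "few", "other"},
	{"two-many", 4}: {"one", "two", "many", "other"},
	{"few-many", 4}: {"one", "few", "many", "other"},
	{"", 5}:         {"one", "two", "few", "many", "other"},
	{"", 6}:         {"zero", "one", "two", "few", "many", "other"},
}

// categoriesFor returns the categories for a prefix and form count, or an
// error when the combination is not one of the variants in the table
// (for example three forms without a prefix, which would be ambiguous).
func categoriesFor(prefix string, count int) ([]string, error) {
	cats, ok := categoryTable[formKey{prefix, count}]
	if !ok {
		return nil, fmt.Errorf("unsupported combination: prefix %q with %d forms", prefix, count)
	}
	return cats, nil
}

func main() {
	fmt.Println(categoriesFor("", 2))
	fmt.Println(categoriesFor("zero", 3))
	fmt.Println(categoriesFor("", 3))
}
```

Rejecting unknown combinations up front is one way to catch broken translations early, which is the concern raised above about translator mistakes.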

Then use the syntax to support different languages:

```
en: msg = there are {%d $[pull request, pull requests]}
lv: msg = there are {%d $zero[for 0 pull request, pull request, pull requests]}
ar: msg = there are {%d $[for 0, for 1, for 2, few, many, other]}
```
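To show how the three examples above would pick a form at runtime, here is a rough sketch. The rules in `pluralCategory` are hand-copied from the CLDR cardinal rules for English, Latvian and Arabic and cover non-negative integers only; a real implementation would take them from CLDR data (for instance via golang.org/x/text/feature/plural) instead. `pluralCategory` and `formFor` are hypothetical names.

```go
package main

import "fmt"

// pluralCategory returns the CLDR cardinal category for a non-negative
// integer n. Only en, lv and ar are sketched; other languages get "other".
func pluralCategory(lang string, n int) string {
	switch lang {
	case "en":
		if n == 1 {
			return "one"
		}
	case "lv":
		if n%10 == 0 || (n%100 >= 11 && n%100 <= 19) {
			return "zero"
		}
		if n%10 == 1 && n%100 != 11 {
			return "one"
		}
	case "ar":
		switch {
		case n == 0:
			return "zero"
		case n == 1:
			return "one"
		case n == 2:
			return "two"
		case n%100 >= 3 && n%100 <= 10:
			return "few"
		case n%100 >= 11:
			return "many"
		}
	}
	return "other"
}

// formFor picks the text whose category matches n. categories and texts are
// parallel slices; the last entry is assumed to be the "other" form and is
// used as the fallback.
func formFor(lang string, n int, categories, texts []string) string {
	want := pluralCategory(lang, n)
	for i, c := range categories {
		if c == want {
			return texts[i]
		}
	}
	return texts[len(texts)-1]
}

func main() {
	lvCats := []string{"zero", "one", "other"}
	lvTexts := []string{"for 0 pull request", "pull request", "pull requests"}
	for _, n := range []int{0, 1, 2, 11, 21} {
		fmt.Printf("lv: there are %d %s\n", n, formFor("lv", n, lvCats, lvTexts))
	}
}
```

Note that for Latvian the "zero" form is also used for 10 through 20, 30 and so on, so a form written as "for 0" is really "for the zero category", not literally for the number 0.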

Another possible approach is to define all concepts ahead of time:

```
en: NumPR = {%d $[pull request, pull requests]}
lv: NumPR = {%d $zero[for 0 pull request, pull request, pull requests]}
ar: NumPR = {%d $[for 0, for 1, for 2, few, many, other]}
```

Then NumPR can be reused:

```
en: msg = there are {$NumPR}
lv: msg = there are {$NumPR}
ar: msg = there are {$NumPR}
```
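Reusing a predefined concept like this amounts to a substitution pass before the plural handling runs. A minimal sketch, assuming references always have the exact shape `{$Name}`; `refPattern` and `expandRefs` are hypothetical names, and nested references are not resolved.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// refPattern matches a reference such as "{$NumPR}". It does not match the
// plural placeholder "{%d $[...]}" because that starts with "{%".
var refPattern = regexp.MustCompile(`\{\$([A-Za-z][A-Za-z0-9_]*)\}`)

// expandRefs replaces every "{$Name}" in msg with its definition from defs.
// Unknown names are reported as an error so that a typo in a translation
// does not silently leak into the UI.
func expandRefs(msg string, defs map[string]string) (string, error) {
	var missing []string
	out := refPattern.ReplaceAllStringFunc(msg, func(m string) string {
		name := m[2 : len(m)-1] // strip the leading "{$" and the trailing "}"
		def, ok := defs[name]
		if !ok {
			missing = append(missing, name)
			return m
		}
		return def
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("undefined references: %s", strings.Join(missing, ", "))
	}
	return out, nil
}

func main() {
	defs := map[string]string{
		"NumPR": "{%d $[pull request, pull requests]}",
	}
	fmt.Println(expandRefs("there are {$NumPR}", defs))
	fmt.Println(expandRefs("there are {$NumIssue}", defs))
}
```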

If we only need to support one `%d`, the syntax might be simplified, e.g.:

```
en: msg = there are %d $[pull request, pull requests]
lv: msg = there are %d $zero[for 0 pull request, pull request, pull requests]
ar: msg = there are %d $[for 0, for 1, for 2, few, many, other]
```
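The simplified form is easy to recognize with a single regular expression. The following is only a sketch of how such a string might be split into its prefix and forms; `simplifiedPattern` and `parseSimplified` are hypothetical names, and the sketch assumes the forms themselves never contain `,` or `]`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// simplifiedPattern matches "%d $prefix[form, form, ...]" where the prefix is
// optional (for example "zero" or "two-few").
var simplifiedPattern = regexp.MustCompile(`%d \$([a-z-]*)\[([^\]]*)\]`)

// parseSimplified returns the prefix and the trimmed forms of the first
// plural placeholder found in msg.
func parseSimplified(msg string) (prefix string, forms []string, err error) {
	m := simplifiedPattern.FindStringSubmatch(msg)
	if m == nil {
		return "", nil, fmt.Errorf("no plural placeholder in %q", msg)
	}
	for _, f := range strings.Split(m[2], ",") {
		forms = append(forms, strings.TrimSpace(f))
	}
	return m[1], forms, nil
}

func main() {
	prefix, forms, err := parseSimplified("there are %d $zero[for 0 pull request, pull request, pull requests]")
	fmt.Printf("%q %q %v\n", prefix, forms, err)
}
```

The comma restriction is one practical limit of this custom syntax compared with ICU messages, where each branch is delimited by its own braces.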
GiteaMirror added the type/proposal, type/feature, and modifies/translation labels 2025-11-02 09:11:49 -06:00

@lunny commented on GitHub (Apr 28, 2023):

Is there any tool to convert the INI format to the ICU format? Or should we create one?


@wxiaoguang commented on GitHub (Apr 28, 2023):

I don't understand what you mean.

ICU is just a message format; there is no need to convert.


@lunny commented on GitHub (Apr 28, 2023):

Maybe we should use a format other than INI files?


@wxiaoguang commented on GitHub (Apr 28, 2023):

Why?


@silverwind commented on GitHub (Apr 28, 2023):

YAML may be OK, as it requires less escaping than INI. But one also needs to be aware of its pitfalls, like `no` becoming boolean `false`, because YAML is a typed format, which INI isn't.


@wxiaoguang commented on GitHub (Apr 28, 2023):

At the moment I don't see any real benefit that YAML would bring.

Actually, we do not need much "escaping" with INI; there are just some legacy bugs.

The only "escaping" requirements are:

  1. Comments: YAML still needs to escape or quote `#`
  2. Leading/trailing spaces: YAML still needs to quote them with `"`
  3. Multi-line support: YAML's syntax is not as simple as INI's

I think INI still wins.


@silverwind commented on GitHub (Jun 2, 2023):

Found another use case where `{placeholder}` syntax would have been really useful:

https://github.com/go-gitea/gitea/pull/25050/files#r1214691116

Reference: github-starred/gitea#10580