Use accentless names for searching #10632

GiteaMirror · 2025-11-02T09:13:34-06:00

GiteaMirror commented

2025-11-02 09:13:34 -06:00

Originally created by @drsybren on GitHub (Apr 11, 2023).

Feature Description

When searching for users, for example to assign an issue or PR to someone, the name has to be typed as-is in order to be found. In my case, when searching for "Stuvel" you will not find me, as you have to search for "Stüvel" exactly.

A solution for this would be to use a project like [go-unidecode](https://github.com/mozillazg/go-unidecode) to convert names to ASCII representations. Those could then be used as an additional field to search for, with the search query itself undergoing a similar treatment.
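As a rough illustration of the idea, here is a minimal sketch in Python. It is not go-unidecode and not Gitea code: it swaps in Unicode NFKD decomposition plus removal of combining marks, which is a narrower technique than full transliteration (letters without a decomposition, such as "ø", stay unchanged). The function names `fold_accents` and `matches` are hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return a case-folded copy of text with combining marks removed.

    Assumption: NFKD decomposition stands in for a transliteration
    library here; it only handles characters that decompose into a
    base letter plus combining marks.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, full_name: str) -> bool:
    """Apply the same folding to the query and the stored name, then compare."""
    return fold_accents(query) in fold_accents(full_name)
```

With this sketch, `matches("Stuvel", "Stüvel")` is true because both sides fold to the same string, and typing the accented form still matches as well.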

Screenshots

![image](https://user-images.githubusercontent.com/122987084/231173904-14173781-1c98-496a-99a7-ba178a3ddd85.png)

GiteaMirror added the type/proposal type/feature labels 2025-11-02 09:13:34 -06:00

GiteaMirror commented

2025-11-02 09:13:35 -06:00

@delvh commented on GitHub (Apr 11, 2023):

Sounds interesting.
However, I'm not sure how this will work with the existing indices.
I guess they need to be re-indexed.
Another thing I've noticed: as far as I know, not every character has the same pronunciation (it depends on the language, e.g. Japanese vs. Chinese).
The library you mentioned doesn't take a locale in its API, so it always outputs the same string.
This is only a problem if we tried to remove this dependency; apart from that, it seems as if it would work.

GiteaMirror commented

2025-11-02 09:13:35 -06:00

@lunny commented on GitHub (Apr 12, 2023):

If the database supports that conversion, it will be easier to implement.

GiteaMirror commented

2025-11-02 09:13:35 -06:00

@drsybren commented on GitHub (Apr 14, 2023):

> Another thing I've noticed: as far as I know, not every character has the same pronunciation (it depends on the language, e.g. Japanese vs. Chinese).
> The library you mentioned doesn't take a locale in its API, so it always outputs the same string.

That's true, although I don't really see this as an issue. As in, with the proposed approach, some more matches are added to the search operation, and nothing is removed. This means that searching for a name in Japanese or Chinese will keep working as it works now.
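The "only adds matches" argument can be sketched as follows. This is an illustration under assumptions, not Gitea's search code: the current behaviour is modelled as a case-insensitive substring match, the folding uses NFKD decomposition instead of go-unidecode, and `fold_accents`, `search_users`, and the sample names are all hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Case-fold text and drop combining marks (stand-in for transliteration)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()


def search_users(query: str, names: list[str]) -> list[str]:
    """Return names hit by the plain match OR by the folded match."""
    plain_query = query.casefold()
    folded_query = fold_accents(query)
    results = []
    for name in names:
        # Assumed model of the existing behaviour: case-insensitive substring.
        plain_hit = plain_query in name.casefold()
        # Additional path: compare folded forms; skip it if folding left nothing.
        folded_hit = bool(folded_query) and folded_query in fold_accents(name)
        if plain_hit or folded_hit:
            results.append(name)
    return results


# Made-up sample data for the illustration.
names = ["Stüvel", "田中", "王伟"]
```

Because the folded comparison is combined with the plain one using OR, every name that matched before still matches; names written in Japanese or Chinese are simply left untouched by this particular folding.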

> If the database supports that conversion, it will be easier to implement.

Many (if not all) databases do support some form of Unicode normalisation, so that different representations of the same character are mapped to a single one (e.g. `ü` can be encoded as `U+00FC LATIN SMALL LETTER U WITH DIAERESIS` or as `U+0075 LATIN SMALL LETTER U` followed by `U+0308 COMBINING DIAERESIS`). This is wise to do anyway, regardless of this proposal.
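The two encodings of `ü` can be checked with Python's standard `unicodedata` module. This small sketch only demonstrates the normalisation itself; how a particular database exposes it is not covered here.

```python
import unicodedata

precomposed = "\u00fc"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # U+0075 followed by U+0308 COMBINING DIAERESIS

# The two strings render the same but are different code point sequences.
same_before = precomposed == decomposed

# NFC composes to the single code point; NFD decomposes to base + mark.
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)
nfd_of_precomposed = unicodedata.normalize("NFD", precomposed)

same_after = nfc_of_decomposed == precomposed
```

Note that normalisation alone keeps the diaeresis; it makes the two spellings of "ü" compare equal, but it does not make "ü" equal to "u".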

GiteaMirror commented

2025-11-02 09:13:36 -06:00

@techknowlogick commented on GitHub (Apr 14, 2023):

We currently have a "lower_name" column in the database for lowercased usernames. I wonder if we could have a similar column for "normalized" full names. (Apologies for the terminology: it really others those who have names with accents, and I would love to have an alternative name for this.)
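A column of that kind could look roughly like the sketch below, which uses an in-memory SQLite database. Everything except the `lower_name` and full-name idea is an assumption made for illustration: the table name `demo_user`, the column name `folded_full_name`, the helper functions, and the use of NFKD folding instead of go-unidecode. It also shows the backfill step for existing rows that the re-indexing concern earlier in the thread points at.

```python
import sqlite3
import unicodedata


def fold_accents(text: str) -> str:
    """Case-fold text and drop combining marks (stand-in for transliteration)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; not Gitea's real schema.
conn.execute(
    "CREATE TABLE demo_user ("
    "id INTEGER PRIMARY KEY, lower_name TEXT, full_name TEXT)"
)
conn.execute(
    "INSERT INTO demo_user (lower_name, full_name) VALUES (?, ?)",
    ("drsybren", "Stüvel"),
)

# Migration sketch: add the extra column, then backfill existing rows.
conn.execute("ALTER TABLE demo_user ADD COLUMN folded_full_name TEXT")
rows = conn.execute("SELECT id, full_name FROM demo_user").fetchall()
conn.executemany(
    "UPDATE demo_user SET folded_full_name = ? WHERE id = ?",
    [(fold_accents(full_name), user_id) for user_id, full_name in rows],
)


def find_users(db: sqlite3.Connection, query: str) -> list[str]:
    """Fold the query the same way, then match username or folded full name."""
    # Note: LIKE wildcards (% and _) in the query are not escaped in this sketch.
    pattern = "%" + fold_accents(query) + "%"
    cursor = db.execute(
        "SELECT lower_name FROM demo_user "
        "WHERE folded_full_name LIKE ? OR lower_name LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [row[0] for row in cursor]
```

Storing the folded value at write time keeps the search itself a plain column comparison, much like the existing lowercase column, at the cost of a one-time backfill.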

Reference: github-starred/gitea#10632