Use accentless names for searching #10632

Open
opened 2025-11-02 09:13:34 -06:00 by GiteaMirror · 4 comments

Originally created by @drsybren on GitHub (Apr 11, 2023).

Feature Description

When searching for users, for example to assign an issue or PR to someone, the name has to be typed as-is in order to be found. In my case, when searching for "Stuvel" you will not find me, as you have to search for "Stüvel" exactly.

A solution for this would be to use a project like go-unidecode (https://github.com/mozillazg/go-unidecode) to convert names to ASCII representations. Those could then be stored as an additional field to search on, with the search query itself undergoing the same conversion.
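A minimal sketch of the idea, using only the Go standard library. The `asciiFold` table is a tiny hand-written stand-in for what a transliteration library such as go-unidecode would provide, and the names `foldName` and `matchesUser` are made up for illustration; none of this is Gitea code.

```go
package main

import (
	"fmt"
	"strings"
)

// asciiFold is a small, hand-written stand-in for a transliteration
// library. It covers only a handful of Latin letters (assumption: a real
// implementation would use a complete table).
var asciiFold = map[rune]string{
	'\u00e4': "a",  // ä
	'\u00f6': "o",  // ö
	'\u00fc': "u",  // ü
	'\u00e9': "e",  // é
	'\u00f1': "n",  // ñ
	'\u00df': "ss", // ß
}

// foldName lowercases s and replaces every rune listed in asciiFold with
// its ASCII stand-in. All other runes pass through unchanged.
func foldName(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if repl, ok := asciiFold[r]; ok {
			b.WriteString(repl)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// matchesUser reports whether query matches fullName either as typed
// (case-insensitively) or after both sides have been folded. The folded
// comparison only adds matches; it never removes an existing one.
func matchesUser(fullName, query string) bool {
	if strings.Contains(strings.ToLower(fullName), strings.ToLower(query)) {
		return true
	}
	return strings.Contains(foldName(fullName), foldName(query))
}

func main() {
	name := "St\u00fcvel" // "Stüvel"
	fmt.Println(matchesUser(name, "Stuvel"))
	fmt.Println(matchesUser(name, "St\u00fcvel"))
	fmt.Println(matchesUser(name, "Smith"))
}
```

In a real implementation the folded name would presumably be computed once and stored, rather than recomputed for every search.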

Screenshots

image

GiteaMirror added the type/proposal, type/feature labels 2025-11-02 09:13:34 -06:00

@delvh commented on GitHub (Apr 11, 2023):

Sounds interesting.
However, I'm not sure how this will work with the existing indices.
I guess they need to be re-indexed.
Another thing I've noticed: as far as I know, not every character has the same pronunciation (it depends on the language, e.g. Japanese vs. Chinese).
The library you mentioned doesn't take a locale in its API, so it always outputs the same string.
This would only become a problem if we ever tried to remove this dependency; apart from that, it seems like it would work.


@lunny commented on GitHub (Apr 12, 2023):

If the database supports that conversion, it will be easier to implement.


@drsybren commented on GitHub (Apr 14, 2023):

> Another thing I've noticed: As far as I know, not every char has the same pronunciation (depending on the language, i.e. Japanese vs. Chinese) The library you mentioned doesn't take a locale in its API, so it always outputs the same string.

That's true, although I don't really see this as an issue. As in, with the proposed approach, some more matches are added to the search operation, and nothing is removed. This means that searching for a name in Japanese or Chinese will keep working as it works now.

> If the database supports that conversion, it will be easier to implement.

Many (if not all) databases do support some form of Unicode normalisation, so that different representations of the same character are mapped to a single one (e.g. ü can be encoded as U+00FC LATIN SMALL LETTER U WITH DIAERESIS or as U+0075 LATIN SMALL LETTER U followed by U+0308 COMBINING DIAERESIS). This is wise to do anyway, regardless of this proposal.
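To illustrate the two encodings, here is a small standard-library-only sketch. The `composeDiaeresis` helper is hypothetical and handles just three letters; it is not a real normaliser. In Go, full normalisation would more likely be done with the `golang.org/x/text/unicode/norm` package (for example `norm.NFC.String`).

```go
package main

import (
	"fmt"
	"strings"
)

// composer rewrites a few "base letter + U+0308 COMBINING DIAERESIS"
// sequences into their precomposed code points. Illustrative only: this
// is a tiny subset of what real NFC normalisation does.
var composer = strings.NewReplacer(
	"a\u0308", "\u00e4", // ä
	"o\u0308", "\u00f6", // ö
	"u\u0308", "\u00fc", // ü
)

// composeDiaeresis returns s with the sequences above replaced by their
// precomposed forms. Everything else is left untouched.
func composeDiaeresis(s string) string {
	return composer.Replace(s)
}

func main() {
	precomposed := "St\u00fcvel" // U+00FC
	decomposed := "Stu\u0308vel" // U+0075 followed by U+0308

	// Both render as "Stüvel", but the underlying bytes differ, so a
	// plain string comparison treats them as different names.
	fmt.Println(precomposed == decomposed)

	// After mapping both to one representation they compare equal.
	fmt.Println(composeDiaeresis(decomposed) == precomposed)
}
```

Note that normalisation of this kind only unifies encodings of the same character; on its own it does not make "Stuvel" match "Stüvel".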


@techknowlogick commented on GitHub (Apr 14, 2023):

We currently have a "lower_name" column in the database for lowercased usernames. I wonder if we could have a similar column for "normalized" full names (apologies for the terminology; it really others those who have names with accents, and I would love to have an alternative name for this).

Reference: github-starred/gitea#10632