Cyrillic (and probably other non-ASCII scripts) makes wikis impossible to properly clone #11500

Open
opened 2025-11-02 09:39:29 -06:00 by GiteaMirror · 5 comments

Originally created by @OctoNezd on GitHub (Aug 21, 2023).

Description

If you use Cyrillic in the title of a page, Gitea URL-encodes it, which results in a very long filename in the wiki's underlying git repository. Such filenames are unusable on most file systems.

Sample wiki repo on try.gitea.io that triggers the issue: https://try.gitea.io/octonezd/testrepo/wiki/?action=_pages.

If you try to clone it, the following error occurs:

git clone https://try.gitea.io/octonezd/testrepo.wiki.git
Cloning into 'testrepo.wiki'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 7 (delta 2), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (7/7), done.
Resolving deltas: 100% (2/2), done.
error: cannot stat '%D0%A3%D1%81%D1%82%D0%B0%D0%BD%D0%BE%D0%B2%D0%BA%D0%B0-%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%BD%D0%BE%D0%B3%D0%BE-%D0%BE%D0%B1%D0%B5%D1%81%D0%BF%D0%B5%D1%87%D0%B5%D0%BD%D0%B8%D1%8F-%D0%B4%D0%BB%D1%8F-%D0%BF%D1%80%D0%BE%D0%B2%D0%B5%D1%80%D0%BA%D0%B8-%D1%80%D0%B0%D0%B1%D0%BE%D1%82%D1%8B-%D1%81%D0%B5%D1%80%D0%B2%D0%B8%D1%81%D0%BE%D0%B2-%D0%B8-%D0%BF%D1%80%D0%B8%D0%BB%D0%BE%D0%B6%D0%B5%D0%BD%D0%B8%D0%B9.md': File name too long
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

The underlying issue seems to be the use of QueryEscape in https://github.com/go-gitea/gitea/blob/f6e7798405ef9eb7b2936b112fd2a4ea1bab4082/services/wiki/wiki_path.go#L93

The attached log was made on a fresh, empty Gitea instance that has only one repo with a wiki, with log configuration set to debug and RUN_MODE=dev.

Gitea Version

1.20.2

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

https://gist.github.com/OctoNezd/d5454c615fbe377d6a9bbb30418fcd03

Screenshots

No response

Git Version

No response

Operating System

Arch Linux

How are you running Gitea?

Docker

Database

SQLite

GiteaMirror added the type/bug label 2025-11-02 09:39:29 -06:00

@wxiaoguang commented on GitHub (Aug 21, 2023):

That's unavoidable (unfixable) at the moment, because the wiki escapes the filename with URL encoding and stores it in the git repo.

So, for non-ASCII chars, the bytes become 3 or 4 times longer (%AA%BB%CC). For filesystems that can only handle 255 bytes, the title can contain at most 16~21 non-ASCII chars.
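As a rough check on this arithmetic, the sketch below computes how many characters fit under a 255-byte filename limit when every UTF-8 byte is percent-escaped to three bytes. The `.md` suffix, the 255-byte limit, and the helper name `maxTitleChars` are assumptions made for illustration; dashes and other ASCII characters in a real title would change the exact numbers.

```go
package main

import "fmt"

const (
	maxNameBytes = 255   // assumed per-filename limit, as mentioned above
	suffix       = ".md" // assumed extension of the wiki page file
)

// maxTitleChars returns how many characters of the given UTF-8 width
// (bytes per character) fit in a filename when each byte is escaped as %XX.
func maxTitleChars(utf8Width int) int {
	escapedWidth := utf8Width * 3 // every byte becomes three bytes: '%', hex, hex
	return (maxNameBytes - len(suffix)) / escapedWidth
}

func main() {
	for width := 2; width <= 4; width++ {
		fmt.Printf("%d-byte characters: at most %d per title\n", width, maxTitleChars(width))
	}
}
```

Under these assumptions, the 21-character end of the range corresponds to characters that take four bytes in UTF-8; two-byte scripts such as Cyrillic fit somewhat more before hitting the limit.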


@OctoNezd commented on GitHub (Aug 21, 2023):

I understand replacing spaces with dashes for easier use in a terminal, but why store URL-escaped data in git in the first place?


@wxiaoguang commented on GitHub (Aug 21, 2023):

> I understand replacing spaces with dashes for easier use in a terminal, but why store URL-escaped data in git in the first place?

IIRC that's the old behavior? I am not the wiki's designer...


@OctoNezd commented on GitHub (Aug 21, 2023):

> unavoidable (unfixable) at the moment

Does this mean the issue is wontfix?


@wxiaoguang commented on GitHub (Aug 21, 2023):

> > unavoidable (unfixable) at the moment
>
> Does this mean the issue is wontfix?

Unless someone is willing to spend time on it (and break the existing behavior).


Reference: github-starred/gitea#11500