Filenames for wiki pages with special characters #4072

Closed
opened 2025-11-02 05:36:36 -06:00 by GiteaMirror · 7 comments

Originally created by @Tekaoh on GitHub (Oct 7, 2019).

Gitea currently saves wiki pages whose titles contain special characters as files with percent-escaped names. (Example: `Title with comma, ampersand & [brackets]` = `Title-with-comma%2C-ampersand-%26-%5Bbrackets%5D.md`)

I propose that Gitea save these pages under filenames with _unescaped_ characters instead. (Example: `Title with comma, ampersand & [brackets]` = `Title-with-comma,-ampersand-&-[brackets].md`)
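To make the two naming schemes concrete, here is a minimal Python sketch that reproduces the example filenames above. It is an illustration only: Gitea itself is written in Go, and the function names `escaped_filename` and `unescaped_filename` are hypothetical, not Gitea's actual API. It assumes spaces become dashes before any escaping, which is what the examples suggest.

```python
from urllib.parse import quote


def escaped_filename(title: str) -> str:
    """Current scheme (as in the example): dashes for spaces, then percent-escape."""
    # safe="" makes quote() escape every reserved character, including "/".
    return quote(title.replace(" ", "-"), safe="") + ".md"


def unescaped_filename(title: str) -> str:
    """Proposed scheme: dashes for spaces, all other characters kept as typed."""
    return title.replace(" ", "-") + ".md"


title = "Title with comma, ampersand & [brackets]"
print(escaped_filename(title))    # Title-with-comma%2C-ampersand-%26-%5Bbrackets%5D.md
print(unescaped_filename(title))  # Title-with-comma,-ampersand-&-[brackets].md
```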

This would be better because wikis are sometimes cloned and edited locally, where unescaped filenames are easier to work with. Also, other services such as GitHub use unescaped characters, so any wiki mirrored from them already arrives with unescaped filenames. (See #8284 and #8408)

This would be a significant change because all the unit tests currently assume that filenames are saved with escaped characters. Backwards compatibility with existing wiki pages would also be a concern. However, the unit tests could be adjusted, and the solution presented in #8408 would solve the compatibility problem (with its logic reversed to satisfy the new unit tests).
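One way the compatibility idea could look is a lookup that prefers the new unescaped filename and falls back to the old escaped one. This is a rough sketch under my own assumptions, not the code from #8408; `find_page_file` and its arguments are hypothetical names used only for illustration.

```python
from typing import Iterable, Optional
from urllib.parse import quote


def find_page_file(title: str, existing_files: Iterable[str]) -> Optional[str]:
    """Return the stored filename for a wiki title, or None if no page exists.

    Tries the proposed unescaped name first, then the legacy escaped name,
    so pages written under either scheme are still found.
    """
    files = set(existing_files)
    base = title.replace(" ", "-")
    candidates = [
        base + ".md",                  # proposed: unescaped
        quote(base, safe="") + ".md",  # legacy: percent-escaped
    ]
    for name in candidates:
        if name in files:
            return name
    return None


legacy_wiki = ["Home.md", "Title-with-comma%2C-ampersand-%26-%5Bbrackets%5D.md"]
print(find_page_file("Title with comma, ampersand & [brackets]", legacy_wiki))
```

With a lookup like this, existing pages would keep working without a bulk rename, and only newly saved pages would use the unescaped form.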

For now, I'm hoping for feedback and discussion about this proposal. I'm also willing to prepare a pull request to make this change.

GiteaMirror added the issue/stale and type/bug labels 2025-11-02 05:36:36 -06:00

@bagasme commented on GitHub (Oct 8, 2019):

At least NTFS (on Windows) allows filenames with `,`, `&`, `[` and `]`; I'm not sure about the FAT file system.

However, consider that NTFS forbids `<`, `>`, `:`, `"`, `/`, `\`, `|`, `?` and `*`. If any of those forbidden characters is present in a wiki page filename, Gitea should throw an `Illegal characters on file name` error (only when Gitea is installed on Windows).
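The check suggested above could be sketched roughly as follows. This is a hedged illustration, not Gitea code: the names `WINDOWS_FORBIDDEN`, `illegal_characters` and `check_filename` are made up here, and the character set is just the list given in this comment (Windows has further restrictions, such as reserved device names, that this sketch ignores).

```python
from typing import List

# Characters from the list above that Windows rejects in filenames.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def illegal_characters(filename: str) -> List[str]:
    """Return the forbidden characters found in filename, sorted, without repeats."""
    return sorted({ch for ch in filename if ch in WINDOWS_FORBIDDEN})


def check_filename(filename: str) -> None:
    """Raise ValueError if filename contains any forbidden character."""
    bad = illegal_characters(filename)
    if bad:
        raise ValueError("Illegal characters on file name: " + " ".join(bad))


# The unescaped example from the issue passes the check.
check_filename("Title-with-comma,-ampersand-&-[brackets].md")
print(illegal_characters("What? Why: a*b.md"))  # ['*', ':', '?']
```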


@Tekaoh commented on GitHub (Oct 8, 2019):

That's an interesting point. I wonder how Gitea handles cases of files in regular repositories with these illegal characters in the filenames. They could potentially be pushed from Linux machines or created in the web browser. Only filenames in wiki repositories seem to be escaped currently.

Also, I wonder what would happen if you tried to clone, on Windows, a wiki repo from GitHub that has pages with these characters in the titles, since GitHub uses unescaped characters in wiki filenames. That's a curiosity of Git's behavior, though, not Gitea's.


@guillep2k commented on GitHub (Oct 8, 2019):

Consider repositories where you have:

```
MyFile.txt
myfile.txt
myFile.TXT
etc.
```
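Presumably the concern is that these names are distinct paths to Git but refer to the same file on a case-insensitive file system, such as a default Windows setup. A small sketch of how such collisions could be detected follows; `case_collisions` is a hypothetical helper, and `str.casefold()` is only an approximation of how a real file system compares names.

```python
from collections import defaultdict
from typing import Iterable, List


def case_collisions(paths: Iterable[str]) -> List[List[str]]:
    """Group paths that differ only by letter case; return groups with 2+ names."""
    groups = defaultdict(list)
    for path in paths:
        # casefold() gives a case-insensitive key for comparison.
        groups[path.casefold()].append(path)
    return [names for names in groups.values() if len(names) > 1]


repo_files = ["MyFile.txt", "myfile.txt", "myFile.TXT", "README.md"]
print(case_collisions(repo_files))
# [['MyFile.txt', 'myfile.txt', 'myFile.TXT']]
```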

@Tekaoh commented on GitHub (Oct 8, 2019):

Those don't have special characters though, so escaping them is irrelevant. Since you're allowed to have those filenames on Linux, I wonder what would happen if you create them on Linux and push to Gitea running on Windows.

Gitea doesn't actually store the files, just Git trees. So those files in a repo wouldn't necessarily exist in the Windows filesystem even if Gitea is installed on Windows. That should actually be true of filenames with special characters that NTFS doesn't like as well, I think...


@bagasme commented on GitHub (Oct 9, 2019):

@Tekaoh And Git stores all repo data (including the wiki repo) as blob objects, with the SHA-1 hash as the blob's name. If you're interested, see the [explanations from Pro Git book](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects).

Since the hash contains only 0-9 and a-f, the object can be stored anywhere, even on Windows (NTFS).


@stale[bot] commented on GitHub (Dec 8, 2019):

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.


@stale[bot] commented on GitHub (Dec 22, 2019):

This issue has been automatically closed because of inactivity. You can re-open it if needed.

Reference: github-starred/gitea#4072