Initial chars in readme misinterpreted as audio file #7872

Closed
opened 2025-11-02 07:39:54 -06:00 by GiteaMirror · 7 comments
Originally created by @epistemex on GitHub (Sep 20, 2021).

Gitea Version

1.15.2 built with GNU Make 4.1, go1.16.7

Git Version

not relevant

Operating System

Debian 10/x64

How are you running Gitea?

apt install / locally

Database

SQLite

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Description

When uploading a readme.md file starting with ID3 the file is mistaken for an MP3 audio file. Example content

ID3Toy
======

Read and write ID3 tags (v1.0, v1.1, v2.0, v2.3, v2.4), APE (v1, v2) and LYRICS 3 tags (v1, v2) in MP3 files.
...

(MP3 files may start with an ID3 tag, but that tag shouldn't be used to identify such files to begin with, but that's another issue :) )

So my README.md file shows up as a raw file:

[Screenshot: 2021-09-20_18-36]

and clicking on the file line itself shows me the audio player:

[Screenshot: 2021-09-20_18-36_1]

The content of the readme file is correct, verified and is just text (the example above is a direct copy from it). :)

Screenshots

Network activity snapshot:

[Screenshot: 2021-09-20_18-44]

Adding a newline at the top of the file works around the problem, and Gitea then correctly treats the file as a text/Markdown file:

[Screenshot: 2021-09-20_19-47]

GiteaMirror added the issue/confirmed and type/bug labels 2025-11-02 07:39:54 -06:00

@wxiaoguang commented on GitHub (Sep 21, 2021):

Just to explain: if the MIME type is detected automatically on the server side, every file that starts with ID3 will be reported as an audio file. Maybe the MIME types of some files should be detected by other methods.

image
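
To illustrate the behavior described above, here is a minimal sketch using Go's standard `http.DetectContentType`. The helper name `sniff` and the sample strings are made up for illustration; this is not Gitea's actual code path.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff is a hypothetical helper wrapping http.DetectContentType,
// which looks at no more than the first 512 bytes of the data.
func sniff(content string) string {
	return http.DetectContentType([]byte(content))
}

func main() {
	// README text from the report: plain Markdown that happens to start with "ID3".
	readme := "ID3Toy\n======\n\nRead and write ID3 tags in MP3 files.\n"
	fmt.Println(sniff(readme)) // audio/mpeg

	// The same kind of text without the "ID3" prefix is sniffed as plain text.
	fmt.Println(sniff("Toy for ID3 tags\n")) // text/plain; charset=utf-8
}
```

The first call matches the `ID3` signature purely on the leading three bytes, which is why the README is treated as audio.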

@axifive commented on GitHub (Sep 21, 2021):

It's http.DetectContentType() in the renderFile() function.


@wxiaoguang commented on GitHub (Sep 21, 2021):

Yep, typesniffer.DetectContentType and http.DetectContentType work like the file command: they only check the first few bytes of the file. To fix the "bug", filenames should also be considered, and some well-known filenames should have explicit MIME types, but that might need a refactor. Not sure if it's worth doing.
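
A rough sketch of the "consider the filename too" idea could look like the following. Everything here is an assumption for illustration: the `textExtensions` table, the `detectWithFilename` name, and the chosen MIME strings are hypothetical and are not taken from Gitea or from any later fix.

```go
package main

import (
	"fmt"
	"net/http"
	"path/filepath"
	"strings"
)

// textExtensions is a hypothetical table of extensions that get an
// explicit MIME type instead of being content-sniffed.
var textExtensions = map[string]string{
	".md":  "text/markdown; charset=utf-8",
	".txt": "text/plain; charset=utf-8",
}

// detectWithFilename (hypothetical) prefers the filename's extension and
// falls back to content sniffing only when the extension is not listed.
func detectWithFilename(name string, data []byte) string {
	ext := strings.ToLower(filepath.Ext(name))
	if ct, ok := textExtensions[ext]; ok {
		return ct
	}
	return http.DetectContentType(data)
}

func main() {
	data := []byte("ID3Toy\n======\n")
	fmt.Println(detectWithFilename("README.md", data)) // text/markdown; charset=utf-8
	fmt.Println(detectWithFilename("track.mp3", data)) // audio/mpeg
}
```

The trade-off is the one raised in this thread: trusting the extension fixes the README case, but a binary file that is misnamed with a `.md` extension would then be treated as text.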


@silverwind commented on GitHub (Sep 29, 2021):

Seems like a bug in http.DetectContentType, I would report it to golang.


@wxiaoguang commented on GitHub (Sep 29, 2021):

To be honest, I do not treat it as a bug, because this is simply how MIME auto-detectors behave. You cannot always correctly guess a file type from its content or its extension (e.g. nginx: https://github.com/nginx/nginx/blob/master/conf/mime.types )


@delvh commented on GitHub (Sep 29, 2021):

The problem is:
As @wxiaoguang mentioned, even the Linux file command reports an audio file in that circumstance. It does for me as well.
So, if both commonly used type detection systems (file and Go's http.DetectContentType) behave the same here, I think it is not a bug but intended behavior. This particular character sequence does not seem to be a good choice for the start of a file.
And I can somewhat understand why this needs to be done: audio files must be detected somehow. I simply wouldn't have guessed that such a simple string would be used as the identifier.

As remediation, try to prepend a space before the leading chars. On my machine, file then shows ASCII text instead of an audio file.
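
The same workaround can be checked against Go's `http.DetectContentType`, as a small sketch. The helper name `sniffWithPrefix` is hypothetical, and this only shows the standard library's sniffer, not the output of the `file` command.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniffWithPrefix (hypothetical helper) prepends prefix to content
// and returns the sniffed content type.
func sniffWithPrefix(prefix, content string) string {
	return http.DetectContentType([]byte(prefix + content))
}

func main() {
	readme := "ID3Toy\n======\n"
	// No prefix, a leading space, and a leading newline.
	for _, prefix := range []string{"", " ", "\n"} {
		fmt.Printf("%q -> %s\n", prefix, sniffWithPrefix(prefix, readme))
	}
}
```

With any leading whitespace the data no longer begins with the exact bytes `ID3`, so the audio signature does not match and the content falls through to the plain-text check.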

I think that this issue can then be closed here because even if it is surprisingly a bug and not intended, there is nothing Gitea can or should do here.


@wxiaoguang commented on GitHub (Mar 7, 2023):

Will be fixed by: Fix ID3 content detection #23355

Reference: github-starred/gitea#7872