linkRegex broken #2951

Closed
opened 2025-11-02 04:55:06 -06:00 by GiteaMirror · 0 comments

Originally created by @mrsdizzie on GitHub (Feb 21, 2019).

Function here:
https://github.com/go-gitea/gitea/blob/master/modules/markup/html.go#L69

// matches http/https links. used for autlinking those. partly modified from
	// the original present in autolink.js
	linkRegex = regexp.MustCompile(`(?:(?:http|https):\/\/(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?:(?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+:=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?`)
)

However, that regex doesn't just match http(s) links; it also matches anything starting with www. So if somebody writes www.google.com, it generates the link `<a href="www.google.com">`, which most browsers treat as relative, so it ends up pointing at https://try.gitea.io/mrsdizzie/test/issues/1/www.google.com
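For anyone who wants to reproduce the match outside of Gitea, here is a minimal sketch. The pattern is copied from the snippet above; the `main` function and the sample inputs are only for illustration and are not part of Gitea.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkRegex is the pattern quoted above from modules/markup/html.go.
var linkRegex = regexp.MustCompile(`(?:(?:http|https):\/\/(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?:(?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+:=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?`)

func main() {
	// Illustrative inputs: one without a scheme, one with.
	for _, input := range []string{"www.google.com", "http://www.google.com"} {
		// FindString returns the leftmost match, or "" if there is none.
		// Both inputs are matched in full, so the bare www. form is
		// treated as a link even though it has no scheme.
		fmt.Printf("input=%q match=%q\n", input, linkRegex.FindString(input))
	}
}
```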

This regex is a bit wild but it seems the problem is that the first bit:

(?:(?:http|https):\/\/(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+

will match http://www, but then there is an OR (|) whose second alternative matches a bare www. with no http:// or https:// in front:

|(?:www\.
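To make the two sides of the OR easier to see, the sketch below compiles each alternative on its own. The names `schemeBranch` and `bareBranch` and the leading `^` anchors are my additions for illustration; the trailing path/query part of the pattern is left out.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// First alternative: requires an http:// or https:// scheme.
	schemeBranch = regexp.MustCompile(`^(?:http|https):\/\/(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+`)
	// Second alternative: a bare "www." prefix or a "user@" prefix, no scheme.
	bareBranch = regexp.MustCompile(`^(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+`)
)

func main() {
	// Illustrative inputs only.
	for _, input := range []string{"http://www.google.com", "www.google.com", "user@example.com"} {
		fmt.Printf("%-24s scheme=%-5v bare=%v\n",
			input, schemeBranch.MatchString(input), bareBranch.MatchString(input))
	}
}
```

Note that the second alternative also accepts user@host style text, so dropping it changes more than just the www. case.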

Removing the 'or www' bit seems to fix it in some simple testing:

(?:(?:http|https):\/\/(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+(?:\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?:(?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+:=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?

And testing here:
https://regex101.com/r/HZsurh/1
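The same check can be run locally against Go's regexp engine. This is only a sketch: `proposedRegex` is a name I made up for the modified pattern above, and the inputs are examples.

```go
package main

import (
	"fmt"
	"regexp"
)

// proposedRegex is the modified pattern from this issue, with the
// "or www" alternative removed.
var proposedRegex = regexp.MustCompile(`(?:(?:http|https):\/\/(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+(?:\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?:(?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+:=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?`)

func main() {
	inputs := []string{
		"www.google.com",        // no scheme: no longer matched
		"http://www.google.com", // scheme present: matched in full
		"http://localhost",      // host without a dot: not matched
	}
	for _, input := range inputs {
		fmt.Printf("input=%q match=%q\n", input, proposedRegex.FindString(input))
	}
}
```

One side effect worth knowing about: because the modified pattern requires a `.` (or `user@`) inside the host part, a dotless host such as http://localhost stops matching as well.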

Judging by the preview of this issue, GitHub messes this up pretty badly as well. I don't think it is necessarily wrong to match something written as www.example.com and turn it into a link; the code should just make sure it puts an http(s):// scheme on the URL when creating the link if one is not already present (or else not link it at all).
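The "add a scheme if it is missing" option could look roughly like the sketch below. `ensureScheme` is a hypothetical helper, not an existing Gitea function, and defaulting to http:// rather than https:// is an assumption on my part. It also assumes the input is a bare www. style match; user@host matches would need separate handling.

```go
package main

import (
	"fmt"
	"strings"
)

// ensureScheme is a hypothetical helper: it returns link unchanged when it
// already starts with http:// or https://, and prepends "http://" otherwise,
// so the browser does not resolve the href as a relative path.
func ensureScheme(link string) string {
	lower := strings.ToLower(link)
	if strings.HasPrefix(lower, "http://") || strings.HasPrefix(lower, "https://") {
		return link
	}
	return "http://" + link
}

func main() {
	// Illustrative inputs only.
	for _, link := range []string{"www.example.com", "https://www.example.com/a"} {
		fmt.Printf("%s -> %s\n", link, ensureScheme(link))
	}
}
```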

- Gitea version (or commit ref): 1.7.2
- Git version: 2.1.4
- Operating system: Debian/macOS
- Database (use `[x]`):
  - [ ] PostgreSQL
  - [ ] MySQL
  - [ ] MSSQL
  - [x] SQLite
- Can you reproduce the bug at https://try.gitea.io:
  - [x] Yes (https://try.gitea.io/mrsdizzie/test/issues/1)
  - [ ] No
  - [ ] Not relevant
GiteaMirror added the type/enhancement label 2025-11-02 04:55:06 -06:00
Reference: github-starred/gitea#2951