Anchors don't work when a heading contains punctuation marks such as ( #11415

Closed
opened 2025-11-02 09:37:00 -06:00 by GiteaMirror · 14 comments

Originally created by @lazyky on GitHub (Aug 7, 2023).

Description

The ID that Gitea generates for a Markdown heading containing the Unicode character `(` is inconsistent with GitHub.
For `#### test(1)`, Gitea generates the id `"user-content-test-1"`, whereas GitHub generates `"user-content-test1"`.

The anchor link in the Markdown below works on GitHub but not on Gitea.

#### test(1)
to [test(1)](#test1)

gitea

id = "user-content-test-1"
8d42bfb9f7ea6f43e4f3b6b8f339aff

github

id = "user-content-test1"
89b1f42dcd9174ec189487dd55057a4
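
To make the failure concrete, here is a minimal Python sketch of why the link `#test1` finds its target on GitHub but not on Gitea. It assumes that a fragment is matched against heading IDs by prepending `user-content-`; the helper name `resolves` is hypothetical and is not part of either code base. Only the two IDs come from the report above.

```python
PREFIX = "user-content-"  # prefix seen on both rendered heading IDs above


def resolves(fragment, heading_ids):
    """Return True if a link fragment such as 'test1' matches a rendered heading ID.

    Assumption: the fragment is looked up with the 'user-content-' prefix added.
    """
    return PREFIX + fragment in heading_ids


github_ids = {"user-content-test1"}   # ID reported for GitHub
gitea_ids = {"user-content-test-1"}   # ID reported for Gitea 1.20.2

print(resolves("test1", github_ids))  # the link [test(1)](#test1) finds its heading
print(resolves("test1", gitea_ids))   # the same link finds nothing
```

Under this assumption, the same document would need `#test-1` on Gitea and `#test1` on GitHub, which is why a single README cannot work on both.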

Gitea Version

1.20.2

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

I was able to reproduce it using try.gitea.io.

Database

None

GiteaMirror added the type/bug label 2025-11-02 09:37:00 -06:00

@CaiCandong commented on GitHub (Aug 8, 2023):

What's the impact of this problem?


@bioinformatist commented on GitHub (Aug 8, 2023):

@CaiCandong

Sometimes we need to use section titles like this:

image

However, the malfunction of the anchors pointing to them makes reading somewhat difficult.


@CaiCandong commented on GitHub (Aug 8, 2023):

> Markdown Heading ID contains Unicode `(` is inconsistent with Github.

Thanks for the report; I understand the problem. Besides the Unicode `(`, does this problem also occur when using a plain `(` directly?


@lazyky commented on GitHub (Aug 8, 2023):

> > Markdown Heading ID contains Unicode `(` is inconsistent with Github.
>
> Thanks for the report; I understand the problem. Besides the Unicode `(`, does this problem also occur when using a plain `(` directly?

Yes. I also tested `(`, `!`, `:`, `*`, `:` and `!`. They behave the same as `(`. @CaiCandong

#### test(0)

#### test!1

#### test:2

#### test*3

#### test!4

#### test:5

gitea

image

github

image


@CaiCandong commented on GitHub (Aug 8, 2023):

I've located the code responsible for this problem; it has to do with the `user-content-*` generation rules. However, I'm not sure how GitHub handles this. Can you give me some more examples to help me refine the code?

#### test:ad # df
#### test:ad # df
#### test:ad #23 df 2*/*

@lazyky commented on GitHub (Aug 8, 2023):

test:ad # df

test:ad # df

test:ad #23 df 2*/*

github

Here are the examples as rendered on GitHub:
image


@CaiCandong commented on GitHub (Aug 8, 2023):

def cheanValue(anchor_name):
    """Approximate GitHub's heading-anchor rule for the given heading text."""
    anchor_name = anchor_name.strip()
    ret = []
    for c in anchor_name:
        if c.isalpha() or c.isdigit() or c == '_' or c == '-':
            # Keep letters, digits, '_' and '-', lowercased.
            ret.append(c.lower())
        elif c == ' ':
            # Each space becomes one hyphen; runs of spaces are not collapsed.
            ret.append('-')
        # Any other character (punctuation, symbols) is dropped.
    return ''.join(ret)

def test():
    # [heading text, expected anchor] pairs collected from GitHub.
    cases = [
        ["", ""],
        ["test(0)", "test0"],
        ["test!1", "test1"],
        ["test:2", "test2"],
        ["test*3", "test3"],
        ["test!4", "test4"],
        ["test:5", "test5"],
        ["test*6", "test6"],
        ["test:6 a", "test6-a"],
        ["test:6 !b", "test6-b"],
        ["test:ad # df", "testad--df"],
        ["test:ad #23 df 2*/*", "testad-23-df-2"],
        ["test:ad 23 df 2*/*", "testad-23-df-2"],
        ["test:ad # 23 df 2*/*", "testad--23-df-2"],
        ["Anchors in Markdown", "anchors-in-markdown"],
        ["a_b_c", "a_b_c"],
        ["a-b-c", "a-b-c"],
        ["a-b-c----", "a-b-c----"],
        ["test:6a", "test6a"],
        ["test:a6", "testa6"],
        ["tes a a   a  a", "tes-a-a---a--a"],
        ["  tes a a   a  a  ", "tes-a-a---a--a"]]
    for parm, expect in cases:
        actual = cheanValue(parm)
        if actual != expect:
            # Report every mismatch instead of stopping at the first one.
            print("error: parm: %s, expect: %s, actual: %s" % (parm, expect, actual))

test()

Can you help me write some test cases from github to verify that the logic of the cheanValue function is consistent with github?
@lazyky @bioinformatist


@wxiaoguang commented on GitHub (Aug 8, 2023):

This one is also related: Different behaviors when generating Markdown links for headings containing punctuations and other symbols #19745

Quoting the old comment from that issue:


I would say it's more like a feature than a bug, because Markdown is not a strict system, and there seems to be no single standard.

Various characters are removed or replaced during URL generation. For example, the single quote `'` in your demo file is affected too.

https://github.com/federico-ntr/gitea-double-quotes-test#placeholder-to-force-scrolling-on-links-click
https://try.gitea.io/federico-ntr/double-quotes-test#placeholder-to-force-scrolling-on-link-s-click

Since there is no standard, there is no right or wrong, as long as it works.

Maybe the answer to the question could be: if there is a definition in CommonMark, then make upstream goldmark follow the CommonMark standard.
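
The two URLs above differ only in how the single quote is treated, and the same contrast explains `test1` versus `test-1`. The Python sketch below shows two hypothetical rules that reproduce the outputs seen in this thread: `strip_style` drops punctuation, and `hyphenate_style` turns each run of other characters into one hyphen. Both function names are made up for illustration, the heading text is reconstructed from the URL fragments, and neither function is taken from GitHub's or Gitea's source.

```python
import re


def strip_style(heading):
    """Drop punctuation, turn each space into '-', lowercase (GitHub-like output)."""
    out = []
    for c in heading.strip():
        if c.isalpha() or c.isdigit() or c in "_-":
            out.append(c.lower())
        elif c == " ":
            out.append("-")
    return "".join(out)


def hyphenate_style(heading):
    """Replace each run of other characters with one '-', then trim (Gitea-like output).

    Assumption: this rule is only a guess that matches the outputs quoted here.
    """
    return re.sub(r"[^a-z0-9_]+", "-", heading.strip().lower()).strip("-")


# Heading text assumed from the two URL fragments quoted above.
heading = "Placeholder to force scrolling on link's click"
print(strip_style(heading))
print(hyphenate_style(heading))
print(strip_style("test(1)"), hyphenate_style("test(1)"))
```

Either rule yields stable anchors on its own; links break only when a document written against one rule is rendered with the other.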


@CaiCandong commented on GitHub (Aug 8, 2023):

> I would say it's more like a feature but not a bug, because Markdown is not a strict system, and there seems no unique standard.

I understand what you're saying, and I agree it's not a bug. But should we adjust it to be consistent with GitHub and VS Code?


@wxiaoguang commented on GitHub (Aug 8, 2023):

I'm just sharing the information from old issues. I am neutral on it.


@bioinformatist commented on GitHub (Aug 8, 2023):

Got that. Sure, it is not a bug, but GitHub's logic seems more straightforward and easier to use.


@lazyky commented on GitHub (Aug 8, 2023):

> Can you help me write some test cases from github to verify that the logic of the cheanValue function is consistent with github? @lazyky @bioinformatist

Yes, that's right, but `""` will not be rendered.

github

d2cafcdf8f6f7a19fd0b848ad81d4ac
39c7a2bd9e2a1db1f853cda7972e15f


@CaiCandong commented on GitHub (Aug 8, 2023):

> Yes. That's right, but `""` will not be rendered

These test cases are the ones I got from GitHub, so of course they are correct. What I meant is: can you help me add some more test cases?


@lazyky commented on GitHub (Aug 8, 2023):

> > Yes. That's right, but `""` will not be rendered
>
> These test cases are the ones I got from github, of course they are correct. What I mean is can you help me to add some more test cases?

OK. Below are the examples I tested on GitHub:

[
    ["tes()", "tes"],
    ["tes…@a", "tesa"],
    ["tes¥& a", "tes-a"],
    ["tes= a", "tes-a"],
    ["tes|a", "tesa"],
    ["tes\\a", "tesa"],  # backslash escaped: "\a" alone would be the BEL character in Python
    ["tes/a", "tesa"]
]
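
As a quick cross-check, the extra cases can be run against the `cheanValue` function posted earlier in the thread. The sketch below copies that function's logic and collects mismatches in a list. The names `EXTRA_CASES` and `mismatches` are introduced here for illustration only. Passing shows that the function agrees with these GitHub observations, not that it matches GitHub in general.

```python
def cheanValue(anchor_name):
    """Same logic as the function posted earlier in the thread."""
    ret = []
    for c in anchor_name.strip():
        if c.isalpha() or c.isdigit() or c == '_' or c == '-':
            ret.append(c.lower())
        elif c == ' ':
            ret.append('-')
    return ''.join(ret)


# Extra [heading, expected anchor] pairs reported from GitHub above.
EXTRA_CASES = [
    ["tes()", "tes"],
    ["tes…@a", "tesa"],
    ["tes¥& a", "tes-a"],
    ["tes= a", "tes-a"],
    ["tes|a", "tesa"],
    ["tes\\a", "tesa"],  # literal backslash in the heading
    ["tes/a", "tesa"],
]

# Collect every case where the function disagrees with the GitHub result.
mismatches = [
    (heading, expected, cheanValue(heading))
    for heading, expected in EXTRA_CASES
    if cheanValue(heading) != expected
]
print(mismatches)
```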
Reference: github-starred/gitea#11415