Guidance needed on how to deal with build metadata in Uris #122

Closed
opened 2026-02-17 11:18:48 -06:00 by GiteaMirror · 9 comments

Originally created by @jeffhandley on GitHub (Feb 14, 2014).

Uri encoding of + characters is horrible. Is there guidance on how to handle it?

Consider the following:

  1. Package Foo, Version 1.2.3-alpha+20140214001 is created
  2. It is published to a server
  3. The server needs to be able to serve that package, specifically that build
  4. The server uses a route of /packages/{packageid}/{packageversion}

In this case, we'll get /packages/foo/1.2.3-alpha+20140214001

The server won't be able to tell the difference between "1.2.3-alpha+20140214001" and "1.2.3-alpha 20140214001", and in many cases the framework will hand the Uri to the application code with the space in it.

Do we just deal with this, replacing spaces with + characters after getting the package version from the Uri? Or is there a better approach for encoding the + characters?
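
To make the ambiguity concrete, here is a minimal Python sketch (Python is used only for illustration; the server discussed in this thread is ASP.NET). It assumes the framework applies either path-style or form-style percent-decoding to the version segment. `restore_build_metadata` is a hypothetical helper name, not part of any framework.

```python
from urllib.parse import unquote, unquote_plus

VERSION = "1.2.3-alpha+20140214001"

# Path-style decoding leaves a literal "+" untouched.
path_decoded = unquote(VERSION)

# Form-style decoding (application/x-www-form-urlencoded) turns "+" into a space.
form_decoded = unquote_plus(VERSION)


def restore_build_metadata(raw_version: str) -> str:
    """Hypothetical helper: undo a "+" -> " " conversion done by the framework."""
    # A SemVer string never contains a space, so this replacement is unambiguous.
    return raw_version.replace(" ", "+")
```

Since a valid version string cannot contain a space, applying the helper to an already-correct value is harmless, so it can probably be applied unconditionally.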


@maartenba commented on GitHub (Feb 14, 2014):

Is + to be encoded in a URL's path part or only in the query string part?


@jeffhandley commented on GitHub (Feb 14, 2014):

We should cover both, but I was specifically interested in the Path.


@jeffhandley commented on GitHub (Feb 14, 2014):

I created a really simple ASP.NET MVC 4 application, with a single route of the following:

routes.MapRoute(
    name: "Packages",
    url: "packages/{id}/{version}",
    defaults: new { controller = "Packages", action = "Index" }
);

Then I ran the application and navigated to this url:
/packages/foo/1.2.3-alpha+20120214001

This results in the following error from ASP.NET. So allowing the + in the path will require working around this security feature. I wish we had recognized this before finalizing on the + character instead of ~.

HTTP Error 404.11 - Not Found

The request filtering module is configured to deny a request that contains a double escape sequence.

Most likely causes:

The request contained a double escape sequence and request filtering is configured on the Web server to deny double escape sequences.

Things you can try:

Verify the configuration/system.webServer/security/requestFiltering@allowDoubleEscaping setting in the applicationhost.config or web.confg file.

More Information:

This is a security feature. Do not change this feature unless the scope of the change is fully understood. You should take a network trace before changing this value to confirm that the request is not malicious. If double escape sequences are allowed by the server, modify the configuration/system.webServer/security/requestFiltering@allowDoubleEscaping setting. This could be caused by a malformed URL sent to the server by a malicious user.


@maartenba commented on GitHub (Feb 14, 2014):

%20 might work, but I would go with

<system.webServer>
    <security>
        <requestFiltering allowDoubleEscaping="true"/>
    </security>
</system.webServer>

or create a request filter that fixes this for some paths but not all.

Also not sure if a catchall might work? E.g. {*packageVersion}


@tbull commented on GitHub (Feb 14, 2014):

Special characters need special escaping. This applies to URLs just like
any other syntax realm. The so-called Percent-encoding is used in URLs.
http://en.wikipedia.org/wiki/Percent-encoding
The + character is escaped to %2B in this scheme.

This is not some weird wizardry, it is indeed the utterly most
fundamental basic knowledge of any web developer, virtually the very
first thing any web developer learns. Apparently, you need quite some
education.

[CLOSE!]


@jeffhandley commented on GitHub (Feb 14, 2014):

Unfortunately, even %2b leads to ASP.NET returning the "The request filtering module is configured to deny a request that contains a double escape sequence." error.

I'm okay with saying that, since the + character doesn't technically need to be encoded as part of the path (see http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding), clients should just send it as a + character, but that %2b should also be respected.

But perhaps we should note in the FAQ that some web servers get in the way when using +, and that you should absolutely not encode the character as anything other than %2b or the raw +.
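
As a rough illustration of "send the raw + or %2b, and accept both", the following Python sketch builds the path for the route discussed above and decodes it again. The helper names `package_path` and `version_from_path` are hypothetical, and the sketch assumes the server applies plain path decoding (no "+" to space conversion).

```python
from urllib.parse import quote, unquote

VERSION = "1.2.3-alpha+20140214001"


def package_path(package_id: str, version: str, keep_plus: bool = False) -> str:
    """Hypothetical helper: build /packages/{id}/{version}."""
    # With safe="" the "+" is percent-encoded; with safe="+" it is sent raw.
    safe = "+" if keep_plus else ""
    return f"/packages/{quote(package_id, safe='')}/{quote(version, safe=safe)}"


def version_from_path(path: str) -> str:
    """Hypothetical helper: decode the last path segment without touching "+"."""
    return unquote(path.rsplit("/", 1)[1])


encoded = package_path("foo", VERSION)               # uses %2B
raw = package_path("foo", VERSION, keep_plus=True)   # uses the raw +
```

Both forms decode to the same version string under these assumptions; whether a given web server lets either form through is a separate configuration question.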


@crazedsanity commented on GitHub (Feb 14, 2014):

The web server could have a rule that, for a particular path (e.g. "/packages"), the "+" character is rewritten into "~" or removed. In that case, the URI would change from

/packages/foo/1.2.3-alpha+20140214001

to

/packages/foo/1.2.3-alpha~20140214001

or

/packages/foo/1.2.3-alpha20140214001

Implementing this change depends on the server, but I don't think it's all that hard. But why would this even be included in the documentation? Seems off-topic.
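
A small Python sketch of the two rewrites suggested above, with hypothetical helper names `to_url_form` and `from_url_form`. It relies on "~" not being a legal character in a SemVer 2.0.0 string, which makes that substitution reversible. Dropping the "+" is not reversible.

```python
VERSION = "1.2.3-alpha+20140214001"


def to_url_form(version: str) -> str:
    """Hypothetical helper: swap "+" for "~" before placing the version in a path."""
    return version.replace("+", "~")


def from_url_form(segment: str) -> str:
    """Hypothetical helper: restore the "+" on the receiving end."""
    return segment.replace("~", "+")


# Removing the "+" instead merges the build metadata into the prerelease tag,
# and the original version can no longer be recovered from the result.
stripped = VERSION.replace("+", "")
```

The stripped form `1.2.3-alpha20140214001` still parses as a version, but as a different one (prerelease `alpha20140214001`, no build metadata), so the substitution variant looks like the safer of the two.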


@tbull commented on GitHub (Feb 14, 2014):

Jeff Handley wrote:

> Unfortunately, even %2b leads to ASP.NET returning the "The request
> filtering module is configured to deny a request that contains a
> double escape sequence." error.

That is clearly a problem of the software you use (ASP) or its
configuration in your specific instance. It has nothing to do with
Semver, not even remotely. If you want your application to accept
certain data, configure it to accept that kind of data. Or, make up some
kind of transfer encoding where the + is transliterated to something
else, like ! for example (assuming that character is accepted by the
server config), and then translated back on the receiving end.

> I'm okay with saying that since the + character doesn't technically
> need to be encoded as part of the path (see this reference:
> http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding),
> that clients should just send it as a + character, but that %2b
> should also be respected.

The + character in a URL already /is/ the encoding for something,
namely for the space character. That's why it is decoded to a space. The
encoded form of an original + character, however, is %2B.
Example:
https://www.google.de/?q=some+stuff
https://www.google.de/?q=some%2Bstuff
Every web application works this way.
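
The two query strings above can be checked with a short Python sketch. It also decodes a path segment for comparison, since the "+" to space rule comes from form encoding of query strings, and generic path decoding leaves "+" alone. The `example.com` URL and the `query_value` helper are assumptions made for illustration.

```python
from urllib.parse import parse_qs, unquote, urlsplit

plus_url = "https://www.google.de/?q=some+stuff"
escaped_url = "https://www.google.de/?q=some%2Bstuff"


def query_value(url: str, key: str) -> str:
    """Hypothetical helper: first value of a query parameter, form-decoded."""
    return parse_qs(urlsplit(url).query)[key][0]


# Hypothetical package URL; the path is decoded without any "+" handling.
package_url = "https://example.com/packages/foo/1.2.3-alpha+20140214001"
path_segment = unquote(urlsplit(package_url).path).rsplit("/", 1)[1]
```

In this sketch the query values come back as `some stuff` and `some+stuff`, while the path segment keeps its "+". Individual frameworks may still apply form-style decoding to paths, which is the behaviour the original question ran into.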

> But perhaps we should address this in the FAQ that some web servers
> get in the way when using + but that you should absolutely not
> encode the character to anything other than %2b or use the raw +.

No. There may be some surprising consequences of the Semver spec in
certain situations that people could or should be made aware of in an
FAQ-like document. But this is not one of them. This is just most basic
stuff everyone should know. Like when it rains outside, people tend to
get wet. Should we have an FAQ doc answering complaints of people who
got wet after they came in contact with Semver in some way? Like "We
keep getting reports of people who got wet after they used Semver.
However, this tends to happen only if it rains. It appears that there is
a causal connection, but we are not sure yet. Workaround for the time
being: use an umbrella. We have plausible reports that umbrellas help in
avoiding getting wet. The issue is under close investigation by the
Semver team." But in fact, Semver has nothing to do with the rain or
with anybody getting wet or with some webserver rejecting arbitrary
characters in URLs.

However, we do indeed keep getting questions like this, where people
complain that one or the other character has some reserved meaning in
their specific scheme of whatever. #97, for example. This is beginning
to become annoying. Is there something we could do about it? This looks
like an obvious thing to me. If you change the realm, different rules
apply. I wonder how these guys keep track of something if more than two
encoding schemes are encountered at once, like PHP pages which generate
HTML which contains CSS, JS, URL-encoding and maybe some DSL which is
sent back to the server.


@jeffhandley commented on GitHub (Feb 14, 2014):

Haha, fair enough. Closing as an external issue. I was hoping the guidance would be that we should not change to other characters like ! or anything, because you never know if those will be introduced in future SemVer formats for new fields. Oh well.


Reference: github-starred/semver#122