mirror of
https://github.com/semver/semver.git
synced 2026-03-22 14:10:15 -05:00
Guidance needed on how to deal with build metadata in Uris #122
Originally created by @jeffhandley on GitHub (Feb 14, 2014).
Uri encoding of + characters is horrible. Is there guidance on how to handle it?
Consider the following:
In this case, we'll get /packages/foo/1.2.3-alpha+20140214001
The server won't be able to tell a difference between "1.2.3-alpha+20140214001" and "1.2.3-alpha 20140214001" and in many cases, the framework will provide the Uri with the space in it to the application code.
Do we just deal with this, replacing spaces with + characters after getting the package version from the Uri? Or is there a better approach for encoding the + characters?
@maartenba commented on GitHub (Feb 14, 2014):
Is + to be encoded in a URL's path part or only in the query string part?
@jeffhandley commented on GitHub (Feb 14, 2014):
We should cover both, but I was specifically interested in the Path.
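As a minimal sketch of the path-versus-query distinction raised above, the following Python snippet uses the standard library's urllib.parse to decode the same version string under both sets of rules. The restore_plus helper is hypothetical and only illustrates the "replace spaces with +" workaround from the original question; it assumes the decoded value is a SemVer 2.0.0 version, which cannot legitimately contain a space.

```python
from urllib.parse import unquote, unquote_plus

raw_segment = "1.2.3-alpha+20140214001"

# Path-style decoding leaves a literal "+" untouched.
path_decoded = unquote(raw_segment)

# Form/query-style decoding turns "+" into a space.
form_decoded = unquote_plus(raw_segment)

# "%2B" decodes to "+" under both sets of rules.
escaped_decoded = unquote_plus("1.2.3-alpha%2B20140214001")


def restore_plus(decoded_version: str) -> str:
    """Hypothetical workaround: map spaces back to "+".

    Only safe under the assumption that the input is a SemVer version,
    because such a version never contains a real space.
    """
    return decoded_version.replace(" ", "+")
```

Whether a given framework applies the path rule or the form rule to a route value is framework-specific, so the helper is a fallback rather than a recommendation.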
@jeffhandley commented on GitHub (Feb 14, 2014):
I created a really simple ASP.NET MVC 4 application, with a single route of the following:
Then I ran the application and navigated to this url:
/packages/foo/1.2.3-alpha+20120214001
This results in the following error from ASP.NET. So allowing the + in the path will require working around this security feature. I wish we had recognized this before finalizing on the + character instead of ~.
HTTP Error 404.11 - Not Found
The request filtering module is configured to deny a request that contains a double escape sequence.
Most likely causes:
The request contained a double escape sequence and request filtering is configured on the Web server to deny double escape sequences.
Things you can try:
Verify the configuration/system.webServer/security/requestFiltering@allowDoubleEscaping setting in the applicationhost.config or web.confg file.
More Information:
This is a security feature. Do not change this feature unless the scope of the change is fully understood. You should take a network trace before changing this value to confirm that the request is not malicious. If double escape sequences are allowed by the server, modify the configuration/system.webServer/security/requestFiltering@allowDoubleEscaping setting. This could be caused by a malformed URL sent to the server by a malicious user.
@maartenba commented on GitHub (Feb 14, 2014):
%20 might work, but I would go with
or create a request filter that fixes this for some paths but not all.
Also not sure if a catchall might work? E.g. {*packageVersion}
@tbull commented on GitHub (Feb 14, 2014):
Special characters need special escaping. This applies to URLs just like
any other syntax realm. The so-called Percent-encoding is used in URLs.
http://en.wikipedia.org/wiki/Percent-encoding
The + character is escaped to %2B in this scheme.
This is not some weird wizardry, it is indeed the utterly most
fundamental basic knowledge of any web developer, virtually the very
first thing any web developer learns. Apparently, you need quite some
education.
[CLOSE!]
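A minimal sketch of the percent-encoding step described above, using Python's urllib.parse.quote. The package_path function and the /packages/<id>/<version> layout are assumptions taken from the example URL earlier in the thread, not an API of any particular server.

```python
from urllib.parse import quote, unquote


def package_path(package_id: str, version: str) -> str:
    """Build a hypothetical /packages/<id>/<version> path."""
    # safe="" asks quote() to escape every reserved character,
    # including "+" and "/", inside each path segment.
    return "/packages/{}/{}".format(
        quote(package_id, safe=""),
        quote(version, safe=""),
    )


path = package_path("foo", "1.2.3-alpha+20140214001")

# Decoding the last segment recovers the original version string.
round_tripped = unquote(path.rsplit("/", 1)[1])
```

Here the + becomes %2B on the client side. Whether the server accepts %2B in a path is a separate question, as the follow-up comments show.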
@jeffhandley commented on GitHub (Feb 14, 2014):
Unfortunately, even %2b leads to ASP.NET returning the "The request filtering module is configured to deny a request that contains a double escape sequence." error.
I'm okay with saying that since the + character doesn't technically need to be encoded as part of the path (see this reference), that clients should just send it as a + character, but that %2b should also be respected.
But perhaps we should address this in the FAQ that some web servers get in the way when using + but that you should absolutely not encode the character to anything other than %2b or use the raw +.
@crazedsanity commented on GitHub (Feb 14, 2014):
The web server could simply have a rule that, for a particular path (eg. "/packages") the "+" character can simply be rewritten into "~" or removed. In that case, the URI would simply change from
to
or
Implementing this change is based on the server, but I don't think it's all that hard. But why would this even be included in the documentation? Seems off-topic.
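The rewrite idea above could be sketched roughly as follows. The function names are hypothetical, and the sketch assumes SemVer 2.0.0 input, where ~ never appears in a valid version and + appears at most once, so the mapping can be reversed. Simply removing the + would lose information, so only the rewrite variant is shown.

```python
def to_url_safe(version: str) -> str:
    """Replace "+" with "~" so the version needs no escaping in a path."""
    if "~" in version:
        # "~" is not part of the SemVer 2.0.0 grammar, so this input
        # could not be mapped back unambiguously.
        raise ValueError("'~' is not valid in a SemVer 2.0.0 version")
    return version.replace("+", "~")


def from_url_safe(segment: str) -> str:
    """Reverse the mapping on the receiving end."""
    return segment.replace("~", "+")


url_form = to_url_safe("1.2.3-alpha+20140214001")
restored = from_url_safe(url_form)
```

The mapping is only reversible for as long as ~ carries no meaning of its own in a version string, so a later revision of the format could break it.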
@tbull commented on GitHub (Feb 14, 2014):
Jeff Handley wrote:
That is clearly a problem of the software you use (ASP) or its
configuration in your specific instance. It has nothing to do with
Semver, not even remotely. If you want your application to accept
certain data, configure it to accept that kind of data. Or, make up some
kind of transfer encoding where the + is transliterated to something
else, like ! for example (assuming that character is accepted by the
server config), and then translated back on the receiving end.
The + character in an URL already /is/ the encoding for something,
namely for the space character. That's why it is decoded to a space. The
encoded form of an original + character, however, is %2B.
Example:
https://www.google.de/?q=some+stuff
https://www.google.de/?q=some%2Bstuff
Every web application works this way.
No. There may be some surprising consequences of the Semver spec in
certain situations that people could or should be made aware of in an
FAQ-like document. But this is not one of them. This is just most basic
stuff everyone should know. Like when it rains outside, people tend to
get wet. Should we have an FAQ doc answering complaints of people who
got wet after they came in contact with Semver in some way? Like "We
keep getting reports of people who got wet after they used Semver.
However, this tends to happen only if it rains. It appears that there is
a causal connection, but we are not sure yet. Workaround for the time
being: use an umbrella. We have plausible reports that umbrellas help in
avoiding getting wet. The issue is under close investigation by the
Semver team." But in fact, Semver has nothing to do with the rain or
with anybody getting wet or with some webserver rejecting arbitrary
characters in URLs.
However, we do indeed keep getting questions like this, where people
complain that one or the other character has some reserved meaning in
their specific scheme of whatever. #97, for example. This is beginning
to become annoying. Is there something we could do about it? This looks
like an obvious thing to me. If you change the realm, different rules
apply. I wonder how these guys keep track of something if more than two
encoding schemes are encountered at once, like PHP pages which generate
HTML which contains CSS, JS, URL-encoding and maybe some DSL which is
sent back to the server.
@jeffhandley commented on GitHub (Feb 14, 2014):
Haha, fair enough. Closing as an external issue. I was hoping the guidance would be that we should not change to other characters like ! or anything, because you never know if those will be introduced in future SemVer formats for new fields. Oh well.