Valid characters for directory part of a URL (for short links)

Are there any other characters except A-Za-z0-9 that can be used to shorten links without getting into trouble? :)

I was thinking about +,;- or something.

Is there a defined standard regarding what characters can be used in a URL that browser vendors respect?

According to RFC 3986 the valid characters for the path component are:

a-z A-Z 0-9 . - _ ~ ! $ & ' ( ) * + , ; = : @

as well as percent-encoded characters and of course, the slash /.
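
As a rough illustration of that character set, here is a minimal Python sketch that percent-encodes everything not allowed literally in a single path segment. The names `PATH_SEGMENT_SAFE` and `encode_segment` are made up for this example; `urllib.parse.quote` is the standard-library function, which (in Python 3.7 and later) already leaves letters, digits and `_.-~` unescaped.

```python
from urllib.parse import quote

# Characters RFC 3986 allows literally in a path segment besides letters,
# digits and "-._~" (which quote() never escapes on Python 3.7+).
# The slash is deliberately left out: inside one segment it must be encoded.
PATH_SEGMENT_SAFE = "!$&'()*+,;=:@"  # hypothetical constant name


def encode_segment(text: str) -> str:
    """Percent-encode every character that may not appear literally in one path segment."""
    return quote(text, safe=PATH_SEGMENT_SAFE)


print(encode_segment("a+b;c"))        # a+b;c
print(encode_segment("50% off/now"))  # 50%25%20off%2Fnow
```

Non-ASCII input is encoded as UTF-8 bytes by default, so each such character becomes one or more `%XX` triplets.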

Keep in mind, though, that many applications (not necessarily browsers) that attempt to parse URIs to make them clickable, for example, may support a much smaller set of characters. This is akin to parsing e-mail addresses where most attempts also don't catch all addresses allowed by the standard.

A path segment (each of the parts of a path separated by /) in an absolute URI path can contain zero or more pchar, which is defined as follows:

pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
            / "*" / "+" / "," / ";" / "="
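
The grammar above translates fairly directly into a regular expression. The following is only a sketch of one way to check a single path segment against it; the names `SEGMENT_RE` and `is_valid_segment` are invented for this example and are not part of any library.

```python
import re

# Direct translation of the ABNF rules into regex fragments.
UNRESERVED = r"A-Za-z0-9\-._~"    # ALPHA / DIGIT / "-" / "." / "_" / "~"
SUB_DELIMS = r"!$&'()*+,;="       # "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"  # "%" HEXDIG HEXDIG (hex digits in either case)

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

# A segment is zero or more pchar.
SEGMENT_RE = re.compile(rf"{PCHAR}*")


def is_valid_segment(segment: str) -> bool:
    """Return True if the whole string is a valid RFC 3986 path segment."""
    return SEGMENT_RE.fullmatch(segment) is not None


print(is_valid_segment("a+b;c=1"))  # True
print(is_valid_segment("%7Euser"))  # True
print(is_valid_segment("50%"))      # False: "%" without two hex digits
print(is_valid_segment("a b"))      # False: space must be percent-encoded
```

Note that this validates one segment only, so a literal slash is rejected; a full path would be checked segment by segment after splitting on /.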

So it's basically A-Z, a-z, 0-9, -, ., _, ~, !, $, &, ', (, ), *, +, ,, ;, =, :, @, as well as % that must be followed by two hexadecimal digits. Any other character/byte needs to be encoded using percent-encoding.

Although that makes 79 characters in total that can be used literally in a path segment, some user agents percent-encode some of them anyway (e.g. %7E instead of ~). That's why many use just the 62 alphanumeric characters (i.e. A-Z, a-z, 0-9) or the 64 characters of the URL-safe Base64 alphabet (i.e. A-Z, a-z, 0-9, -, _).
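
For a link shortener, one common approach (assumed here, not something the answer prescribes) is to turn a numeric ID into a short token using only those conservative characters. The sketch below uses hypothetical names (`BASE62`, `BASE64_URL`, `encode_id`, `decode_id`); the ordering of the characters inside each alphabet is an arbitrary choice for this example.

```python
import string

# 62 alphanumeric characters; the ordering is an arbitrary choice for this sketch.
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase
# 64-character variant that adds "-" and "_", both unreserved in RFC 3986.
BASE64_URL = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def encode_id(number: int, alphabet: str = BASE62) -> str:
    """Convert a non-negative integer into a short token over the given alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def decode_id(token: str, alphabet: str = BASE62) -> int:
    """Inverse of encode_id; raises ValueError on a character outside the alphabet."""
    base = len(alphabet)
    number = 0
    for char in token:
        number = number * base + alphabet.index(char)
    return number


print(encode_id(62))               # 10
print(encode_id(63, BASE64_URL))   # _
print(decode_id(encode_id(43879))) # 43879
```

Sticking to these alphabets means the token never needs percent-encoding, at the cost of slightly longer links than the full 79-character set would allow.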