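Characters allowed in GET parameter

Which characters are allowed in GET parameters without encoding or escaping them, as in this URL:

http://www.example.org/page.php?name=XYZ

What can you have there instead of XYZ? I think only the following characters: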

  • a-z (A-Z)
  • 0-9
  • -
  • _

Is this the complete list, or are other characters allowed?

I hope you can help me. Thanks in advance!

"." | "!" | "~" | "*" | "'" | "(" | ")" are also acceptable [RFC2396]. Really, anything can be in a GET parameter if it is properly encoded.

Alphanumeric characters and all of

~ - _ . ! * ' ( ) ,

are valid within a URL.

All other characters must be encoded.

There are reserved characters, which have a reserved meaning. They are the general delimiters (:/?#[]@) and the sub-delimiters (!$&'()*+,;=).

There is also a set of characters called unreserved characters (alphanumerics and -._~), which are not to be encoded.

That means that anything outside the unreserved set is supposed to be %-encoded when it does not carry its special meaning (e.g. when it is passed as part of a GET parameter value).

See also RFC3986: Uniform Resource Identifier (URI): Generic Syntax
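
As a minimal sketch of the rule above (encode everything outside the unreserved set), Python's standard `urllib.parse.quote` can be used. The helper name `encode_value` and the sample strings are made up for illustration and are not part of the original answer.

```python
from urllib.parse import quote


def encode_value(value: str) -> str:
    """Percent-encode everything except the RFC 3986 unreserved characters."""
    # quote() keeps letters, digits and "_.-~" as they are.
    # safe="" overrides the default safe="/" so that "/" is encoded too.
    return quote(value, safe="")


print(encode_value("a-b_c.d~e"))  # unreserved characters pass through unchanged
print(encode_value("x&y=1 2/3"))  # "&", "=", the space and "/" are encoded
```

Passing `safe=""` is a deliberately strict choice: it encodes more than the query grammar strictly requires, which is harmless for a parameter value.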

From RFC 1738 on which characters are allowed in URLs:

Only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

The reserved characters are ";", "/", "?", ":", "@", "=" and "&", which means you would need to URL encode them if you wish to use them as plain data rather than for their reserved purpose.

I did a test using the Chrome address bar and a $QUERY_STRING in bash, and observed the following:

~!@$%^&*()-_=+[{]}\|;:',./? and grave (backtick) are passed through as plaintext.

Space, ", < and > are converted to %20, %22, %3C and %3E respectively.

# and everything after it is dropped from the query string, since it is used by ye olde anchor (the fragment).
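
The "#" behaviour can be reproduced without a browser. This is only a sketch using Python's standard `urllib.parse.urlsplit`; the `#section` suffix is an assumed example, not something from the test above.

```python
from urllib.parse import urlsplit

# Everything after the first "#" is the fragment; it is split off from the
# query string, and a browser does not send it to the server.
parts = urlsplit("http://www.example.org/page.php?name=xyz#section")

print(parts.query)     # name=xyz
print(parts.fragment)  # section
```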

Personally, I'd say bite the bullet and encode with base64 :)
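
If you go the base64 route, note that standard base64 output can contain "+", "/" and "=", which have their own meaning in a query string. A rough sketch, assuming the URL-safe alphabet from Python's `base64` module and hypothetical helper names `to_param` and `from_param`:

```python
import base64


def to_param(value: str) -> str:
    """Encode text as URL-safe base64 without the trailing '=' padding."""
    raw = base64.urlsafe_b64encode(value.encode("utf-8"))
    return raw.rstrip(b"=").decode("ascii")


def from_param(token: str) -> str:
    """Restore the padding that to_param() stripped, then decode."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


token = to_param("a&b=c d?")
print(token)              # only letters, digits, "-" and "_"
print(from_param(token))  # the original text again
```

The resulting token consists of unreserved characters only, so it needs no further percent-encoding.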

All of the rules concerning the encoding of URIs (which include URNs and URLs) are specified in RFC1738 and RFC3986. Here's a TL;DR of these long and boring documents:

Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a URI under certain circumstances. The characters allowed in a URI are either reserved or unreserved. Reserved characters are those characters that sometimes have special meaning, but they are not the only characters that need encoding.

There are 66 unreserved characters that don't need any encoding: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~

There are 18 reserved characters which need to be encoded when they are used as data rather than as delimiters: !*'();:@&=+$,/?#[]. All other characters must be encoded as well.

To percent-encode a character, concatenate "%" and its ASCII value as two hexadecimal digits (for example, a space becomes %20). The PHP functions "urlencode" and "rawurlencode" do this job for you; note that "urlencode" writes a space as "+" while "rawurlencode" writes it as "%20".
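
To make the mechanism concrete, here is a small hand-rolled sketch in Python. The names `UNRESERVED` and `percent_encode` are hypothetical, and encoding non-ASCII text as UTF-8 bytes first is an assumption on my part (it is what RFC3986 recommends for new schemes, but the paragraph above only talks about ASCII).

```python
# The 66 unreserved characters listed above.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every byte that is not an unreserved character."""
    out = []
    # Assumption: non-ASCII characters are turned into UTF-8 bytes first.
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # "%" followed by the byte value as two uppercase hex digits.
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("a b"))  # a%20b
print(percent_encode("50%"))  # 50%25
print(percent_encode("é"))    # %C3%A9
```

In real code a library routine such as PHP's `rawurlencode` is the safer choice; the loop is only meant to show what such a routine does.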

The question asks which characters are allowed in GET parameters without encoding or escaping them.

According to RFC3986 (general URL syntax) and RFC7230, section 2.7.1 (HTTP/S URL syntax), the only characters you need to percent-encode are those outside of the query set; see the definition below.

However, there are additional specifications such as HTML5, Web forms, and the obsolete W3C recommendation on indexed search. Those documents add a special meaning to some characters, notably = & + ;.

Other answers here suggest that most of the reserved characters should be encoded, including "/" and "?". That's not correct. In fact, RFC3986, section 3.4 advises against percent-encoding the "/" and "?" characters:

it is sometimes better for usability to avoid percent-encoding those characters.

RFC3986 defines query component as:

query       = *( pchar / "/" / "?" )
pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
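
The grammar above translates fairly directly into a regular expression. The following Python check is a sketch only; `QUERY_RE` and `is_valid_query` are made-up names, and the function tests the syntax of a whole query component, not what an application does with "=" or "&".

```python
import re

# query = *( pchar / "/" / "?" ), with pchar expanded into
# unreserved / sub-delims / ":" / "@" plus the pct-encoded alternative.
QUERY_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def is_valid_query(query: str) -> bool:
    """Return True if the string matches the RFC 3986 query production."""
    return QUERY_RE.fullmatch(query) is not None


print(is_valid_query("name=xyz"))     # True
print(is_valid_query("path=a/b?c"))   # True: "/" and "?" are allowed as-is
print(is_valid_query("name=a b"))     # False: raw space
print(is_valid_query("name=a[b]"))    # False: "[" and "]" are outside the set
print(is_valid_query("name=100%"))    # False: "%" without two hex digits
```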

A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.

The conclusion is that the XYZ part should encode:

special: # % = & ;
Space
sub-delims
out of query set: [ ]
non-ASCII characters

Unless special symbols = & ; are key=value separators.

Encoding other characters is allowed but not necessary.
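
In practice a library usually does this for you. As a sketch in Python (the parameter names and values below are invented for illustration), `urllib.parse.urlencode` with `quote_via=quote` encodes each key and value while leaving the "=" and "&" separators in place, and `parse_qs` reverses it:

```python
from urllib.parse import parse_qs, quote, urlencode

# Hypothetical parameters containing several of the characters listed above.
params = {"name": "Tom & Jerry = 100% fun?", "tag": "a/b"}

# quote_via=quote writes a space as %20 instead of the default "+".
query = urlencode(params, quote_via=quote)
print(query)

# Decoding gives back the original values, one list per key.
print(parse_qs(query))
```

This encodes more than strictly necessary (for example "/" and "?"), which the last sentence above allows.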