Why do 3 backslashes equal 4 in a Python string?

Can you tell me why '?\\\?'=='?\\\\?' gives True? This drives me crazy and I can't find a reasonable answer...

>>> list('?\\\?')
['?', '\\', '\\', '?']
>>> list('?\\\\?')
['?', '\\', '\\', '?']

Basically, because Python is slightly lenient in backslash processing. Quoting from https://docs.python.org/2.0/ref/strings.html :

Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string.

(Emphasis in the original)

Therefore, in python, it isn't that three backslashes are equal to four, it's that when you follow backslash with a character like ?, the two together come through as two characters, because \? is not a recognized escape sequence.

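This is because \x in a string literal, where x stands for any character that is not one of the special backslashable characters like n, r, t, 0, etc., evaluates to a string with a backslash followed by that x. (Here x is a placeholder; the literal letter x is itself special, since it starts a hex escape.)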

>>> '\?'
'\\?'
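A minimal sketch of that rule, in case it helps. The helper name parse_literal is made up for this example; it evaluates a literal given as source text with ast.literal_eval, and it silences warnings because recent Python versions emit one for unrecognized escapes such as \?.

```python
import ast
import warnings

def parse_literal(source):
    """Evaluate a string literal given as source text (hypothetical helper)."""
    with warnings.catch_warnings():
        # Assumption: we only care about the resulting value, not the warning.
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)

unknown = parse_literal(r"'\?'")  # \? is not recognized: backslash is kept
known = parse_literal(r"'\n'")    # \n is recognized: becomes one newline

print(len(unknown), list(unknown))  # 2 ['\\', '?']
print(len(known), list(known))      # 1 ['\n']
```

So the unrecognized pair survives as two characters, while the recognized pair collapses into one.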

From the python lexical analysis page under string literals at: https://docs.python.org/2/reference/lexical_analysis.html

There is a table that lists all the recognized escape sequences.

\\ is an escape sequence that produces a single \

\? is not an escape sequence, so it stays as the two characters \?

so '\\\\' is \\ followed by \\, which gives two backslashes (two escaped \)

and '\\\?' is \\ followed by \?, which also gives two backslashes before the ? (one escaped \ and one raw \)

Also, it should be noted that Python does not distinguish between single and double quotes surrounding a string literal, unlike some other languages.

So 'String' and "String" are the exact same thing in python, they do not affect the interpretation of escape sequences.
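A quick way to check the point about quotes, as a sketch. parse_literal is a hypothetical helper (not from the docs) that evaluates a literal from its source text and hides the warning that recent Python versions emit for \?.

```python
import ast
import warnings

def parse_literal(source):
    """Evaluate a string literal given as source text (hypothetical helper)."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # assumption: ignore escape warnings
        return ast.literal_eval(source)

single = parse_literal(r"'?\\\?'")  # the literal written with single quotes
double = parse_literal(r'"?\\\?"')  # the same literal with double quotes

print(single == double)  # True
print(list(single))      # ['?', '\\', '\\', '?']
```

Both spellings produce the same four-character string, so the quote style plays no part in the three-versus-four question.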

This is because the backslash acts as an escape character for the character(s) immediately following it, if the combination represents a valid escape sequence. The dozen or so escape sequences are listed in the Python language reference. They include the obvious ones such as newline \n, horizontal tab \t, carriage return \r and more obscure ones such as named Unicode characters using \N{...}, e.g. \N{WAVY DASH} which represents Unicode character \u3030. The key point though is that if the escape sequence is not known, the character sequence is left in the string as is.
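For what it's worth, the recognized escapes mentioned above can be checked with a few lines. This only uses escapes named in this answer plus the standard unicodedata module.

```python
import unicodedata

# Each recognized escape collapses to a single character.
lengths = [len("\n"), len("\t"), len("\r")]
print(lengths)  # [1, 1, 1]

# A named escape and the matching \u escape give the same character.
wavy = "\N{WAVY DASH}"
print(wavy == "\u3030")        # True
print(unicodedata.name(wavy))  # WAVY DASH
```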

Part of the problem might also be that the Python interpreter output is misleading you. This is because the backslashes are escaped when displayed. However, if you print those strings, you will see the extra backslashes disappear.

>>> '?\\\?'
'?\\\\?'
>>> print('?\\\?')
?\\?
>>> '?\\\?' == '?\\?'    # I don't know why you think this is True???
False
>>> '?\\\?' == r'?\\?'   # but if you use a raw string for '?\\?'
True
>>> '?\\\\?' == '?\\\?'  # this is the same string... see below
True

For your specific examples, in the first case '?\\\?', the first \ escapes the second backslash leaving a single backslash, but the third backslash remains as a backslash because \? is not a valid escape sequence. Hence the resulting string is ?\\?.

For the second case '?\\\\?', the first backslash escapes the second, and the third backslash escapes the fourth which results in the string ?\\?.

So that's why three backslashes is the same as four:

>>> '?\\\?' == '?\\\\?'
True

If you want to create a string with 3 backslashes you can escape each backslash:

>>> '?\\\\\\?'
'?\\\\\\?'
>>> print('?\\\\\\?')
?\\\?

or you might find "raw" strings more understandable:

>>> r'?\\\?'
'?\\\\\\?'
>>> print(r'?\\\?')
?\\\?

This turns off escape sequence processing for the string literal. See String Literals for more details.
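As a small sanity check, the raw and fully escaped spellings of the three-backslash string can be compared directly. The variable names are just for illustration.

```python
raw = r"?\\\?"        # raw literal: every backslash is taken as written
escaped = "?\\\\\\?"  # normal literal: each backslash is doubled

print(raw == escaped)  # True
print(len(raw))        # 5
print(raw.count("\\")) # 3
```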

mhawke's answer pretty much covers it, I just want to restate it in a more concise form and with minimal examples that illustrate this behaviour.

I guess one thing to add is that escape processing moves from left to right, so that \n first finds the backslash and then looks for a character to escape, then finds n and escapes it; \\n finds first backslash, finds second and escapes it, then finds n and sees it as a literal n; \? finds backslash and looks for a char to escape, finds ? which cannot be escaped, and so treats \ as a literal backslash.
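The left-to-right scan described above can be sketched roughly like this. It is a toy model, not Python's actual tokenizer: decode_escapes and SIMPLE_ESCAPES are hypothetical names, and the table covers only a handful of the real escapes.

```python
# Hypothetical, simplified subset of the escape table (illustration only).
SIMPLE_ESCAPES = {"\\": "\\", "n": "\n", "t": "\t", "r": "\r", "'": "'", '"': '"'}

def decode_escapes(body):
    """Process the body of a string literal from left to right."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])  # recognized: replace the pair
            else:
                out.append("\\" + nxt)           # unrecognized: keep both chars
            i += 2
        else:
            out.append(ch)                       # ordinary character
            i += 1
    return "".join(out)

three = decode_escapes(r"?\\\?")   # body of '?\\\?'
four = decode_escapes(r"?\\\\?")   # body of '?\\\\?'
print(three == four)  # True
print(list(three))    # ['?', '\\', '\\', '?']
```

With three backslashes the scan consumes \\ and then \?; with four it consumes \\ twice. Either way two backslashes end up in the result.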

As mhawke noted, the key here is that the interactive interpreter escapes the backslash when displaying a string. I'm guessing the reason for that is to ensure that text strings copied from the interpreter into a code editor are valid Python strings. However, in this case this allowance for convenience causes confusion.
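The difference between the two displays comes down to repr() versus str(), which a short script can show. The round trip through ast.literal_eval is only there to illustrate my guess about copy-pasting; it is not taken from the docs.

```python
import ast

s = "?\\\\?"     # two real backslashes between the question marks

print(len(s))    # 4
print(str(s))    # ?\\?      (what print() shows)
print(repr(s))   # '?\\\\?'  (what the interactive prompt shows)

# The repr form is itself a valid literal that evaluates back to s.
round_trip = ast.literal_eval(repr(s))
print(round_trip == s)  # True
```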

>>> print('\?') # \? is not a valid escape code so backslash is left as-is
\?
>>> print('\\?') # \\ is a valid escape code, resulting in a single backslash
\?

>>> '\?' # same as first example except that interactive interpreter escapes the backslash
'\\?'
>>> '\\?' # same as second example, backslash is again escaped
'\\?'