Python string interning

While this question doesn't have any real use in practice, I am curious as to how Python does string interning. I have noticed the following.

>>> "string" is "string"
True

This is as I expected.

You can also do this.

>>> "strin"+"g" is "string"
True

And that's pretty clever!

But you can't do this.

>>> s1 = "strin"
>>> s2 = "string"
>>> s1+"g" is s2
False

Why wouldn't Python evaluate s1+"g", and realize it is the same as s2 and point it to the same address? What is actually going on in that last block to have it return False?


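This is implementation-specific, but your interpreter is probably interning compile-time constants and not the results of run-time expressions.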

In what follows CPython 3.9.0+ is used.

In the second example, the expression "strin"+"g" is evaluated at compile time, and is replaced with "string". This makes the first two examples behave the same.

If we inspect the bytecode, we see that the two are exactly the same:

# s1 = "string"
  1           0 LOAD_CONST               0 ('string')
              2 STORE_NAME               0 (s1)

# s2 = "strin" + "g"
  2           4 LOAD_CONST               0 ('string')
              6 STORE_NAME               1 (s2)

This bytecode was obtained with the following (which prints a few more lines after the ones shown above):

import dis


source = 's1 = "string"\ns2 = "strin" + "g"'
code = compile(source, '', 'exec')
dis.dis(code)  # dis.dis() prints the disassembly itself and returns None
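The folding can also be checked without reading the disassembly, by looking at the constants stored on the compiled code object. This is a minimal sketch that assumes CPython; the filename "<example>" is an arbitrary placeholder, and constant folding is an implementation detail rather than a language guarantee.

```python
# Compile the concatenation of two literals and inspect the constants
# that the compiler stored on the resulting code object.
source = 's2 = "strin" + "g"'
code = compile(source, "<example>", "exec")

# On CPython the folded result is expected to appear as a single constant.
print(code.co_consts)
folded = "string" in code.co_consts
print(folded)
```

If the folded constant is present, the concatenation never happens at run time, which is why the first two examples behave identically.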

The third example involves a run-time concatenation, the result of which is not automatically interned:

# s3a = "strin"
  3           8 LOAD_CONST               1 ('strin')
             10 STORE_NAME               2 (s3a)

# s3 = s3a + "g"
  4          12 LOAD_NAME                2 (s3a)
             14 LOAD_CONST               2 ('g')
             16 BINARY_ADD
             18 STORE_NAME               3 (s3)
             20 LOAD_CONST               3 (None)
             22 RETURN_VALUE

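This bytecode was obtained with the following (which prints a few more lines before the ones shown above; those lines are exactly the same as in the first block of bytecode):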

import dis


source = (
    's1 = "string"\n'
    's2 = "strin" + "g"\n'
    's3a = "strin"\n'
    's3 = s3a + "g"')
code = compile(source, '', 'exec')
dis.dis(code)  # dis.dis() prints the disassembly itself and returns None
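The difference between the folded and the run-time case can also be checked programmatically. The sketch below assumes CPython; the helper name binary_ops is made up for illustration. The add opcode is BINARY_ADD on 3.9 and has a different name on newer releases, so the helper matches on the "BINARY_" prefix.

```python
import dis

# Two literals: the compiler can fold these into one constant.
folded = compile('s2 = "strin" + "g"', "<example>", "exec")
# A name plus a literal: the addition has to happen at run time.
runtime = compile('s3a = "strin"\ns3 = s3a + "g"', "<example>", "exec")


def binary_ops(code):
    """Return the names of the binary-operation opcodes in a code object."""
    return [
        ins.opname
        for ins in dis.get_instructions(code)
        if ins.opname.startswith("BINARY_")
    ]


print(binary_ops(folded))   # expected to be empty on CPython
print(binary_ops(runtime))  # expected to contain the add instruction
```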

If you manually sys.intern() the result of the third expression, you get the same object as before:

>>> import sys
>>> s3a = "strin"
>>> s3 = s3a + "g"
>>> s3 is "string"
False
>>> sys.intern(s3) is "string"
True

Also, Python 3.9 prints a warning for the last two statements above:

SyntaxWarning: "is" with a literal. Did you mean "=="?
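One way to repeat the comparison without triggering that warning is to compare names instead of literals. This is a small sketch; the variable names are illustrative, and the result of the plain `is` check is a CPython implementation detail that should not be relied on.

```python
import sys

prefix = "strin"
literal = "string"
runtime = prefix + "g"  # built at run time, so the compiler cannot fold it

print(runtime == literal)  # compares values
print(runtime is literal)  # compares identity; CPython typically reports False
# sys.intern() returns the canonical interned copy for equal strings.
print(sys.intern(runtime) is sys.intern(literal))
```

In ordinary code, == is the right way to compare strings; `is` only tells you whether two names refer to the same object.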

Case 1

>>> x = "123"
>>> y = "123"
>>> x == y
True
>>> x is y
True
>>> id(x)
50986112
>>> id(y)
50986112

Case 2

>>> x = "12"
>>> y = "123"
>>> x = x + "3"
>>> x is y
False
>>> x == y
True

Now, your question is why the id is the same in case 1 but not in case 2.
In case 1, you have assigned the string literal "123" to both x and y.

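Since strings are immutable, it makes sense for the interpreter to store the string literal only once and point all the variables at the same object.
That is why you see identical ids.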

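In case 2, you are modifying x using concatenation. x and y have the same value, but not the same identity.
They point to different objects in memory, so their ids differ and the is operator returns False.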
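The Case 1 explanation can be checked by compiling both assignments together and counting how many copies of the literal the compiler kept. This is a sketch that assumes CPython; the names copies, namespace and same_object are illustrative, and sharing of equal literals is an implementation detail, not something the language promises.

```python
# Compile both assignments as one unit, as if they were in the same file.
source = 'x = "123"\ny = "123"'
code = compile(source, "<example>", "exec")

# Count how many constants equal to "123" the code object holds.
copies = sum(1 for const in code.co_consts if const == "123")

# Run the code and compare the identity of the two resulting objects.
namespace = {}
exec(code, namespace)
same_object = namespace["x"] is namespace["y"]

print(copies, same_object)
```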