为什么提前返回比其他更慢?

这是几天前我给出的答案的后续问题。编辑:,似乎该问题的OP已经使用了我发布给他的代码来询问同样的问题,但我并不知情。抱歉。不过,提供的答案是不同的!

实际上,我观察到:

>>> def without_else(param=False):
...     if param:
...         return 1
...     return 0
>>> def with_else(param=False):
...     if param:
...         return 1
...     else:
...         return 0
>>> from timeit import Timer as T
>>> T(lambda : without_else()).repeat()
[0.3011460304260254, 0.2866089344024658, 0.2871549129486084]
>>> T(lambda : with_else()).repeat()
[0.27536892890930176, 0.2693932056427002, 0.27011704444885254]
>>> T(lambda : without_else(True)).repeat()
[0.3383951187133789, 0.32756996154785156, 0.3279120922088623]
>>> T(lambda : with_else(True)).repeat()
[0.3305950164794922, 0.32186388969421387, 0.3209099769592285]

...换句话说:无论是否触发if条件,使用else子句都会更快。

我认为这与两者生成的不同字节码有关,但有人能详细确认/解释吗?

编辑:似乎不是每个人都能够复制我的时间,所以我想它可能是有用的,给我的系统一些信息。我运行的是64位Ubuntu 11.10,默认安装了Python.python生成以下版本信息:

Python 2.7.2+ (default, Oct  4 2011, 20:06:09)
[GCC 4.6.1] on linux2

下面是Python 2.7中反汇编的结果:

>>> dis.dis(without_else)
2           0 LOAD_FAST                0 (param)
3 POP_JUMP_IF_FALSE       10


3           6 LOAD_CONST               1 (1)
9 RETURN_VALUE


4     >>   10 LOAD_CONST               2 (0)
13 RETURN_VALUE
>>> dis.dis(with_else)
2           0 LOAD_FAST                0 (param)
3 POP_JUMP_IF_FALSE       10


3           6 LOAD_CONST               1 (1)
9 RETURN_VALUE


5     >>   10 LOAD_CONST               2 (0)
13 RETURN_VALUE
14 LOAD_CONST               0 (None)
17 RETURN_VALUE
14927 次浏览

This is a pure guess, and I haven't figured out an easy way to check whether it is right, but I have a theory for you.

I tried your code and get the same of results, without_else() is repeatedly slightly slower than with_else():

>>> T(lambda : without_else()).repeat()
[0.42015745017874906, 0.3188967452567226, 0.31984281521812363]
>>> T(lambda : with_else()).repeat()
[0.36009842032996175, 0.28962249392031936, 0.2927151355828528]
>>> T(lambda : without_else(True)).repeat()
[0.31709728471076915, 0.3172671387005721, 0.3285821242644147]
>>> T(lambda : with_else(True)).repeat()
[0.30939889008243426, 0.3035132258429485, 0.3046679117038593]

Considering that the bytecode is identical, the only difference is the name of the function. In particular the timing test does a lookup on the global name. Try renaming without_else() and the difference disappears:

>>> def no_else(param=False):
if param:
return 1
return 0


>>> T(lambda : no_else()).repeat()
[0.3359846013948413, 0.29025818923918223, 0.2921801513879245]
>>> T(lambda : no_else(True)).repeat()
[0.3810395594970828, 0.2969634408842694, 0.2960104566362247]

My guess is that without_else has a hash collision with something else in globals() so the global name lookup is slightly slower.

Edit: A dictionary with 7 or 8 keys probably has 32 slots, so on that basis without_else has a hash collision with __builtins__:

>>> [(k, hash(k) % 32) for k in globals().keys() ]
[('__builtins__', 8), ('with_else', 9), ('__package__', 15), ('without_else', 8), ('T', 21), ('__name__', 25), ('no_else', 28), ('__doc__', 29)]

To clarify how the hashing works:

__builtins__ hashes to -1196389688 which reduced modulo the table size (32) means it is stored in the #8 slot of the table.

without_else hashes to 505688136 which reduced modulo 32 is 8 so there's a collision. To resolve this Python calculates:

Starting with:

j = hash % 32
perturb = hash

Repeat this until we find a free slot:

j = (5*j) + 1 + perturb;
perturb >>= 5;
use j % 2**i as the next table index;

which gives it 17 to use as the next index. Fortunately that's free so the loop only repeats once. The hash table size is a power of 2, so 2**i is the size of the hash table, i is the number of bits used from the hash value j.

Each probe into the table can find one of these:

  • The slot is empty, in that case the probing stops and we know the value is not in the table.

  • The slot is unused but was used in the past in which case we go try the next value calculated as above.

  • The slot is full but the full hash value stored in the table isn't the same as the hash of the key we are looking for (that's what happens in the case of __builtins__ vs without_else).

  • The slot is full and has exactly the hash value we want, then Python checks to see if the key and the object we are looking up are the same object (which in this case they will be because short strings that could be identifiers are interned so identical identifiers use the exact same string).

  • Finally when the slot is full, the hash matches exactly, but the keys are not the identical object, then and only then will Python try comparing them for equality. This is comparatively slow, but in the case of name lookups shouldn't actually happen.