Why is Python recursion so expensive, and what can we do about it?

Suppose we want to compute some Fibonacci numbers modulo 997.

For n=500 in C++, we can run

#include <iostream>
#include <array>


std::array<int, 2> fib(unsigned n) {
    if (!n)
        return {1, 1};
    auto x = fib(n - 1);
    return {(x[0] + x[1]) % 997, (x[0] + 2 * x[1]) % 997};
}


int main() {
    std::cout << fib(500)[0];
}

And in Python

def fib(n):
    if n == 1:
        return (1, 2)
    x = fib(n - 1)
    return ((x[0] + x[1]) % 997, (x[0] + 2 * x[1]) % 997)


if __name__ == '__main__':
    print(fib(500)[0])

Both will find the answer 996 without issues. We take results modulo 997 to keep the output size reasonable, and use pairs to avoid exponential branching.

For n=5000, the C++ code outputs 783, but Python complains

RecursionError: maximum recursion depth exceeded in comparison

If we add a few lines

import sys


def fib(n):
    if n == 1:
        return (1, 2)
    x = fib(n - 1)
    return ((x[0] + x[1]) % 997, (x[0] + 2 * x[1]) % 997)


if __name__ == '__main__':
    sys.setrecursionlimit(5000)
    print(fib(5000)[0])

then Python also gives the correct answer.

For n=50000, C++ finds the answer 151 within milliseconds, while Python crashes (at least on my machine).

Why are recursive calls so much cheaper in C++? Can we somehow modify the Python compiler to make it more receptive to recursion?

One solution, of course, is to replace recursion with iteration. For Fibonacci numbers, this is easy to do. However, doing so swaps the initial and terminal conditions, and the latter is tricky for many problems (for example, alpha-beta pruning). So in general, this requires a lot of hard work from the programmer.
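For this particular recurrence, the iterative replacement is a straightforward loop. A sketch (`fib_iter` is an illustrative name, not from the question):

```python
def fib_iter(n):
    # The question's recurrence, run as a loop: (a, b) -> (a+b, a+2b) mod 997.
    # No call stack is consumed, so large n poses no depth problem.
    a, b = 1, 2                    # base case, as in fib(1)
    for _ in range(n - 1):
        a, b = (a + b) % 997, (a + 2 * b) % 997
    return (a, b)

print(fib_iter(500)[0])  # 996, matching the recursive version
```

The loop keeps only the current pair, so memory use is constant as well.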


You can increase the recursion limit using:

import sys


sys.setrecursionlimit(new_limit)

But note that this limit exists for a reason and that pure Python is not optimized for recursion (and compute-intensive tasks in general).

The issue is that Python has an internal limit on number of recursive function calls.

That limit is configurable as shown in Quentin Coumes' answer. However, too deep a function chain will result in a stack overflow. This underlying limitation¹ applies to both C++ and Python. This limitation also applies to all function calls, not just recursive ones.

In general: You should not write² algorithms that have recursion depth growth with linear complexity or worse. Logarithmically growing recursion is typically fine. Tail-recursive functions are trivial to re-write as iterations. Other recursions may be converted to iteration using external data structures (usually, a dynamic stack).
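As a sketch of that last point: even a branching recursion, where a simple accumulator loop no longer suffices, can be run iteratively by keeping the pending calls on an explicit list. The naive exponential-time Fibonacci illustrates the pattern (`fib_naive_iter` is an illustrative name; this is about the technique, not efficiency):

```python
def fib_naive_iter(n):
    # Replace the two recursive calls of naive Fibonacci with an explicit
    # stack of pending subproblems; F(n) equals the sum of the base cases.
    total, stack = 0, [n]
    while stack:
        k = stack.pop()
        if k < 2:
            total += k                    # F(0) = 0, F(1) = 1
        else:
            stack.extend((k - 1, k - 2))  # defer both subcalls
    return total

print(fib_naive_iter(10))  # 55
```

The recursion depth is now bounded by the list's size on the heap rather than by the execution stack.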

A related rule of thumb is that you shouldn't have large objects with automatic storage. This is C++-specific since Python doesn't have the concept of automatic storage.


¹ The underlying limitation is the execution stack size. The default size differs between systems, and different function calls consume different amounts of memory, so the limit isn't specified as a number of calls but in bytes instead. This too is configurable on some systems. I wouldn't typically recommend touching that limit due to portability concerns.

² Exceptions to this rule of thumb are certain functional languages that guarantee tail-recursion elimination - such as Haskell - where that rule can be relaxed in case of recursions that are guaranteed to be eliminated. Python is not such a language, and the function in question isn't tail-recursive. While C++ compilers can perform the elimination as an optimization, it isn't guaranteed, and is typically not optimized in debug builds. Hence, the exception doesn't generally apply to C++ either.

Disclaimer: The following is my hypothesis; I don't actually know their rationale: The Python limit is probably a feature that detects potentially infinite recursions, preventing likely unsafe stack overflow crashes and substituting a more controlled RecursionError.
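That hypothesis is easy to illustrate: CPython turns runaway recursion into a catchable exception instead of letting the process die from a real stack overflow. A minimal sketch (`down` is a throwaway name):

```python
import sys

def down(n):
    return down(n + 1)  # never terminates on its own

try:
    down(0)
except RecursionError:
    # a controlled Python exception, not a hard crash of the interpreter
    print("recursion stopped near depth", sys.getrecursionlimit())
```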

Why are recursive calls so much cheaper in C++?

C++ is a compiled language. Python is interpreted. (Nearly) everything is cheaper in C++, except the translation from source code to an executable program.

Let me first answer your direct questions:

Why are recursive calls so much cheaper in C++?

Because C++ has no limitation on recursive call depth, except the size of the stack. And being a fully compiled language, loops (including recursion) are much faster in C++ than in Python (the reason why special Python modules like numpy/scipy directly use C routines). Additionally, most C++ implementations use a special feature called tail recursion elimination (see later in this post) and optimize some recursive code into iterative equivalents. This is nice here but not guaranteed by the standard, so other compilations could result in a program crashing miserably - but tail recursion is probably not involved here.

If the recursion is too deep and exhausts the available stack, you will invoke the well-known Undefined Behaviour where anything can happen, from an immediate crash to a program giving wrong results (IMHO the latter is much worse and cannot be detected...)

Can we somehow modify the Python compiler to make it more receptive to recursion?

No. Python implementation explicitly never uses tail recursion elimination. You could increase the recursion limit, but this is almost always a bad idea (see later in this post why).

Now for the true explanation of the underlying rationale.

Deep recursion is evil, full stop. You should never use it. Recursion is a handy tool when you can make sure that the depth will stay in sane limits. Python uses a soft limit to warn the programmer that something is going wrong before crashing the system. On the other hand, optimizing C and C++ compilers often internally change tail recursion into an iterative loop. But relying on it is highly dangerous because a slight change could prevent that optimization and cause an application crash.

As found in this other SO post, common Python implementations do not implement that tail recursion elimination. So you should not use recursion at a 5000 depth but instead use an iterative algorithm.

As your underlying computation will need all Fibonacci numbers up to the specified one, it is not hard to iteratively compute them. Furthermore, it will be much more efficient!

A solution is a trampoline: the recursive function, instead of calling another function, returns a function that makes that call with the appropriate arguments. There's a loop one level higher that calls all those functions in a loop until we have the final result. I'm probably not explaining it very well; you can find resources online that do a better job.

The point is that this converts recursion to iteration. I don't think this is faster, maybe it's even slower, but the recursion depth stays low.

An implementation could look like below. I split the pair x into a and b for clarity. I then converted the recursive function to a version that keeps track of a and b as arguments, making it tail recursive.

def fib_acc(n, a, b):
    if n == 1:
        return (a, b)
    return lambda: fib_acc(n - 1, (a + b) % 997, (a + 2 * b) % 997)


def fib(n):
    x = fib_acc(n, 1, 2)
    while callable(x):
        x = x()
    return x


if __name__ == '__main__':
    print(fib(50000)[0])
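The driving loop can also be factored into a reusable helper (a hypothetical utility, not part of the standard library), so any thunk-returning function can be trampolined the same way:

```python
def trampoline(step, *args):
    # Repeatedly call returned thunks until a non-callable final value appears.
    result = step(*args)
    while callable(result):
        result = result()
    return result

def countdown(n):
    # Toy tail-recursive step written in trampoline style.
    return n if n == 0 else (lambda: countdown(n - 1))

print(trampoline(countdown, 100_000))  # 0, far beyond the recursion limit
```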

"Both will find the answer 996 without issues"

I do see at least one issue: the answer should be 836, not 996.

It seems that both your functions calculate Fibonacci(2*n) % p, and not Fibonacci(n) % p.

996 is the result of Fibonacci(1000) % 997.
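This is easy to check by running the question's recurrence next to a plain iterative Fibonacci (a sketch; both function names are illustrative):

```python
def fib_pair(n):
    # the question's recurrence, run iteratively: (a, b) -> (a+b, a+2b)
    a, b = 1, 2
    for _ in range(n - 1):
        a, b = (a + b) % 997, (a + 2 * b) % 997
    return a

def fibonacci_mod(n, p):
    # textbook Fibonacci modulo p, with F(1) = F(2) = 1
    x, y = 0, 1
    for _ in range(n):
        x, y = y, (x + y) % p
    return x

print(fib_pair(500), fibonacci_mod(1000, 997))  # both 996: fib(n) is F(2n) mod 997
```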

Pick a more efficient algorithm

An inefficient algorithm stays an inefficient algorithm, regardless if it's written in C++ or Python.

In order to compute large Fibonacci numbers, there are much faster methods than simple recursion with O(n) calls (see related article).

For large n, this recursive O(log n) Python function should run in circles around your above C++ code:

from functools import lru_cache


@lru_cache(maxsize=None)
def fibonacci(n, p):
    "Calculate Fibonacci(n) modulo p"
    if n < 3:
        return [0, 1, 1][n]
    if n % 2 == 0:
        m = n // 2
        v1 = fibonacci(m - 1, p)
        v2 = fibonacci(m, p)
        return (2 * v1 + v2) * v2 % p
    else:
        m = (n + 1) // 2
        v1 = fibonacci(m, p) ** 2
        v2 = fibonacci(m - 1, p) ** 2
        return (v1 + v2) % p


print(fibonacci(500, 997))
#=> 836
print(fibonacci(1000, 997))
#=> 996


It will happily calculate fibonacci(10_000_000_000_000_000, 997).
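The recursion depth of this doubling scheme grows only logarithmically: each level roughly halves n, so the depth is about the bit length of n, comfortably below CPython's default limit even for such a huge argument. A quick sanity check:

```python
import sys

n = 10_000_000_000_000_000
# each recursive level roughly halves n, so depth is about log2(n)
print(n.bit_length(), "levels vs. limit", sys.getrecursionlimit())
```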

It's possible to add the recursion level as a parameter, in order to see how deep the recursion needs to go, and to display it with indentation. Here's an example for n=500:

# Recursion tree:
500
  249
    124
      61
        30
          14
            6
              2
              3
                1
                2
            7
              4
          15
            8
        31
          16
      62
    125
      63
      32
  250


Your examples would simply look like very long diagonals:

500
  499
    498
      ...
        ...
          1

For Windows executables, the stack size is specified in the header of the executable. For the Windows version of Python 3.7 x64, that size is 0x1E8480, or exactly 2,000,000 bytes.

That version crashes with

Process finished with exit code -1073741571 (0xC00000FD)

and if we look that up we find that it's a Stack Overflow.

What we can see on the (native) stack with a native debugger like WinDbg (enable child process debugging) is

[...]
fa 000000e9`6da1b680 00007fff`fb698a6e     python37!PyArg_UnpackStack+0x371
fb 000000e9`6da1b740 00007fff`fb68b841     python37!PyEval_EvalFrameDefault+0x73e
fc 000000e9`6da1b980 00007fff`fb698a6e     python37!PyArg_UnpackStack+0x371
fd 000000e9`6da1ba40 00007fff`fb68b841     python37!PyEval_EvalFrameDefault+0x73e
fe 000000e9`6da1bc80 00007fff`fb698a6e     python37!PyArg_UnpackStack+0x371
ff 000000e9`6da1bd40 00007fff`fb68b841     python37!PyEval_EvalFrameDefault+0x73e


2:011> ? 000000e9`6da1bd40 - 000000e9`6da1ba40
Evaluate expression: 768 = 00000000`00000300

So Python uses 2 stack frames per method call, and there's an enormous 768-byte difference between successive stack positions.

If you modify that value inside the EXE (make a backup copy) with a hex editor to, let's say 256 MB

Python.exe edited in 010 Editor

you can run the following code

[...]
if __name__ == '__main__':
    sys.setrecursionlimit(60000)
    print(fib(50000)[0])

and it will give 151 as the answer.
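A portable alternative to patching the executable header is to run the recursion in a worker thread with an enlarged stack: `threading.stack_size` lets Python request a bigger C stack for threads started afterwards. A sketch of that approach:

```python
import sys
import threading

def fib(n):
    if n == 1:
        return (1, 2)
    x = fib(n - 1)
    return ((x[0] + x[1]) % 997, (x[0] + 2 * x[1]) % 997)

def main():
    sys.setrecursionlimit(60000)
    print(fib(50000)[0])  # 151, as with the hex-edited executable

# request a 256 MB stack for threads created after this call
threading.stack_size(256 * 1024 * 1024)
thread = threading.Thread(target=main)
thread.start()
thread.join()
```

This only affects new threads, not the main thread, which keeps its OS-assigned stack.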


In C++, we can also force a Stack Overflow, e.g. by passing 500,000 as the parameter. While debugging, we get

0:000> .exr -1
ExceptionAddress: 00961015 (RecursionCpp!fib+0x00000015)
ExceptionCode: c00000fd (Stack overflow)
[...]
0:000> k
[...]
fc 00604f90 00961045     RecursionCpp!fib+0x45 [C:\...\RecursionCpp.cpp @ 7]
fd 00604fb0 00961045     RecursionCpp!fib+0x45 [C:\...\RecursionCpp.cpp @ 7]
fe 00604fd0 00961045     RecursionCpp!fib+0x45 [C:\...\RecursionCpp.cpp @ 7]
ff 00604ff0 00961045     RecursionCpp!fib+0x45 [C:\...\RecursionCpp.cpp @ 7]


0:000> ? 00604ff0 - 00604fd0
Evaluate expression: 32 = 00000020

which is just 1 stack frame per method call and only 32 bytes difference on the stack. Compared to Python, C++ can do 768/32 = 24x more recursions for the same stack size.

My Microsoft compiler created the executable with the default stack size of 1 MB (Release build, 32 Bit):

010 Editor for C++ executable

The 64-bit version has a stack difference of 64 bytes (also a Release build).


Tools used: WinDbg and 010 Editor.

An alternative to a trampoline is to use reduce. If you can change the recursive function to be tail-recursive, you can implement it with reduce; here's a possible implementation.

reduce is internally implemented iteratively, so you get to use your recursive function without blowing up the stack.

from functools import reduce  # reduce is not a builtin in Python 3


def inner_fib(acc, num):
    # acc is a pair of two values; num (the range element) is unused
    return [(acc[0] + acc[1]) % 997, (acc[0] + 2 * acc[1]) % 997]


def fib(n):
    return reduce(inner_fib,
                  range(2, n + 1),  # start from 2 since n=1 is covered in the base case
                  [1, 2])           # [1, 2] is the base case