Why do recursive calls cause a StackOverflow at different stack depths?

小开

Best answer

I think it may be ASLR at work. You can turn off DEP to test this theory.

See here for a C# utility class to check memory information: https://stackoverflow.com/a/8716410/552139

By the way, with this tool, I found that the difference between the maximum and minimum stack size is around 2 KiB, which is half a page. That's weird.

Update: OK, now I know I'm right. I followed up on the half-page theory, and found this doc that examines the ASLR implementation in Windows: http://www.symantec.com/avcenter/reference/Address_Space_Layout_Randomization.pdf

Quote:

Once the stack has been placed, the initial stack pointer is further randomized by a random decremental amount. The initial offset is selected to be up to half a page (2,048 bytes)

And this is the answer to your question. ASLR takes away between 0 and 2048 bytes of your initial stack randomly.

小开

Change r.Next() to r.Next(10). The StackOverflowExceptions should then occur at the same depth.

Generated strings should consume the same memory because they have the same size. r.Next(10).ToString().Length == 1 always. r.Next().ToString().Length is variable.

The same applies if you use r.Next(100, 1000), which always yields a three-digit number.

小开

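This C++11 code starts ten threads and prints, for each one, where its stack begins within the start page (the distance from the thread's first local variable to the end of that page):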

#include <Windows.h>
#include <atomic>
#include <iostream>

using namespace std;

#if !defined(__llvm__)
#pragma warning(disable: 6387) // handle could be NULL
#pragma warning(disable: 6001) // using uninitialized memory
#endif

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo( &si );
    // static, so the capture-less lambda below can read it;
    // direct initialization, because std::atomic is not copyable in C++11
    static atomic<size_t> aPageSize( si.dwPageSize );
    // thread entry: returns the distance from a local variable to the end of its page
    auto theThread = []( LPVOID ) -> DWORD
    {
        size_t pageSize = aPageSize.load( memory_order_relaxed );
        return (DWORD)(pageSize - ((size_t)&pageSize & (pageSize - 1)));
    };
    constexpr unsigned ROUNDS = 10;
    for( unsigned r = ROUNDS; r--; )
    {
        HANDLE hThread = CreateThread( nullptr, 0, theThread, nullptr, 0, nullptr );
        WaitForSingleObject( hThread, INFINITE );
        DWORD dwExit;
        GetExitCodeThread( hThread, &dwExit );
        CloseHandle( hThread );
        cout << dwExit << endl;
    }
}

Linux doesn't randomize the lower 12 bits by default:

#include <iostream>
#include <atomic>
#include <pthread.h>
#include <unistd.h>

using namespace std;

int main()
{
    // static, so the capture-less lambda below can read it;
    // direct initialization, because std::atomic is not copyable in C++11
    static atomic<size_t> aPageSize( (size_t)sysconf( _SC_PAGESIZE ) );
    // thread entry: returns the distance from a local variable to the end of its page
    auto theThread = []( void * ) -> void *
    {
        size_t pageSize = aPageSize.load( memory_order_relaxed );
        return (void *)(pageSize - ((size_t)&pageSize & (pageSize - 1)));
    };
    constexpr unsigned ROUNDS = 10;
    for( unsigned r = ROUNDS; r--; )
    {
        pthread_t pThread;
        pthread_create( &pThread, nullptr, theThread, nullptr );
        void *retVal;
        pthread_join( pThread, &retVal );
        cout << (size_t)retVal << endl;
    }
}

Randomizing a thread stack's starting address within a page doesn't make sense from a security standpoint. On a 64-bit system with a 47-bit user address space (newer Intel CPUs with 5-level paging even offer 56 bits) and 4 KiB pages, 47 - 12 = 35 bits are still left to randomize at page granularity, i.e. about 34 billion possible placements of a stack. It doesn't make sense from a performance standpoint either, since cache-line aliasing on SMT systems is not a concern: today's caches have enough associativity.
