为什么递归调用会导致不同堆栈深度的 StackOverflow?

我试图弄清楚 C # 编译器是如何处理尾部调用的。

(答案: 他们不是。但是 64位 JIT (s)会做 TCE (尾部呼叫消除)

因此,我使用一个递归调用编写了一个小测试,它输出在 StackOverflowException终止进程之前调用了多少次。

class Program
{
static void Main(string[] args)
{
Rec();
}


static int sz = 0;
static Random r = new Random();
static void Rec()
{
sz++;


//uncomment for faster, more imprecise runs
//if (sz % 100 == 0)
{
//some code to keep this method from being inlined
var zz = r.Next();
Console.Write("{0} Random: {1}\r", sz, zz);
}
        

//uncommenting this stops TCE from happening
//else
//{
//    Console.Write("{0}\r", sz);
//}


Rec();
}

就在这时,程序在任何一个上以 SO Exception 结束:

  • “优化构建”关闭(调试或发布)
  • 目标: x86
  • 目标: AnyCPU + “ Prefer32 bit”(这是 VS2012中的新版本,也是我第一次看到它。 这里还有)
  • 代码中一些看似无关紧要的分支(请参见注释中的“ else”分支)。

相反,使用“优化构建”ON + (Target = x64或 AnyCPU 与“偏好32位”关闭(在64位 CPU 上)) ,TCE 发生,计数器永远不停地旋转(好吧,它可以说旋转 放下每次其值溢出)。

但是我注意到在 StackOverflowException案例中有一个我无法解释的行为: 它从不(?)在 没错发生相同的堆栈深度。下面是一些32位运行的输出,发布版本:

51600 Random: 1778264579
Process is terminated due to StackOverflowException.


51599 Random: 1515673450
Process is terminated due to StackOverflowException.


51602 Random: 1567871768
Process is terminated due to StackOverflowException.


51535 Random: 2760045665
Process is terminated due to StackOverflowException.

调试版本:

28641 Random: 4435795885
Process is terminated due to StackOverflowException.


28641 Random: 4873901326  //never say never
Process is terminated due to StackOverflowException.


28623 Random: 7255802746
Process is terminated due to StackOverflowException.


28669 Random: 1613806023
Process is terminated due to StackOverflowException.

堆栈大小不变(默认值为1MB)。堆栈帧的大小不变。

那么,什么可以解释当 StackOverflowException到达时堆栈深度(有时是非平凡的)的变化呢?

更新

Hans Passant 提出了 Console.WriteLine涉及 P/Invoke、互操作和可能的非确定性锁的问题。

所以我把代码简化成这样:

class Program
{
static void Main(string[] args)
{
Rec();
}
static int sz = 0;
static void Rec()
{
sz++;
Rec();
}
}

我在没有调试器的情况下在32位发布/优化运行它。当程序崩溃时,我附加调试器并检查计数器的值。

而且它的 还是在几次运行中都不一样。(或者我的测试有缺陷。)

更新: 结束

按照 fejesjoco 的建议,我查了一下 ASLR (位址空间配置随机载入)。

这是一种安全技术,通过随机化进程地址空间中的各种东西,包括堆栈位置和显然的大小,使得缓冲区溢出攻击很难找到(例如)特定系统调用的精确位置。

这个理论听起来不错。让我们把它付诸实践吧!

为了测试这一点,我使用了一个特定于此任务的 Microsoft 工具: EMET 或增强缓解经验工具包。它允许在系统或进程级别设置 ASLR 标志(以及更多)。
(还有一个我没有试过的 系统范围的注册表黑客替代方案)

EMET GUI

为了验证该工具的有效性,我还发现 进程资源管理器在过程的“属性”页面中适时地报告了 ASLR 标志的状态。直到今天才知道

enter image description here

理论上,EMET 可以(重新)为单个进程设置 ASLR 标志。实际上,它似乎没有改变任何东西(见上图)。

然而,我为整个系统禁用了 ASLR,(一次重启后)我终于可以确认,SO 异常现在总是发生在相同的堆栈深度。

额外奖励

与 ASLR 相关,旧新闻: Chrome 是怎么被打败的

3174 次浏览

I think it may be ASLR at work. You can turn off DEP to test this theory.

See here for a C# utility class to check memory information: https://stackoverflow.com/a/8716410/552139

By the way, with this tool, I found that the difference between the maximum and minimum stack size is around 2 KiB, which is half a page. That's weird.

Update: OK, now I know I'm right. I followed up on the half-page theory, and found this doc that examines the ASLR implementation in Windows: http://www.symantec.com/avcenter/reference/Address_Space_Layout_Randomization.pdf

Quote:

Once the stack has been placed, the initial stack pointer is further randomized by a random decremental amount. The initial offset is selected to be up to half a page (2,048 bytes)

And this is the answer to your question. ASLR takes away between 0 and 2048 bytes of your initial stack randomly.

Change r.Next() to r.Next(10). StackOverflowExceptions should occur in the same depth.

Generated strings should consume the same memory because they have the same size. r.Next(10).ToString().Length == 1 always. r.Next().ToString().Length is variable.

The same applies if you use r.Next(100, 1000)

This C++11 code prints the offset of the stack within the start page:

#include <Windows.h>
#include <iostream>


using namespace std;


#if !defined(__llvm__)
#pragma warning(disable: 6387) // handle could be NULL
#pragma warning(disable: 6001) // using uninitialized memory
#endif


int main()
{
SYSTEM_INFO si;
GetSystemInfo( &si );
static atomic<size_t> aPageSize = si.dwPageSize;
auto theThread = []( LPVOID ) -> DWORD
{
size_t pageSize = aPageSize.load( memory_order_relaxed );
return (DWORD)(pageSize - ((size_t)&pageSize & pageSize - 1));
};
constexpr unsigned ROUNDS = 10;
for( unsigned r = ROUNDS; r--; )
{
HANDLE hThread = CreateThread( nullptr, 0, theThread, nullptr, 0, nullptr );
WaitForSingleObject( hThread, INFINITE );
DWORD dwExit;
GetExitCodeThread( hThread, &dwExit );
CloseHandle( hThread );
cout << dwExit << endl;
}
}

Linux doesn't randomize the lower 12 bits per default:

#include <iostream>
#include <atomic>
#include <pthread.h>
#include <unistd.h>


using namespace std;


int main()
{
static atomic<size_t> aPageSize = sysconf( _SC_PAGESIZE );
auto theThread = []( void *threadParam ) -> void *
{
size_t pageSize = aPageSize.load( memory_order_relaxed );
return (void *)(pageSize - ((size_t)&pageSize & pageSize - 1));
};
constexpr unsigned ROUNDS = 10;
for( unsigned r = ROUNDS; r--; )
{
pthread_t pThread;
pthread_create( &pThread, nullptr, theThread, nullptr );
void *retVal;
pthread_join( pThread, &retVal );
cout << (size_t)retVal << endl;
}
}

The issue here is that randomizing the thread stack's starting address within a page doesn't make sense from a security standpoint. The issue is simply that when you have a 64 bit system with 47 bit userspace (on newer Intel-CPUs you even have 55 bit userspace) you have still 35 bits to randomize, i.e. about 34 billion placements of a stack. And it doesn't make sense from a performance standpoint either since cacheline aliasing on SMT-systems can't happen because caches have enough associativity today.