What is the purpose of the EBP frame pointer register?

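I'm a beginner in assembly language, and I've noticed that the x86 code emitted by compilers usually keeps the frame pointer around, even in release/optimized mode, when it could use the EBP register for something else.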

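I understand why the frame pointer might make code easier to debug, and that it might be necessary if alloca() is called within a function. However, x86 has very few registers, and using two of them to hold the location of the stack frame when one would suffice doesn't make sense to me. Why is omitting the frame pointer considered a bad idea, even in optimized/release builds?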


It depends on the compiler, certainly. I've seen optimized code emitted by x86 compilers that freely uses the EBP register as a general purpose register. (I don't recall which compiler I noticed that with, though.)

Compilers may also choose to maintain the EBP register to assist with stack unwinding during exception handling, but again this depends on the precise compiler implementation.

Using stack frames has gotten incredibly cheap on any hardware even remotely modern. If stack frames are cheap, then freeing up a register or two isn't as important. I'm sure fast stack frames vs. more registers was an engineering trade-off, and fast stack frames won.

How much are you saving going pure register? Is it worth it?

The frame pointer is a reference pointer that lets a debugger locate a local variable or an argument with a single constant offset. Although ESP's value changes over the course of execution, EBP stays the same within a function, so the same variable can always be reached at the same offset. For example, the first parameter will always be at EBP+8, while ESP-relative offsets can change significantly as you push and pop things.
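To make the constant-offset point concrete, here is a minimal sketch in C that models a 32-bit, downward-growing stack as an array of words. It is a toy model, not real compiler output: the ToyStack type and the toy_* and helper function names are hypothetical, and it assumes a cdecl-style call with one argument followed by the usual push ebp / mov ebp, esp prologue.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack that grows downward. All names are hypothetical. */
enum { STACK_WORDS = 64 };

typedef struct {
    uint32_t mem[STACK_WORDS]; /* each slot models one 4-byte stack word */
    int esp;                   /* index of the current top-of-stack word */
    int ebp;                   /* index of the saved-EBP slot of the current frame */
} ToyStack;

static void toy_init(ToyStack *s) {
    s->esp = STACK_WORDS; /* empty stack: ESP sits just past the highest slot */
    s->ebp = STACK_WORDS;
}

static void toy_push(ToyStack *s, uint32_t v) {
    assert(s->esp > 0);   /* guard against overflowing the toy stack */
    s->mem[--s->esp] = v; /* a push decrements ESP, then stores */
}

/* Models: push arg; call (pushes return address); push ebp; mov ebp, esp */
static void toy_enter(ToyStack *s, uint32_t arg, uint32_t ret_addr) {
    toy_push(s, arg);
    toy_push(s, ret_addr);
    toy_push(s, (uint32_t)s->ebp); /* save the caller's frame pointer */
    s->ebp = s->esp;               /* the new frame starts here */
}

/* The first argument is two words above EBP, i.e. at byte offset EBP+8. */
static uint32_t first_arg_via_ebp(const ToyStack *s) {
    return s->mem[s->ebp + 2];
}

/* Byte distance from ESP to the first argument; it grows with every push. */
static int esp_offset_of_first_arg(const ToyStack *s) {
    return (s->ebp + 2 - s->esp) * 4;
}
```

After toy_enter, the argument is reachable at EBP+8 and also happens to be at ESP+8. Each further toy_push moves the ESP-relative offset by 4 bytes, while the EBP-relative lookup is unaffected. That moving offset is what a debugger would have to track at every instruction if there were no frame pointer.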

Why don't compilers throw away the frame pointer? Because with a frame pointer, the debugger can use the symbol table to figure out where local variables and arguments are, since they are guaranteed to be at a constant offset from EBP. Otherwise there isn't an easy way to figure out where a local variable is at an arbitrary point in the code.

As Greg mentioned, it also helps a debugger unwind the stack: the saved EBP values form a reverse linked list of stack frames, which lets the debugger figure out the size of each function's stack frame (local variables + arguments).
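The linked-list idea can be sketched in C as follows. This is a simplified model rather than a real unwinder: the Frame struct and walk_frames function are hypothetical names, and the layout assumes the conventional 32-bit frame where [EBP] holds the caller's saved EBP and [EBP+4] holds the return address, with a null saved EBP marking the outermost frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of what EBP points at in a frame-pointer-based frame. */
typedef struct Frame {
    const struct Frame *saved_ebp; /* caller's frame, stored at [EBP+0] */
    uintptr_t return_addr;         /* return address, stored at [EBP+4] */
} Frame;

/*
 * Walk the chain from the innermost frame outward, recording each return
 * address. Stops at a null saved EBP or after max entries, and returns the
 * number of frames visited.
 */
static size_t walk_frames(const Frame *ebp, uintptr_t *out, size_t max) {
    size_t n = 0;
    assert(out != NULL);
    while (ebp != NULL && n < max) {
        out[n++] = ebp->return_addr;
        ebp = ebp->saved_ebp; /* follow the link to the caller's frame */
    }
    return n;
}
```

Given three linked frames, walk_frames yields their return addresses from innermost to outermost, which is essentially the backtrace a debugger prints. On a real stack, the difference between two consecutive EBP values also gives the distance between the frames.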

Most compilers provide an option to omit frame pointers although it makes debugging really hard. That option should never be used globally, even in release code. You don't know when you'll need to debug a user's crash.

"However, x86 has very few registers"

This is true only in the sense that opcodes can only address 8 registers. The processor itself will actually have many more registers than that and use register renaming, pipelining, speculative execution, and other processor buzzwords to get around that limit. Wikipedia has a good introductory paragraph as to what an x86 processor can do to overcome the register limit: http://en.wikipedia.org/wiki/X86#Current_implementations.

Just adding my two cents to already good answers.

It's part of a good language architecture to have a chain of stack frames. The BP points to the current frame, where subroutine-local variables are stored. (Locals are at negative offsets, and arguments are at positive offsets.)

The idea that it is preventing a perfectly good register from being used in optimization raises the question: when and where is optimization actually worthwhile?

Optimization is only worthwhile in tight loops 1) that do not call functions, 2) where the program counter spends a significant fraction of its time, and 3) that are in code the compiler will actually see (i.e. non-library functions). This is usually a very small fraction of the overall code, especially in large systems.

Other code can be twisted and squeezed to get rid of cycles, and it simply won't matter, because the program counter is practically never there.

I know you didn't ask this, but in my experience, 99% of performance problems have nothing at all to do with compiler optimization. They have everything to do with over-design.