Why does GCC pad functions with NOPs?

I've been working with C for a short while and very recently started to get into ASM. When I compile a program:

int main(void)
{
    int a = 0;
    a += 1;
    return 0;
}

The objdump disassembly has the code, but there are nops after the ret:

...
08048394 <main>:
8048394:       55                      push   %ebp
8048395:       89 e5                   mov    %esp,%ebp
8048397:       83 ec 10                sub    $0x10,%esp
804839a:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
80483a1:       83 45 fc 01             addl   $0x1,-0x4(%ebp)
80483a5:       b8 00 00 00 00          mov    $0x0,%eax
80483aa:       c9                      leave
80483ab:       c3                      ret
80483ac:       90                      nop
80483ad:       90                      nop
80483ae:       90                      nop
80483af:       90                      nop
...

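From what I have learned, nops do nothing, and since they come after the ret they would never even be executed.

My question is: why bother? Couldn't ELF (linux-x86) work with a .text section (+ main) of any size?

I'd appreciate any help, I'm just trying to learn.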


As far as I remember, instructions are pipelined in the CPU, and different CPU blocks (loader, decoder and so on) process subsequent instructions. When a RET instruction is being executed, the next few instructions are already loaded into the CPU pipeline. It's a guess, but you can start digging here, and if you find out more (maybe the specific number of NOPs that are safe), please share your findings.

This is done to align the next function by 8, 16 or 32-byte boundary.

From “Optimizing subroutines in assembly language” by A. Fog:

11.5 Alignment of code

Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an important subroutine entry or jump label happens to be near the end of a 16-byte block then the microprocessor will only get a few useful bytes of code when fetching that block of code. It may have to fetch the next 16 bytes too before it can decode the first instructions after the label. This can be avoided by aligning important subroutine entries and loop entries by 16.

[...]

Aligning a subroutine entry is as simple as putting as many NOPs as needed before the subroutine entry to make the address divisible by 8, 16, 32 or 64, as desired.
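To make the arithmetic concrete, here is a small sketch of how many padding bytes are needed to reach the next boundary. The function name padding_for is made up for this illustration; it is not something GCC or the assembler exposes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of padding bytes needed so that the code
 * following `addr` starts on an `align`-byte boundary.
 * `align` must be a power of two (8, 16, 32, 64, ...). */
uint32_t padding_for(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Distance to the next multiple of align; 0 if already aligned. */
    return (align - (addr & (align - 1))) & (align - 1);
}
```

In the question's listing, ret sits at 0x80483ab, so the first free byte is 0x80483ac. With a 16-byte boundary, padding_for(0x80483ac, 16) gives 4, which matches the four nops before 0x80483b0.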

First of all, gcc doesn't always do this. The padding is controlled by -falign-functions, which is automatically turned on by -O2 and -O3:

-falign-functions
-falign-functions=n

Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance, -falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only if this can be done by skipping 23 bytes or less.

-fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.

Some assemblers only support this flag when n is a power of two; in that case, it is rounded up.

If n is not specified or is zero, use a machine-dependent default.

Enabled at levels -O2, -O3.
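The rule in that description can be modelled roughly as below. This is only a sketch of the documentation's wording, not GCC's actual implementation, and the names round_up_pow2 and aligned_start are invented for the example. It assumes "the next power-of-two" means the smallest power of two that is at least n, and that at most n - 1 bytes may be skipped, as the -falign-functions=24 example implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest power of two that is >= n. */
uint32_t round_up_pow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical model of -falign-functions=n as quoted above:
 * returns the address where a function placed at `addr` would start. */
uint32_t aligned_start(uint32_t addr, uint32_t n)
{
    assert(n >= 1 && n <= 0x80000000u);
    uint32_t boundary = round_up_pow2(n);
    uint32_t pad = (boundary - (addr & (boundary - 1))) & (boundary - 1);
    /* Pad only if it costs fewer than n bytes; otherwise leave as-is. */
    return (pad < n) ? addr + pad : addr;
}
```

For example, aligned_start(0x29, 24) yields 0x40 because reaching the 32-byte boundary costs 23 bytes, while aligned_start(0x28, 24) stays at 0x28 because it would cost 24. With n = 1 the address is never changed, consistent with -falign-functions=1 meaning no alignment.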

There could be multiple reasons for doing this, but the main one on x86 is probably this:

Most processors fetch instructions in aligned 16-byte or 32-byte blocks. It can be advantageous to align critical loop entries and subroutine entries by 16 in order to minimize the number of 16-byte boundaries in the code. Alternatively, make sure that there is no 16-byte boundary in the first few instructions after a critical loop entry or subroutine entry.

(Quoted from "Optimizing subroutines in assembly language" by Agner Fog.)

edit: Here is an example that demonstrates the padding:

// align.c
int f(void) { return 0; }
int g(void) { return 0; }

When compiled using gcc 4.4.5 with default settings, I get:

align.o:     file format elf64-x86-64


Disassembly of section .text:


0000000000000000 <f>:
0:   55                      push   %rbp
1:   48 89 e5                mov    %rsp,%rbp
4:   b8 00 00 00 00          mov    $0x0,%eax
9:   c9                      leaveq
a:   c3                      retq


000000000000000b <g>:
b:   55                      push   %rbp
c:   48 89 e5                mov    %rsp,%rbp
f:   b8 00 00 00 00          mov    $0x0,%eax
14:   c9                      leaveq
15:   c3                      retq

Specifying -falign-functions gives:

align.o:     file format elf64-x86-64


Disassembly of section .text:


0000000000000000 <f>:
0:   55                      push   %rbp
1:   48 89 e5                mov    %rsp,%rbp
4:   b8 00 00 00 00          mov    $0x0,%eax
9:   c9                      leaveq
a:   c3                      retq
b:   eb 03                   jmp    10 <g>
d:   90                      nop
e:   90                      nop
f:   90                      nop


0000000000000010 <g>:
10:   55                      push   %rbp
11:   48 89 e5                mov    %rsp,%rbp
14:   b8 00 00 00 00          mov    $0x0,%eax
19:   c9                      leaveq
1a:   c3                      retq