What does "rep; nop;" mean in x86 assembly? Is it the same as the "pause" instruction?

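  • What does rep; nop mean?
  • Is it the same as the pause instruction?
  • Is it the same as rep nop (without the semicolon)?
  • What is the difference from a plain nop instruction?
  • Does it behave differently on AMD and Intel processors?
  • (Bonus) Where is the official documentation for these instructions?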

Motivation for this question

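After some discussion in the comments of another question, I realized that I don't know what rep; nop; means in x86 (or x86-64) assembly. I also couldn't find a good explanation on the web.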

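I know that rep is a prefix that means "repeat the next instruction cx times" (or at least it did, in old 16-bit x86 assembly). According to this reference, it seems rep can only be used with movs, stos, cmps, lods and scas (but this restriction may have been removed on newer processors). Thus, I would expect rep nop (without the semicolon) to repeat a nop operation cx times.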

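However, after further searching, I got even more confused. It seems that rep; nop and pause map to exactly the same opcode, and that pause behaves a bit differently from a plain nop. Some old mails from 2005 said different things: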

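  • "Try to not burn too much power"
  • "It is equivalent to 'nop', just with a 2-byte encoding"
  • "It is Intel magic. It's like 'nop, but let the other HT sibling run'"
  • "It is pause on Intel and fast padding on Athlon"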

With these different opinions, I cannot work out the correct meaning.

It is used in the Linux kernel (on both i386 and x86_64), together with this comment: /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */ It is also used in BeRTOS, with the same comment.


rep; nop is indeed the same as the pause instruction (opcode bytes F3 90). It might be used for assemblers which don't support the pause mnemonic yet. On older processors, this simply did nothing, just like nop but in two bytes. On newer processors which support hyperthreading, it is used as a hint to the processor that you are executing a spin loop, which improves performance. From Intel's instruction reference:

Improves the performance of spin-wait loops. When executing a “spin-wait loop,” a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.
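For illustration, here is a minimal sketch of a spin-wait loop with the hint in it. It assumes GCC or Clang (for the inline asm syntax) and C11 atomics. The names cpu_relax, demo_spinlock, demo_spin_lock, demo_spin_trylock and demo_spin_unlock are made up for this example; they are not taken from Intel's manual or from any particular kernel.

```c
#include <stdatomic.h>

/* Spin-loop hint. On x86 this emits "rep; nop" (bytes F3 90), i.e. pause.
   On other architectures this sketch only acts as a compiler barrier. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical minimal spinlock built on a C11 atomic_flag. */
typedef struct { atomic_flag locked; } demo_spinlock;
#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Busy-wait until the flag was previously clear, hinting on every retry. */
static inline void demo_spin_lock(demo_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        cpu_relax();
}

/* Returns 1 if the lock was taken, 0 if somebody else already holds it. */
static inline int demo_spin_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static inline void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Real spinlocks usually spin on a plain load before retrying the atomic operation; this sketch skips that to keep the placement of the hint easy to see.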

rep nop = F3 90 = the encoding for pause, as well as how it decodes on older CPUs that don't support pause.


Prefixes (other than lock) that don't apply to an instruction are ignored in practice by existing CPUs.

The documentation says using rep with instructions it doesn't apply to is "reserved and can cause unpredictable behaviour" because future CPUs might recognize it as part of some new instruction. Once they establish any specific new instruction encoding using f3 xx, they document how it runs on older CPUs. (Yes, the x86 opcode space is so limited that they do crazy stuff like this, and yes it makes the decoders complicated.)

In this case, it means you can use pause in spinloops without breaking backwards compat. Old CPUs that don't know about pause will decode it as a NOP with no harm done, as guaranteed by Intel's ISA ref manual entry for pause. On new CPUs, you get the benefit of power-saving / HT friendliness, and avoiding memory-ordering mis-speculation when the memory you're spinning on does change and you leave the spin loop.
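To make the backwards-compat point concrete, here is a toy model in C of how the same two bytes are interpreted by old and new CPUs. It is only a sketch: toy_decode, the toy_insn enum and the cpu_knows_pause flag are hypothetical names for this example, and a real decoder handles far more than these two encodings.

```c
#include <stddef.h>

enum toy_insn { TOY_UNKNOWN, TOY_NOP, TOY_PAUSE };

/* Toy decoder for two encodings only: 90 (nop) and F3 90 (rep nop / pause).
   cpu_knows_pause models whether the CPU implements pause (nonzero) or is
   an older CPU that ignores the meaningless F3 prefix (zero). */
static enum toy_insn toy_decode(const unsigned char *code, size_t len,
                                int cpu_knows_pause)
{
    if (len >= 2 && code[0] == 0xF3 && code[1] == 0x90)
        return cpu_knows_pause ? TOY_PAUSE : TOY_NOP;
    if (len >= 1 && code[0] == 0x90)
        return TOY_NOP;
    return TOY_UNKNOWN;
}
```

Either way the bytes F3 90 are harmless: the old CPU runs them as a two-byte nop, and the new one treats them as the spin-loop hint.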


Links to Intel's manuals and tons of other good stuff are on the x86 tag wiki info page.

Another case of a meaningless rep prefix becoming a new instruction on new CPUs: lzcnt is F3 0F BD /r. On CPUs that don't support that instruction (missing the LZCNT feature flag in their CPUID), it decodes as rep bsr, which runs the same as bsr. So on old CPUs, it produces 31 - expected_result (for 32-bit operand size, since bsr returns the index of the highest set bit rather than the leading-zero count), and is undefined when the input was zero.
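The relationship between the two results can be checked with plain software models. This is a sketch in portable C, not the actual instructions; lzcnt32_model and bsr32_model are names invented for this example, and only 32-bit operand size is modelled.

```c
#include <stdint.h>

/* Model of 32-bit lzcnt: number of leading zero bits, 32 for x == 0. */
static unsigned lzcnt32_model(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t mask = UINT32_C(1) << 31; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Model of 32-bit bsr: index of the highest set bit.
   The real bsr result is undefined for x == 0, so callers of this
   model must pass a nonzero value. */
static unsigned bsr32_model(uint32_t x)
{
    unsigned idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}
```

For any nonzero x, bsr32_model(x) equals 31 - lzcnt32_model(x), so code that needs the lzcnt result cannot silently run on a CPU that treats the encoding as bsr.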

tzcnt is encoded the same way, as rep bsf (F3 0F BC /r). But unlike the lzcnt / bsr pair, tzcnt and bsf do the same thing with non-zero inputs, so compilers can and do use tzcnt even when it's not guaranteed that the target CPU will run it as tzcnt. AMD CPUs have fast tzcnt, slow bsf, and on Intel they're both fast. As long as it doesn't matter for correctness (you're not relying on flag-setting, or on the leave-the-destination-unmodified behaviour in the input=0 case), having it decode as tzcnt on CPUs that support it is helpful.
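A software model of both instructions shows where they agree and where they differ. This is a sketch in portable C with invented names (tzcnt32_model, bsf32_model). The bsf model leaves the destination unmodified for a zero input, which is the behaviour referred to above; treat that as an assumption of the model, not something to rely on.

```c
#include <stdint.h>

/* Model of 32-bit tzcnt: number of trailing zero bits, 32 for x == 0. */
static uint32_t tzcnt32_model(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Model of 32-bit bsf: index of the lowest set bit, written to *dst.
   For x == 0 this model leaves *dst untouched (modelling assumption). */
static void bsf32_model(uint32_t *dst, uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    *dst = n;
}
```

With a non-zero input both models give the same number, so it only matters which one the CPU runs if the input can be zero or the flags are inspected afterwards.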


One case of a meaningless rep prefix that will probably never decode differently: rep ret is used by default by gcc when targeting "generic" CPUs (i.e. not targeting a specific CPU with -march or -mtune, and not targeting AMD K8 or K10). It will be decades before anyone could make a CPU that decodes rep ret as anything other than ret, because it's present in most binaries in most Linux distros. See What does `rep ret` mean?