x86_64 registers rax/eax/ax/al overwriting full register contents

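As widely advertised, modern x86_64 processors have 64-bit registers that can be used in a backward-compatible fashion as 32-bit registers, 16-bit registers and even 8-bit registers, for example: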

0x1122334455667788
  ================ rax (64 bits)
          ======== eax (32 bits)
              ====  ax (16 bits)
              ==    ah (8 bits)
                ==  al (8 bits)

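Such a scheme could be taken literally, i.e. a designated name always accesses only that part of the register, for reading or for writing, which would be highly logical. In fact, this is true for everything up to 32 bits: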

mov  eax, 0x11112222 ; eax = 0x11112222
mov  ax, 0x3333      ; eax = 0x11113333 (works, only low 16 bits changed)
mov  al, 0x44        ; eax = 0x11113344 (works, only low 8 bits changed)
mov  ah, 0x55        ; eax = 0x11115544 (works, only high 8 bits changed)
xor  ah, ah          ; eax = 0x11110044 (works, only high 8 bits cleared)
mov  eax, 0x11112222 ; eax = 0x11112222
xor  al, al          ; eax = 0x11112200 (works, only low 8 bits cleared)
mov  eax, 0x11112222 ; eax = 0x11112222
xor  ax, ax          ; eax = 0x11110000 (works, only low 16 bits cleared)

However, things seem to get fairly awkward as soon as we start working with 64-bit registers:

mov  rax, 0x1111222233334444 ;           rax = 0x1111222233334444
mov  eax, 0x55556666         ; actual:   rax = 0x0000000055556666
                             ; expected: rax = 0x1111222255556666
                             ; upper 32 bits seem to be lost!
mov  rax, 0x1111222233334444 ;           rax = 0x1111222233334444
mov  ax, 0x7777              ;           rax = 0x1111222233337777 (works!)
mov  rax, 0x1111222233334444 ;           rax = 0x1111222233334444
xor  eax, eax                ; actual:   rax = 0x0000000000000000
                             ; expected: rax = 0x1111222200000000
                             ; again, it wiped whole register
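
To make the observations above concrete, here is a minimal Python sketch that models only the results shown in the listings. It is not taken from any manual, and `write_reg`, its parameters and `MASK64` are hypothetical names chosen for illustration.

```python
MASK64 = (1 << 64) - 1  # keep every result within 64 bits


def write_reg(rax: int, value: int, width: int, high8: bool = False) -> int:
    """Return the new 64-bit value of rax after a write to one of its names.

    width=64 models rax, 32 models eax, 16 models ax,
    8 models al (or ah when high8=True).
    """
    if width == 64:
        return value & MASK64
    if width == 32:
        # Observed behavior: the upper 32 bits end up as zero.
        return value & 0xFFFFFFFF
    if width == 16:
        # Observed behavior: bits 16..63 are preserved.
        return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)
    if width == 8:
        shift = 8 if high8 else 0
        mask = 0xFF << shift
        # Observed behavior: only the addressed byte changes.
        return (rax & ~mask & MASK64) | ((value & 0xFF) << shift)
    raise ValueError("width must be 8, 16, 32 or 64")


if __name__ == "__main__":
    rax = 0x1111222233334444
    print(hex(write_reg(rax, 0x55556666, 32)))  # models: mov eax, 0x55556666
    print(hex(write_reg(rax, 0x7777, 16)))      # models: mov ax, 0x7777
```

The 8-bit and 16-bit branches need the old value of `rax` as an input, while the 32-bit and 64-bit branches ignore it entirely.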

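Such behavior seems highly ridiculous and illogical to me. It looks like trying to write anything at all to eax, by any means, wipes the upper 32 bits of the rax register.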

I have two questions:

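First, I believe this awkward behavior must be documented somewhere, but I can't seem to find a detailed explanation (of how exactly the upper 32 bits of a 64-bit register get wiped) anywhere. Am I right that writing to eax always wipes the upper half of rax, or is it something more complicated? Does it apply to all 64-bit registers, or are there exceptions?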

A closely related question mentions the same behavior but, alas, again gives no exact reference to documentation.

In other words, I'd like a link to documentation that specifies this behavior.
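Second, is it just me, or does this whole thing seem really weird and illogical (i.e. writes to ax, ah and al behave one way, while writes to eax behave another)? Maybe I'm missing some vital point about why it was implemented like that?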

An explanation of the "why" would be highly appreciated.


Best answer

The processor model as documented in the Intel/AMD processor manuals is a pretty imperfect model of the real execution engine of a modern core. In particular, the notion of processor registers does not match reality: there is no such thing as an EAX or RAX register.

One primary job of the instruction decoder is to convert the legacy x86/x64 instructions into micro-ops, the instructions of a RISC-like processor: small instructions that are easy to execute concurrently and that can take advantage of multiple execution sub-units, allowing as many as 6 instructions to execute at the same time.

To make that work, the notion of processor registers is virtualized as well. The instruction decoder allocates a register from a big bank of registers. When the instruction is retired, the value of that dynamically allocated register is written back to whatever register currently holds the value of, say, RAX.
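
As a rough illustration of that idea, here is a toy Python sketch of register renaming. It is a deliberate simplification and not how any particular core is built: the `Renamer` class, its method names and the bank size of 8 are all hypothetical, and the old physical register is recycled immediately rather than when the instruction retires.

```python
class Renamer:
    """Toy model: architectural names map onto a bank of physical registers."""

    def __init__(self, num_physical: int = 8):
        self.free = list(range(num_physical))  # unallocated physical registers
        self.values = [0] * num_physical       # contents of the bank
        self.mapping = {}                      # e.g. "rax" -> physical index

    def write(self, arch_reg: str, value: int) -> int:
        """Allocate a fresh physical register for this write and return its index."""
        new = self.free.pop(0)
        self.values[new] = value
        old = self.mapping.get(arch_reg)
        self.mapping[arch_reg] = new
        if old is not None:
            # Simplification: recycle the old register right away.
            self.free.append(old)
        return new

    def read(self, arch_reg: str) -> int:
        """Read whatever physical register currently stands for arch_reg."""
        return self.values[self.mapping[arch_reg]]


if __name__ == "__main__":
    r = Renamer()
    first = r.write("rax", 0x1111222233334444)
    second = r.write("rax", 0x55556666)
    print(first, second, hex(r.read("rax")))
```

Two back-to-back writes to "rax" land in different physical registers, so in this model the second write never has to wait for the first.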

To make that work smoothly and efficiently, allowing many instructions to execute concurrently, it is very important that these operations don't have interdependencies. And the worst kind you can have is a register value that depends on other instructions. The EFLAGS register is notorious for this; many instructions modify it.

The same problem arises with the way you would like it to work. The big problem is that it requires two register values to be merged when the instruction is retired, creating a data dependency that's going to clog up the core. By forcing the upper 32 bits to 0, that dependency instantly disappears; there is no longer a need to merge. Warp 9 execution speed.
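
The dependency argument can be sketched with a small Python model. It assumes, purely for illustration, that a write which has to merge with the old register contents must wait for the previous write to the same register, and that a write which replaces the whole register does not. The function name `longest_dependency_chain` and the `merge_32bit` flag are hypothetical; real cores handle partial-register writes in more varied ways.

```python
def longest_dependency_chain(widths, merge_32bit):
    """Length of the longest chain of writes that must wait on each other.

    widths: operand widths (8, 16, 32 or 64) of successive writes to ONE register.
    merge_32bit: True models the behavior the question expected (32-bit writes
    keep the upper half); False models zero-extension.
    """
    longest = 0
    depth = 0
    for width in widths:
        # A merging write needs the previous value of the register as an input.
        needs_old = width in (8, 16) or (width == 32 and merge_32bit)
        # A full replacement starts a fresh chain of length 1.
        depth = depth + 1 if needs_old else 1
        longest = max(longest, depth)
    return longest


if __name__ == "__main__":
    writes = [32, 32, 32, 32]
    print(longest_dependency_chain(writes, merge_32bit=True))
    print(longest_dependency_chain(writes, merge_32bit=False))
```

In this model, four consecutive 32-bit writes form a chain of length 4 if they had to merge, but stay independent (chain length 1) once the upper half is simply forced to zero.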