(A + B + C)≠(A + C + B)与编译器重排序

添加两个32位整数会导致整数溢出:

uint64_t u64_z = u32_x + u32_y;

如果首先将32位整数之一强制转换或添加到64位整数中,则可以避免此溢出。

uint64_t u64_z = u32_x + u64_a + u32_y;

但是,如果编译器决定重新排序加法:

uint64_t u64_z = u32_x + u32_y + u64_a;

整数溢出仍有可能发生。

是否允许编译器进行这种重新排序,或者我们可以相信它们会注意到结果的不一致性并保持表达式的顺序?

9911 次浏览

If the optimiser does such a reordering it is still bound to the C specification, so such a reordering would become:

uint64_t u64_z = (uint64_t)u32_x + (uint64_t)u32_y + u64_a;

Rationale:

We start with

uint64_t u64_z = u32_x + u64_a + u32_y;

Addition is performed left-to-right.

The integer promotion rules state that in the first addition in the original expression, u32_x be promoted to uint64_t. In the second addition, u32_y will also be promoted to uint64_t.

So, in order to be compliant with the C specification, any optimiser must promote u32_x and u32_y to 64 bit unsigned values. This is equivalent to adding a cast. (The actual optimising is not done at the C level, but I use C notation because that is a notation that we understand.)

There is the "as if" rule in C, C++, and Objective-C: The compiler may do whatever it likes as long as no conforming program can tell the difference.

In these languages, a + b + c is defined to be the same as (a + b) + c. If you can tell the difference between this and for example a + (b + c) then the compiler cannot change the order. If you can't tell the difference, then the compiler is free to change the order, but that's fine, because you can't tell the difference.

In your example, with b = 64 bit, a and c 32 bit, the compiler would be allowed to evaluate (b + a) + c or even (b + c) + a, because you couldn't tell the difference, but not (a + c) + b because you can tell the difference.

In other words, the compiler isn't allowed to do anything that makes your code behave different from what it should. It is not required to produce the code that you think it would produce, or that you think it should produce, but the code will give you exactly the results it should.

A compiler is only allowed to re-order under the as if rule. That is, if the reordering will always give the same result as the specified ordering, then it is allowed. Otherwise (as in your example), not.

For example, given the following expression

i32big1 - i32big2 + i32small

which has been carefully constructed to subtract the two values which are known to be large but similar, and only then add the other small value (thus avoiding any overflow), the compiler may choose to reorder into:

(i32small - i32big2) + i32big1

and rely on the fact that the target platform is using two-complement arithmetic with wrap-round to prevent problems. (Such a reordering might be sensible if the compiler is pressed for registers, and happens to have i32small in a register already).

Are compilers allowed to do such a reordering or can we trust them to notice the result inconsistency and keep the expression order as is?

The compiler can reorder only if it gives the same result - here, as you observed, it doesn't.


It's possible to write a function template, if you want one, which promotes all arguments to std::common_type before adding - this would be safe, and not rely on either argument order or manual casting, but it's pretty clunky.

It depends on bit width of unsigned/int.

The below 2 are not the same (when unsigned <= 32 bits). u32_x + u32_y becomes 0.

u64_a = 0; u32_x = 1; u32_y = 0xFFFFFFFF;
uint64_t u64_z = u32_x + u64_a + u32_y;
uint64_t u64_z = u32_x + u32_y + u64_a;  // u32_x + u32_y carry does not add to sum.

They are the same (when unsigned >= 34 bits). Integer promotions caused u32_x + u32_y addition to occur at 64-bit math. Order is irrelevant.

It is UB (when unsigned == 33 bits). Integer promotions caused addition to occur at signed 33-bit math and signed overflow is UB.

Are compilers allowed to do such a reordering ...?

(32 bit math): Re-order yes, but same results must occur, so not that re-ordering OP proposes. Below are the same

// Same
u32_x + u64_a + u32_y;
u64_a + u32_x + u32_y;
u32_x + (uint64_t) u32_y + u64_a;
...


// Same as each other below, but not the same as the 3 above.
uint64_t u64_z = u32_x + u32_y + u64_a;
uint64_t u64_z = u64_a + (u32_x + u32_y);

... can we trust them to notice the result inconsistency and keep the expression order as is?

Trust yes, but OP's coding goal is not crystal clear. Should u32_x + u32_y carry contribute? If OP wants that contribution, code should be

uint64_t u64_z = u64_a + u32_x + u32_y;
uint64_t u64_z = u32_x + u64_a + u32_y;
uint64_t u64_z = u32_x + (u32_y + u64_a);

But not

uint64_t u64_z = u32_x + u32_y + u64_a;

Quoting from the standards:

[ Note: Operators can be regrouped according to the usual mathematical rules only where the operators really are associative or commutative.7 For example, in the following fragment int a, b;

/∗ ... ∗/
a = a + 32760 + b + 5;

the expression statement behaves exactly the same as

a = (((a + 32760) + b) + 5);

due to the associativity and precedence of these operators. Thus, the result of the sum (a + 32760) is next added to b, and that result is then added to 5 which results in the value assigned to a. On a machine in which overflows produce an exception and in which the range of values representable by an int is [-32768,+32767], the implementation cannot rewrite this expression as

a = ((a + b) + 32765);

since if the values for a and b were, respectively, -32754 and -15, the sum a + b would produce an exception while the original expression would not; nor can the expression be rewritten either as

a = ((a + 32765) + b);

or

a = (a + (b + 32765));

since the values for a and b might have been, respectively, 4 and -8 or -17 and 12. However on a machine in which overflows do not produce an exception and in which the results of overflows are reversible, the above expression statement can be rewritten by the implementation in any of the above ways because the same result will occur. — end note ]