Is it allowed for a compiler to optimize away a local volatile variable?

Is the compiler allowed to optimize this (according to the C++17 standard):

int fn() {
    volatile int x = 0;
    return x;
}

to this?

int fn() {
    return 0;
}

If yes, why? If not, why not?


Here are some thoughts on the subject: current compilers compile fn() by putting x on the stack as a local variable and then returning its value. For example, on x86-64, gcc generates this:

mov    DWORD PTR [rsp-0x4],0x0 // this is x
mov    eax,DWORD PTR [rsp-0x4] // eax is the return register
ret

Now, as far as I know, the standard doesn't say that a local volatile variable has to be put on the stack. So, this version would be equally good:

mov    edx,0x0 // this is x
mov    eax,edx // eax is the return
ret

Here, edx stores x. But now, why stop here? As edx and eax are both zero, we could just say:

xor    eax,eax // eax is the return, and x as well
ret

And we transformed fn() to the optimized version. Is this transformation valid? If not, which step is invalid?


No. Access to volatile objects is considered observable behavior, exactly as I/O, with no particular distinction between locals and globals.

The least requirements on a conforming implementation are:

  • Access to volatile objects are evaluated strictly according to the rules of the abstract machine.

[...]

These collectively are referred to as the observable behavior of the program.

N3690, [intro.execution], ¶8

How exactly this is observable is outside the scope of the standard and falls squarely into implementation-specific territory, just like I/O and access to global volatile objects. volatile means "you think you know everything going on here, but it's not like that; trust me and do this stuff without being too smart, because I'm in your program doing my secret stuff with your bytes". This is actually explained at [dcl.type.cv] ¶7:

[ Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. Furthermore, for some implementations, volatile might indicate that special hardware instructions are required to access the object. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C. — end note ]

I don't think I have ever seen a local variable using volatile that wasn't a pointer to volatile. As in:

int fn() {
    volatile int *x = (volatile int *)0xDEADBEEF;
    *x = 23;   // request data, 23 = temperature
    return *x; // return temperature
}

The only other uses of volatile I know of are a global that is written in a signal handler (no pointers involved there) and access to symbols that a linker script defines at specific addresses relevant to the hardware.

In those cases it's much easier to see why the optimization would alter the observable effects. But the same rule applies to your local volatile variable. The compiler has to behave as if the access to x is observable and can't optimize it away.
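For comparison, the signal-handler case can be sketched roughly as follows. This is only an illustration: the names got_signal, on_signal and signal_was_seen are made up for this example, and it assumes a hosted implementation where std::raise runs the handler synchronously in the calling thread.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical flag name. volatile std::sig_atomic_t is the type that may be
// written from a signal handler and read from the interrupted code.
volatile std::sig_atomic_t got_signal = 0;

// Hypothetical handler: its only job is the volatile write.
extern "C" void on_signal(int) {
    got_signal = 1;
}

// Installs the handler, raises SIGINT, and reports whether the normal
// control flow saw the handler's write.
bool signal_was_seen() {
    got_signal = 0;
    auto previous = std::signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);    // handler installation must succeed
    std::raise(SIGINT);             // handler runs before raise() returns
    std::signal(SIGINT, previous);  // restore the earlier disposition
    return got_signal == 1;
}
```

If got_signal were a plain int, the compiler could assume nothing changes it between the store of 0 and the final read, and fold the function to return false.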

Theoretically, an interrupt handler could

  • check if the return address falls within the fn() function. It might access the symbol table or source line numbers via instrumentation or attached debug information.
  • then change the value of x, which would be stored at a predictable offset from the stack pointer.

… thus making fn() return a nonzero value.

This loop can be optimised away by the as-if rule because it has no observable behaviour:

for (unsigned i = 0; i < n; ++i) { bool looped = true; }

This one cannot:

for (unsigned i = 0; i < n; ++i) { volatile bool looped = true; }

The second loop does something on every iteration, which means the loop takes O(n) time. I have no idea what the constant is, but I can measure it and then I have a way of busy looping for a (more or less) known amount of time.

I can do that because the standard says that access to volatiles must happen, in order. If a compiler were to decide that in this case the standard didn't apply, I think I would have the right to file a bug report.

If the compiler chooses to put looped into a register, I suppose I have no good argument against that. But it still must set the value of that register to 1 for every loop iteration.
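A rough sketch of that busy-loop idea, with made-up names (spin, time_spin). It uses a volatile counter instead of the bool above so that the function has something to return; the constant per iteration still has to be measured on the target, which is what time_spin is for.

```cpp
#include <chrono>

// Hypothetical helper: performs one volatile read and one volatile write
// per iteration, so the loop cannot be collapsed under the as-if rule.
unsigned spin(unsigned n) {
    volatile unsigned ticks = 0;
    for (unsigned i = 0; i < n; ++i) {
        ticks = ticks + 1;  // volatile read followed by volatile write
    }
    return ticks;           // one more volatile read
}

// Hypothetical helper: measures how long spin(n) takes on this machine.
std::chrono::nanoseconds time_spin(unsigned n) {
    const auto start = std::chrono::steady_clock::now();
    spin(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

Dividing time_spin(n) by n for a large n gives an estimate of the per-iteration cost, which is only meaningful for that particular build and machine.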

I'm just going to add a detailed reference for the as-if rule and the volatile keyword. (At the bottom of these pages, follow the "see also" and "References" to trace back to the original specs, but I find cppreference.com much easier to read/understand.)

Particularly, I want you to read this section

volatile object - an object whose type is volatile-qualified, or a subobject of a volatile object, or a mutable subobject of a const-volatile object. Every access (read or write operation, member function call, etc.) made through a glvalue expression of volatile-qualified type is treated as a visible side-effect for the purposes of optimization (that is, within a single thread of execution, volatile accesses cannot be optimized out or reordered with another visible side effect that is sequenced-before or sequenced-after the volatile access. This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution, see std::memory_order). Any attempt to refer to a volatile object through a non-volatile glvalue (e.g. through a reference or pointer to non-volatile type) results in undefined behavior.

So the volatile keyword is specifically about disabling compiler optimization of accesses through glvalues. The only thing the volatile keyword can affect here is possibly return x; the compiler can do whatever it wants with the rest of the function.

How much the compiler can optimize the return depends on how much it is allowed to optimize the access to x in this case. It isn't reordering anything and, strictly speaking, it isn't removing the return expression. The access is still there, but it is a read and a write on the stack, which the compiler should be able to streamline. So as I read it, this is a grey area in how much the compiler is allowed to optimize, and it can easily be argued both ways.
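To illustrate the "through a glvalue expression of volatile-qualified type" wording from the quote, here is a small sketch with made-up helper names (volatile_load, volatile_store). It goes in the allowed direction, accessing an ordinary int through a volatile glvalue; the opposite direction (a volatile object through a non-volatile glvalue) is the undefined case the quote warns about.

```cpp
// Hypothetical helper: reads an ordinary int through a volatile glvalue,
// so this particular read is performed as a volatile access.
inline int volatile_load(const int& obj) {
    return *static_cast<const volatile int*>(&obj);
}

// Hypothetical helper: writes an ordinary int through a volatile glvalue.
inline void volatile_store(int& obj, int value) {
    *static_cast<volatile int*>(&obj) = value;
}
```

Only the accesses made through these helpers are treated as volatile; any other use of the same int is optimized as usual.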

Side note: In these cases, always assume the compiler will do the opposite of what you wanted/needed. You should either disable optimization (at least for this module), or try to find a more defined behavior for what you want. (This is also why unit testing is so important.) If you believe it is a defect, you should bring it up with the developers of C++.


All of this is still really hard to read, so I'm including what I think is relevant so that you can read it yourself.

glvalue

A glvalue expression is either lvalue or xvalue.

Properties:

  • A glvalue may be implicitly converted to a prvalue with lvalue-to-rvalue, array-to-pointer, or function-to-pointer implicit conversion.
  • A glvalue may be polymorphic: the dynamic type of the object it identifies is not necessarily the static type of the expression.
  • A glvalue can have incomplete type, where permitted by the expression.


xvalue

The following expressions are xvalue expressions:

  • a function call or an overloaded operator expression, whose return type is rvalue reference to object, such as std::move(x);
  • a[n], the built-in subscript expression, where one operand is an array rvalue;
  • a.m, the member of object expression, where a is an rvalue and m is a non-static data member of non-reference type;
  • a.*mp, the pointer to member of object expression, where a is an rvalue and mp is a pointer to data member;
  • a ? b : c, the ternary conditional expression for some b and c (see definition for detail);
  • a cast expression to rvalue reference to object type, such as static_cast<char&&>(x);
  • any expression that designates a temporary object, after temporary materialization. (since C++17)

Properties:

  • Same as rvalue (below).
  • Same as glvalue (below).

In particular, like all rvalues, xvalues bind to rvalue references, and like all glvalues, xvalues may be polymorphic, and non-class xvalues may be cv-qualified.


lvalue

The following expressions are lvalue expressions:

  • the name of a variable, a function, or a data member, regardless of type, such as std::cin or std::endl. Even if the variable's type is rvalue reference, the expression consisting of its name is an lvalue expression;
  • a function call or an overloaded operator expression, whose return type is lvalue reference, such as std::getline(std::cin, str), std::cout << 1, str1 = str2, or ++it;
  • a = b, a += b, a %= b, and all other built-in assignment and compound assignment expressions;
  • ++a and --a, the built-in pre-increment and pre-decrement expressions;
  • *p, the built-in indirection expression;
  • a[n] and p[n], the built-in subscript expressions, except where a is an array rvalue (since C++11);
  • a.m, the member of object expression, except where m is a member enumerator or a non-static member function, or where a is an rvalue and m is a non-static data member of non-reference type;
  • p->m, the built-in member of pointer expression, except where m is a member enumerator or a non-static member function;
  • a.*mp, the pointer to member of object expression, where a is an lvalue and mp is a pointer to data member;
  • p->*mp, the built-in pointer to member of pointer expression, where mp is a pointer to data member;
  • a, b, the built-in comma expression, where b is an lvalue;
  • a ? b : c, the ternary conditional expression for some b and c (e.g., when both are lvalues of the same type, but see definition for detail);
  • a string literal, such as "Hello, world!";
  • a cast expression to lvalue reference type, such as static_cast<int&>(x);
  • a function call or an overloaded operator expression, whose return type is rvalue reference to function;
  • a cast expression to rvalue reference to function type, such as static_cast<void (&&)(int)>(x). (since C++11)

Properties:

  • Same as glvalue (below).
  • Address of an lvalue may be taken: &++i1 and &std::endl are valid expressions.
  • A modifiable lvalue may be used as the left-hand operand of the built-in assignment and compound assignment operators.
  • An lvalue may be used to initialize an lvalue reference; this associates a new name with the object identified by the expression.


as-if rule

The C++ compiler is permitted to perform any changes to the program as long as the following remains true:

1) At every sequence point, the values of all volatile objects are stable (previous evaluations are complete, new evaluations not started) (until C++11)

1) Accesses (reads and writes) to volatile objects occur strictly according to the semantics of the expressions in which they occur. In particular, they are not reordered with respect to other volatile accesses on the same thread. (since C++11)

2) At program termination, data written to files is exactly as if the program was executed as written.

3) Prompting text which is sent to interactive devices will be shown before the program waits for input.

4) If the ISO C pragma #pragma STDC FENV_ACCESS is supported and is set to ON, the changes to the floating-point environment (floating-point exceptions and rounding modes) are guaranteed to be observed by the floating-point arithmetic operators and function calls as if executed as written, except that

  • the result of any floating-point expression other than cast and assignment may have range and precision of a floating-point type different from the type of the expression (see FLT_EVAL_METHOD)
  • notwithstanding the above, intermediate results of any floating-point expression may be calculated as if to infinite range and precision (unless #pragma STDC FP_CONTRACT is OFF)


If you want to read the specs, I believe these are the ones you need to read:

References

C11 standard (ISO/IEC 9899:2011): 6.7.3 Type qualifiers (p: 121-123)

C99 standard (ISO/IEC 9899:1999): 6.7.3 Type qualifiers (p: 108-110)

C89/C90 standard (ISO/IEC 9899:1990): 3.5.3 Type qualifiers

I beg to differ with the majority opinion, while fully understanding that volatile access counts as observable behavior, like I/O.

If you have this code:

{
    volatile int x;
    x = 0;
}

I believe the compiler can optimize it away under the as-if rule, assuming that:

  1. The volatile variable is not otherwise made visible externally via e.g. pointers (which is obviously not a problem here since there is no such thing in the given scope)

  2. The compiler does not provide you with a mechanism for externally accessing that volatile

The rationale is simply that you couldn't observe the difference anyway, due to criterion #2.

However, in your compiler, criterion #2 may not be satisfied! The compiler may try to provide you with extra guarantees about observing volatile variables from the "outside", such as by analyzing the stack. In such situations, the behavior really is observable, so it cannot be optimized away.

Now the question is, is the following code any different than the above?

{
    volatile int x = 0;
}

I believe I've observed different behavior for this in Visual C++ with respect to optimization, but I'm not entirely sure on what grounds. It may be that initialization does not count as "access"? I'm not sure. This may be worth a separate question if you're interested, but otherwise I believe the answer is as I explained above.