(为什么)使用未初始化的变量未定义行为?

如果有:

unsigned int x;
x -= x;

很明显,在这个表达式之后,x 应该是零,但是无论我看哪里,他们都说这段代码的 行为是未定义的,而不仅仅是 x的值(直到减法之前)。

两个问题:

  • 这段代码的 行为确实是未定义的吗?
    (例如,代码可能在一个顺从的系统上崩溃[或更糟糕] ?)

  • 如果是这样,那么当完全清楚 x在这里应该为零时,C 是否说 行为是未定义的?

    也就是说,如果没有在这里定义行为,那么 优势是什么呢?

显然,编译器可以简单地在变量内部使用它认为“方便”的 随便啦垃圾值,并且可以按预期的方式工作... ... 这种方法有什么问题吗?

15880 次浏览

Yes, the program might crash. There might, for example, be trap representations (specific bit patterns which cannot be handled) which might cause a CPU interrupt, which unhandled could crash the program.

(6.2.6.1 on a late C11 draft says) Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.

(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)

Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as all other cases of undefined behavior -- the compiler doesn't have to output special code to make this work.

Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?

Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it is not required to make it fail.

The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.

Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2n-1-1) and 2n-1-1 as a trap representation. See Jens Gustedt's answer.

Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment

unsigned x, y, z;   /* 0 */
y = 0;              /* 1 */
z = 4;              /* 2 */
x = - x;            /* 3 */
y = y + z;          /* 4 */
x = y + 1;          /* 5 */

When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:

r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1;

The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.

For a more elaborate example, consider the following code fragment:

unsigned i, x;
for (i = 0; i < 10; i++) {
x = (condition() ? some_value() : -x);
}

Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written

unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
x = (condition() ? some_value() : -x);
}

The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:

  • during the first loop iteration, x is uninitialized by the time -x is evaluated.
  • -x has undefined behavior, so its value is whatever-is-convenient.
  • The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.

When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.

I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)

This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.

(This answer addresses C 1999. For C 2011, see Jens Gustedt’s answer.)

The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)

3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.

So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap value does cause undefined behavior, per 6.2.6.1 5.

Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2N-1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).

If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.

Yes this behavior is undefined but for different reasons than most people are aware of.

First, using an unitialized value is by itself not undefined behavior, but the value is simply indeterminate. Accessing this then is UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.

What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register" that is its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.

Edit: The relevant phrase of the standard is 6.3.2.1p2:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

And to make it clearer, the following code is legal under all circumstances:

unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
  • Here the addresses of a and b are taken, so their value is just indeterminate.
  • Since unsigned char never has trap representations that indeterminate value is just unspecified, any value of unsigned char could happen.
  • At the end a must hold the value 0.

Edit2: a and b have unspecified values:

3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance

While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:

volatile uint32_t a,b;
uin16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
uint16_t temp;
if (a)
temp = y;
else if (b)
temp = z;
return temp;
}

a compiler for a platform like the ARM where all instructions other than loads and stores operate on 32-bit registers might reasonably process the code in a fashion equivalent to:

volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uin32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
// Since x is never used past this point, and since the return value
// will need to be in r0, a compiler could map temp to r0
uint32_t temp;
if (a)
temp = y;
else if (b)
temp = z & 0xFFFF;
return temp;
}

If either volatile reads yield a non-zero value, r0 will get loaded with a value in the range 0...65535. Otherwise it will yield whatever it held when the function was called (i.e. the value passed into x), which might not be a value in the range 0..65535. The Standard lacks any terminology to describe the behavior of value whose type is uint16_t but whose value is outside the range of 0..65535, except to say that any action which could produce such behavior invokes UB.

For any variable of any type, which is not initialized or for other reasons holds an indeterminate value, the following applies for code reading that value:

  • In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
  • Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
  • Otherwise if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].

    The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.


[1]: C11 6.3.2.1:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

[2]: C11 6.2.6.1:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.

[3] C11:

3.19.2
indeterminate value
either an unspecified value or a trap representation

3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.

3.19.4
trap representation
an object representation that need not represent a value of the object type