Confusion about array initialization in C

In C, if I initialize an array like this:

int a[5] = {1,2};

then all elements of the array that are not explicitly initialized are implicitly initialized to zero.

But if I initialize an array like this:

int a[5]={a[2]=1};


printf("%d %d %d %d %d\n", a[0], a[1],a[2], a[3], a[4]);

Output:

1 0 1 0 0

I don't understand: why does a[0] print 1 instead of 0? Is this undefined behaviour?

This question was asked in an interview.


I don't understand, why does a[0] print 1 instead of 0?

Presumably a[2]=1 initializes a[2] first, and the result of the expression is used to initialize a[0].

From N2176 (C17 draft):

6.7.9 Initialization

  23. The evaluations of the initialization list expressions are indeterminately sequenced with respect to one another and thus the order in which any side effects occur is unspecified.

So it would seem that output 1 0 0 0 0 would also have been possible.

Conclusion: Don't write initializers that modify the initialized variable on the fly.

TL;DR: I don't think the behavior of int a[5]={a[2]=1}; is well defined, at least in C99.

The funny part is that the only bit that makes sense to me is the part you're asking about: a[0] is set to 1 because the assignment operator returns the value that was assigned. It's everything else that's unclear.

If the code had been int a[5] = { [2] = 1 }, everything would've been easy: That's a designated initializer setting a[2] to 1 and everything else to 0. But with { a[2] = 1 } we have a non-designated initializer containing an assignment expression, and we fall down a rabbit hole.


Here's what I've found so far:

  • a must be a local variable.

    6.7.8 Initialization

    4. All the expressions in an initializer for an object that has static storage duration shall be constant expressions or string literals.

    a[2] = 1 is not a constant expression, so a must have automatic storage.

  • a is in scope in its own initialization.

    6.2.1 Scopes of identifiers

    7. Structure, union, and enumeration tags have scope that begins just after the appearance of the tag in a type specifier that declares the tag. Each enumeration constant has scope that begins just after the appearance of its defining enumerator in an enumerator list. Any other identifier has scope that begins just after the completion of its declarator.

    The declarator is a[5], so variables are in scope in their own initialization.

  • a is alive in its own initialization.

    6.2.4 Storage durations of objects

    4. An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration.

    5. For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.

  • There is a sequence point after a[2]=1.

    6.8 Statements and blocks

    4. A full expression is an expression that is not part of another expression or of a declarator. Each of the following is a full expression: an initializer; the expression in an expression statement; the controlling expression of a selection statement (if or switch); the controlling expression of a while or do statement; each of the (optional) expressions of a for statement; the (optional) expression in a return statement. The end of a full expression is a sequence point.

    Note that e.g. in int foo[] = { 1, 2, 3 } the { 1, 2, 3 } part is a brace-enclosed list of initializers, each of which has a sequence point after it.

  • Initialization is performed in initializer list order.

    6.7.8 Initialization

    17. Each brace-enclosed initializer list has an associated current object. When no designations are present, subobjects of the current object are initialized in order according to the type of the current object: array elements in increasing subscript order, structure members in declaration order, and the first named member of a union. [...]

    19. The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.
  • However, initializer expressions are not necessarily evaluated in order.

    6.7.8 Initialization

    23. The order in which any side effects occur among the initialization list expressions is unspecified.

However, that still leaves some questions unanswered:

  • Are sequence points even relevant? The basic rule is:

    6.5 Expressions

    2. Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

    a[2] = 1 is an expression, but initialization is not.

    This is slightly contradicted by Annex J:

    J.2 Undefined behavior

    • Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored (6.5).

    Annex J says any modification counts, not just modifications by expressions. But given that annexes are non-normative, we can probably ignore that.

  • How are the subobject initializations sequenced with respect to initializer expressions? Are all initializers evaluated first (in some order), then the subobjects are initialized with the results (in initializer list order)? Or can they be interleaved?


I think int a[5] = { a[2] = 1 } is executed as follows:

  1. Storage for a is allocated when its containing block is entered. The contents are indeterminate at this point.
  2. The (only) initializer is executed (a[2] = 1), followed by a sequence point. This stores 1 in a[2] and returns 1.
  3. That 1 is used to initialize a[0] (the first initializer initializes the first subobject).

But here things get fuzzy because the remaining elements (a[1], a[2], a[3], a[4]) are supposed to be initialized to 0, but it's not clear when: Does it happen before a[2] = 1 is evaluated? If so, a[2] = 1 would "win" and overwrite a[2], but would that assignment have undefined behavior because there is no sequence point between the zero initialization and the assignment expression? Are sequence points even relevant (see above)? Or does zero initialization happen after all initializers are evaluated? If so, a[2] should end up being 0.

Because the C standard does not clearly define what happens here, I believe the behavior is undefined (by omission).

My understanding is that a[2]=1 yields the value 1, so the code becomes

int a[5]={a[2]=1} --> int a[5]={1}

int a[5]={1} assigns the value 1 to a[0].

Hence it prints 1 for a[0].

For example,

char str[10] = {'H', 'a', 'i'};

initializes the leading elements as if by:

str[0] = 'H';
str[1] = 'a';
str[2] = 'i';

I'll try to give a short and simple answer to the puzzle: int a[5] = { a[2] = 1 };

  1. First, a[2] = 1 is executed. The array now reads: 0 0 1 0 0
  2. But behold, since you did it inside the { } braces, which initialize the array in order, the value of that expression (which is 1) is taken and stored in a[0]. It is as if only int a[5] = { a[2] }; remained, with a[2] already equal to 1. The resulting array is now: 1 0 1 0 0

Another example: int a[6] = { a[3] = 1, a[4] = 2, a[5] = 3 }; Even though the order is somewhat arbitrary, assuming evaluation goes from left to right, it would proceed in these 6 steps:

0 0 0 1 0 0
1 0 0 1 0 0
1 0 0 1 2 0
1 2 0 1 2 0
1 2 0 1 2 3
1 2 3 1 2 3

I think the C11 standard covers this behaviour and says that the result is unspecified, and I don't think C18 made any relevant changes in this area.

The standard language is not easy to parse. The relevant section of the standard is §6.7.9 Initialization. The syntax is documented as:

initializer:
                assignment-expression
                { initializer-list }
                { initializer-list , }
initializer-list:
                designation_opt initializer
                initializer-list , designation_opt initializer
designation:
                designator-list =
designator-list:
                designator
                designator-list designator
designator:
                [ constant-expression ]
                . identifier

Note that one of the terms is assignment-expression, and since a[2] = 1 is indubitably an assignment expression, it is allowed inside initializers for arrays with non-static duration:

§4 All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or string literals.

One of the key paragraphs is:

§19 The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject;151) all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.

151) Any initializer for the subobject which is overridden and so not used to initialize that subobject might not be evaluated at all.

And another key paragraph is:

§23 The evaluations of the initialization list expressions are indeterminately sequenced with respect to one another and thus the order in which any side effects occur is unspecified.152)

152) In particular, the evaluation order need not be the same as the order of subobject initialization.

I'm fairly sure that paragraph §23 indicates that the notation in the question:

int a[5] = { a[2] = 1 };

leads to unspecified behaviour. The assignment to a[2] is a side effect, and the evaluations of the initializer expressions are indeterminately sequenced with respect to one another. Consequently, I don't think there is a way to appeal to the standard and claim that a particular compiler is handling this correctly or incorrectly.

The assignment a[2]= 1 is an expression that has the value 1, and you essentially wrote int a[5]= { 1 }; (with the side effect that a[2] is assigned 1 as well).

I believe that int a[5]={ a[2]=1 }; is a good example of a programmer shooting themselves in the foot.

I might be tempted to think that what you meant was int a[5]={ [2]=1 }; which would be a C99 designated initializer setting element 2 to 1 and the rest to zero.

In the rare case that you really really meant int a[5]={ 1 }; a[2]=1;, then that would be a funny way of writing it. Anyhow, this is what your code boils down to, even though some here pointed out that it's not well defined when the write to a[2] is actually executed. The pitfall here is that a[2]=1 is not a designated initializer but a simple assignment which itself has the value 1.