Does this code from "The C++ Programming Language" 4th edition section 36.3.6 have well-defined behavior?

In Bjarne Stroustrup's The C++ Programming Language (http://en.wikipedia.org/wiki/The_C%2B%2B_Programming_Language) 4th edition, section 36.3.6, the following code is used as an example of chaining:

void f2()
{
    std::string s = "but I have heard it works even if you don't believe in it" ;
    s.replace(0, 4, "" ).replace( s.find( "even" ), 4, "only" )
        .replace( s.find( " don't" ), 6, "" );

    assert( s == "I have heard it works only if you believe in it" ) ;
}

The assert fails in gcc (http://coliru.stack-crooked.com/a/d3d81ff98cab2f5c) and Visual Studio (http://rextester.com/IFNX5880), but it does not fail when using Clang (http://coliru.stack-crooked.com/a/e6facaa9f18e252f).

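Why am I getting different results? Are any of these compilers incorrectly evaluating the chaining expression, or does this code exhibit some form of unspecified or undefined behavior?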


The code exhibits unspecified behavior due to the unspecified order of evaluation of sub-expressions. It does not invoke undefined behavior, though, because all the side effects happen inside function calls, and that introduces a sequencing relationship between the side effects in this case.

This example is mentioned in the proposal N4228: Refining Expression Evaluation Order for Idiomatic C++ which says the following about the code in the question:

[...]This code has been reviewed by C++ experts world-wide, and published (The C++ Programming Language, 4th edition.) Yet, its vulnerability to unspecified order of evaluation has been discovered only recently by a tool[...]

Details

It may be obvious to many that arguments to functions have an unspecified order of evaluation, but it is probably not as obvious how this behavior interacts with chained function calls. It was not obvious to me when I first analyzed this case, and apparently not to all the expert reviewers either.

At first glance it may appear that, since each replace has to be evaluated from left to right, the corresponding groups of function arguments must also be evaluated as groups from left to right.

This is incorrect. Function arguments have an unspecified order of evaluation. Although chaining function calls does introduce a left-to-right evaluation order for the calls themselves, the arguments of each call are only sequenced before the member function call they are part of. In particular, this impacts the following calls:

s.find( "even" )

and:

s.find( " don't" )

which are indeterminately sequenced with respect to:

s.replace(0, 4, "" )

The two find calls could be evaluated before or after the replace. This matters because the replace has a side effect on s that alters the result of find: it changes the length of s. So depending on when that replace is evaluated relative to the two find calls, the result will differ.
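A minimal sketch of one way to sidestep the problem under any C++ standard: perform each edit as its own statement, so that each find runs only after the previous replace has finished. The function name f2_split is hypothetical and does not come from the book.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of f2(): each replace() is its own full-expression,
// so the end of each statement sequences the previous edit before the next find().
std::string f2_split()
{
    std::string s = "but I have heard it works even if you don't believe in it";
    s.replace(0, 4, "");                   // remove "but "
    s.replace(s.find("even"), 4, "only");  // find() sees the shortened string
    s.replace(s.find(" don't"), 6, "");    // find() sees the string after both edits
    assert(s == "I have heard it works only if you believe in it");
    return s;
}
```

Because "even" and "only" have the same length, the second edit does not shift the position of " don't"; only the first edit does, and splitting the statements guarantees that find observes it.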

If we look at the chaining expression and examine the evaluation order of some of the sub-expressions:

s.replace(0, 4, "" ).replace( s.find( "even" ), 4, "only" )
^ ^       ^  ^  ^    ^        ^                 ^  ^
A B       |  |  |    C        |                 |  |
          1  2  3             4                 5  6

and:

.replace( s.find( " don't" ), 6, "" );
 ^        ^                   ^  ^
 D        |                   |  |
          7                   8  9

Note that we are ignoring the fact that 4 and 7 can be further broken down into more sub-expressions. So:

  • A is sequenced before B which is sequenced before C which is sequenced before D
  • 1 to 9 are indeterminately sequenced with respect to other sub-expressions with some of the exceptions listed below
    • 1 to 3 are sequenced before B
    • 4 to 6 are sequenced before C
    • 7 to 9 are sequenced before D
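The sequencing rules in the list above can be checked with a small tracing sketch. The names trace, traced_arg, Chain, before and run_chain are hypothetical stand-ins (Chain::call plays the role of replace). The checks assert only orderings that are guaranteed even before C++17, not the particular order a given compiler happens to pick.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tracing helpers: every evaluation appends its label to `trace`.
std::vector<std::string> trace;

int traced_arg(const char* label)
{
    trace.push_back(label);
    return 0;
}

struct Chain {
    // Stands in for std::string::replace; records when its body runs.
    Chain& call(const char* label, int, int)
    {
        trace.push_back(label);
        return *this;
    }
};

// True if label a was recorded before label b (each assumed to occur once).
bool before(const std::vector<std::string>& t, const std::string& a,
            const std::string& b)
{
    return std::find(t.begin(), t.end(), a) < std::find(t.begin(), t.end(), b);
}

std::vector<std::string> run_chain()
{
    trace.clear();
    Chain c;
    // B and C are the chained calls; 1, 2 and 4, 5 are their arguments.
    c.call("B", traced_arg("1"), traced_arg("2"))
     .call("C", traced_arg("4"), traced_arg("5"));
    return trace;
}
```

Under C++17 rules B is also sequenced before 4 and 5. Before C++17 a compiler may evaluate 4 and 5 first, which corresponds to the choice gcc makes for the find calls in f2().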

The key to this issue is that:

  • 4 to 9 are indeterminately sequenced with respect to B

The potential order of evaluation choice for 4 and 7 with respect to B explains the difference in results between clang and gcc when evaluating f2(). In my tests clang evaluates B before evaluating 4 and 7 while gcc evaluates it after. We can use the following test program to demonstrate what is happening in each case:

#include <iostream>
#include <string>

// Takes s by value, so it reports the position of cs in s as s is at the
// moment my_find is called, and prints it so the evaluation order is visible.
std::string::size_type my_find( std::string s, const char *cs )
{
    std::string::size_type pos = s.find( cs ) ;
    std::cout << "position " << cs << " found in complete expression: "
              << pos << std::endl ;

    return pos ;
}

int main()
{
    std::string s = "but I have heard it works even if you don't believe in it" ;
    std::string copy_s = s ;

    // Positions in the original string.
    std::cout << "position of even before s.replace(0, 4, \"\" ): "
              << s.find( "even" ) << std::endl ;
    std::cout << "position of  don't before s.replace(0, 4, \"\" ): "
              << s.find( " don't" ) << std::endl << std::endl;

    copy_s.replace(0, 4, "" ) ;

    // Positions after the first replace has removed "but ".
    std::cout << "position of even after s.replace(0, 4, \"\" ): "
              << copy_s.find( "even" ) << std::endl ;
    std::cout << "position of  don't after s.replace(0, 4, \"\" ): "
              << copy_s.find( " don't" ) << std::endl << std::endl;

    // The chained expression from the book, with s.find replaced by my_find.
    s.replace(0, 4, "" ).replace( my_find( s, "even" ) , 4, "only" )
        .replace( my_find( s, " don't" ), 6, "" );

    std::cout << "Result: " << s << std::endl ;
}

Result for gcc:

position of even before s.replace(0, 4, "" ): 26
position of  don't before s.replace(0, 4, "" ): 37

position of even after s.replace(0, 4, "" ): 22
position of  don't after s.replace(0, 4, "" ): 33

position  don't found in complete expression: 37
position even found in complete expression: 26

Result: I have heard it works evenonlyyou donieve in it

Result for clang:

position of even before s.replace(0, 4, "" ): 26
position of  don't before s.replace(0, 4, "" ): 37

position of even after s.replace(0, 4, "" ): 22
position of  don't after s.replace(0, 4, "" ): 33

position even found in complete expression: 22
position  don't found in complete expression: 33

Result: I have heard it works only if you believe in it

Result for Visual Studio:

position of even before s.replace(0, 4, "" ): 26
position of  don't before s.replace(0, 4, "" ): 37

position of even after s.replace(0, 4, "" ): 22
position of  don't after s.replace(0, 4, "" ): 33

position  don't found in complete expression: 37
position even found in complete expression: 26

Result: I have heard it works evenonlyyou donieve in it

Details from the standard

We know that, unless otherwise specified, evaluations of sub-expressions are unsequenced. This is from the draft C++11 standard section 1.9 Program execution, which says:

Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.[...]

and we know that a function call introduces a sequenced-before relationship between the function call's postfix expression and arguments on one hand and the function body on the other, from section 1.9:

[...]When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.[...]
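As a small illustration of this rule (a sketch with hypothetical names counter, read_in_body and call_with_increment), the side effect of an argument expression is complete by the time the body of the called function runs, so the body observes it:

```cpp
#include <cassert>

// Hypothetical global that the callee reads inside its body.
int counter = 0;

// Every side effect of the argument expression (here ++counter) is sequenced
// before the body starts, so the body always observes the updated counter.
int read_in_body(int arg)
{
    return arg * 10 + counter;
}

int call_with_increment()
{
    counter = 0;
    return read_in_body(++counter);  // arg == 1, and counter == 1 in the body
}
```

The same guarantee is why the find and replace calls in f2() never interleave: each call's side effects land either entirely before or entirely after the other's.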

We also know that class member access and therefore chaining will evaluate from left to right, from section 5.2.5 Class member access which says:

[...]The postfix expression before the dot or arrow is evaluated; the result of that evaluation, together with the id-expression, determines the result of the entire postfix expression.

Note that when the id-expression ends up naming a non-static member function, this rule does not specify the order of evaluation of the expression-list within the (), since that is a separate sub-expression. The relevant grammar from 5.2 Postfix expressions:

postfix-expression:
    postfix-expression ( expression-list_opt )        // function call
    postfix-expression . template_opt id-expression   // class member access, ends
                                                      // up as a postfix-expression

C++17 changes

The proposal p0145r3: Refining Expression Evaluation Order for Idiomatic C++ made several changes, including ones that give the code well-specified behavior by strengthening the order of evaluation rules for postfix-expressions and their expression-list.

[expr.call]p5 says:

The postfix-expression is sequenced before each expression in the expression-list and any default argument. The initialization of a parameter, including every associated value computation and side effect, is indeterminately sequenced with respect to that of any other parameter. [ Note: All side effects of argument evaluations are sequenced before the function is entered (see 4.6). —end note ] [ Example:

void f() {
    std::string s = "but I have heard it works even if you don’t believe in it";
    s.replace(0, 4, "").replace(s.find("even"), 4, "only").replace(s.find(" don’t"), 6, "");
    assert(s == "I have heard it works only if you believe in it"); // OK
}

—end example ]
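A sketch of the first sentence of this rule, assuming a compiler in C++17 mode (for example -std=c++17): the postfix-expression get_widget().use is evaluated before the argument get_arg(), and both before the body of use. All names here are hypothetical. Before C++17 the relative order of P and A was unspecified.

```cpp
#include <cassert>
#include <string>

// Hypothetical names; assumes C++17 evaluation-order rules.
std::string order;  // records which part of the call is evaluated when

struct Widget {
    Widget& use(int)
    {
        order += 'C';  // the body of the called member function
        return *this;
    }
};

Widget widget;

Widget& get_widget()
{
    order += 'P';  // part of the postfix-expression get_widget().use
    return widget;
}

int get_arg()
{
    order += 'A';  // the only expression in the expression-list
    return 0;
}

std::string call_order()
{
    order.clear();
    get_widget().use(get_arg());
    return order;  // C++17: P before A, both before C
}
```

In f2() the object expression of the second replace is the first replace call, which is why the find calls now see the shortened string.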

This is intended to add information on the matter with regard to C++17. The proposal (Refining Expression Evaluation Order for Idiomatic C++ Revision 2) for C++17 addressed the issue, citing the code above as a specimen.

As suggested, I added relevant information from the proposal. To quote it (highlights mine):

The order of expression evaluation, as it is currently specified in the standard, undermines advices, popular programming idioms, or the relative safety of standard library facilities. The traps aren't just for novices or the careless programmer. They affect all of us indiscriminately, even when we know the rules.

Consider the following program fragment:

void f()
{
    std::string s = "but I have heard it works even if you don't believe in it";
    s.replace(0, 4, "").replace(s.find("even"), 4, "only")
        .replace(s.find(" don't"), 6, "");
    assert(s == "I have heard it works only if you believe in it");
}

The assertion is supposed to validate the programmer's intended result. It uses "chaining" of member function calls, a common standard practice. This code has been reviewed by C++ experts world-wide, and published (The C++ Programming Language, 4th edition.) Yet, its vulnerability to unspecified order of evaluation has been discovered only recently by a tool.

The paper suggested changing the pre-C++17 rule on the order of expression evaluation, which was influenced by C and has existed for more than three decades. It proposed that the language should guarantee contemporary idioms or risk "traps and sources of obscure, hard to find bugs" such as what happened with the code specimen above.

The proposal for C++17 is to require that every expression has a well-defined evaluation order:

  • Postfix expressions are evaluated from left to right. This includes function calls and member selection expressions.
  • Assignment expressions are evaluated from right to left. This includes compound assignments.
  • Operands to shift operators are evaluated from left to right.
  • The order of evaluation of an expression involving an overloaded operator is determined by the order associated with the corresponding built-in operator, not the rules for function calls.
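The shift and assignment rules from the list above can be sketched the same way, again assuming C++17 rules and using hypothetical helper names. The << on std::ostringstream is an overloaded operator, so that half of the sketch also relies on the last bullet.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helpers; assumes C++17 evaluation-order rules.
std::string events;

int eval_left()  { events += 'L'; return 1; }
int eval_right() { events += 'R'; return 2; }

// Overloaded << follows the built-in shift rule: left operand first.
std::string shift_order()
{
    events.clear();
    std::ostringstream out;
    out << eval_left() << eval_right();
    return events;
}

int slot = 0;
int& target() { events += 'T'; return slot; }
int value()   { events += 'V'; return 42; }

// Built-in assignment: the right operand is sequenced before the left operand.
std::string assign_order()
{
    events.clear();
    target() = value();
    return events;
}
```

Under C++17, shift_order() therefore records L before R, and assign_order() records V before T.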

The above code compiles successfully using GCC 7.1.1 and Clang 4.0.0.