Is this a known pitfall of the C++11 for loop?

Let's suppose we have a struct that holds 3 doubles, with some member functions:

#include <cmath>

struct Vector {
    double x, y, z;
    // ...
    Vector &negate() {
        x = -x; y = -y; z = -z;
        return *this;
    }
    Vector &normalize() {
        double s = 1./std::sqrt(x*x+y*y+z*z);
        x *= s; y *= s; z *= s;
        return *this;
    }
    // ...
};

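This is a bit contrived for the sake of simplicity, but I'm sure you agree that similar code is out there. The methods allow you to chain calls conveniently, for example: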

Vector v = ...;
v.normalize().negate();

or even:

Vector v = Vector{1., 2., 3.}.normalize().negate();

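Now, if we provided begin() and end() functions, we could use our Vector in a new-style for loop, say to loop over the 3 coordinates x, y and z (you can no doubt construct more "useful" examples by replacing Vector with e.g. String):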

Vector v = ...;
for (double x : v) { ... }
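The question does not show its begin() and end(). As a minimal sketch of one way they could look, the variant below stores the coordinates in an array, which is an assumption made here so that plain pointers can serve as iterators. The names c, sum_of and demo are hypothetical.

```cpp
#include <cassert>

// Hypothetical variant of Vector: the coordinates live in an array so that
// begin()/end() can hand out plain pointers into contiguous storage.
struct Vector {
    double c[3];  // c[0] = x, c[1] = y, c[2] = z

    double *begin() { return c; }
    double *end() { return c + 3; }
    const double *begin() const { return c; }
    const double *end() const { return c + 3; }

    Vector &negate() {
        for (double &v : c) v = -v;
        return *this;
    }
};

// Sums the coordinates with a range-based for loop over a named Vector.
double sum_of(const Vector &v) {
    double total = 0.0;
    for (double x : v) total += x;
    return total;
}

// Example use: looping over a named object is always safe.
void demo() {
    Vector v{{1., 2., 3.}};
    assert(sum_of(v) == 6.0);
    v.negate();
    assert(sum_of(v) == -6.0);
}
```

The range-based for loop only needs begin() and end() to return something that behaves like an iterator, so raw pointers are enough for this sketch.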

We can even do:

Vector v = ...;
for (double x : v.normalize().negate()) { ... }

and also:

for (double x : Vector{1., 2., 3.}) { ... }

However, the following (it seems to me) is broken:

for (double x : Vector{1., 2., 3.}.normalize()) { ... }

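While it looks like a logical combination of the previous two usages, I think this last usage creates a dangling reference, whereas the previous two are completely fine.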

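  • Is this correct and widely appreciated?
  • Which part of the above is the "bad" part that should be avoided?
  • Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?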

for (double x : Vector{1., 2., 3.}.normalize()) { ... }

That is not a limitation of the language, but a problem with your code. The expression Vector{1., 2., 3.} creates a temporary, but the normalize function returns an lvalue-reference. Because the expression is an lvalue, the compiler assumes that the object will be alive, but because it is a reference to a temporary, the object dies after the full expression is evaluated, so you are left with a dangling reference.

Now, if you change your design to return a new object by value rather than a reference to the current object, then there would be no issue and the code would work as expected.
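As a rough sketch of that by-value design: the version below is not the asker's code. It assumes array storage so the type is iterable, and the names c, normalized and squared_length_of_unit are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical by-value variant: normalized() leaves *this untouched and
// returns a fresh Vector, so the result never refers to a dead temporary.
struct Vector {
    double c[3];

    const double *begin() const { return c; }
    const double *end() const { return c + 3; }

    Vector normalized() const {
        double len = std::sqrt(c[0]*c[0] + c[1]*c[1] + c[2]*c[2]);
        assert(len > 0.0);  // precondition: not the zero vector
        double s = 1. / len;
        return Vector{{c[0]*s, c[1]*s, c[2]*s}};
    }
};

// The range expression is a prvalue. The loop binds it to a hidden reference,
// which keeps the returned Vector alive until the loop ends.
double squared_length_of_unit() {
    double total = 0.0;
    for (double x : Vector{{3., 0., 4.}}.normalized()) total += x * x;
    return total;
}
```

The original temporary still dies at the end of the range expression, but nothing refers to it any more: the loop iterates over the returned copy.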

Is this correct and widely appreciated?

Yes, your understanding of things is correct.

Which part of the above is the "bad" part, that should be avoided?

The bad part is taking an lvalue reference to a temporary, as returned from a function, and binding it to a reference that outlives the full expression. It is just as bad as this:

auto &&t = Vector{1., 2., 3.}.normalize();

The temporary Vector{1., 2., 3.}'s lifetime cannot be extended because the compiler has no idea that the return value from normalize references it.
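The difference can be observed without ever touching the dangling reference. The sketch below uses a hypothetical Tracer type, not part of the question, that sets a flag in its destructor. Its self() member has the same shape as normalize().

```cpp
#include <cassert>

// Hypothetical helper type: sets a flag when it is destroyed.
struct Tracer {
    bool *destroyed;
    explicit Tracer(bool *flag) : destroyed(flag) {}
    ~Tracer() { *destroyed = true; }
    Tracer &self() { return *this; }  // same shape as normalize()
};

// Reports whether the temporary is already gone once the declaration finishes.
bool dies_when_bound_through_member() {
    bool destroyed = false;
    auto &&r = Tracer(&destroyed).self();  // r dangles from here on; never use it
    static_cast<void>(sizeof r);           // unevaluated; only silences warnings
    return destroyed;
}

bool dies_when_bound_directly() {
    bool destroyed = false;
    auto &&r = Tracer(&destroyed);  // lifetime extended to the end of this scope
    static_cast<void>(sizeof r);
    return destroyed;
}

// Example use.
void demo() {
    assert(dies_when_bound_through_member());  // no lifetime extension
    assert(!dies_when_bound_directly());       // temporary is still alive
}
```

Lifetime extension applies only when the reference binds to the temporary itself, not when it binds to a reference that a function happened to return.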

Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?

That would be highly inconsistent with how C++ works.

Would it prevent certain gotchas made by people using chained expressions on temporaries or various lazy-evaluation methods for expressions? Yes. But it would also require special-case compiler code, and it would be confusing why the same thing doesn't work with other expression constructs.

A much more reasonable solution would be some way to inform the compiler that the return value of a function is always a reference to this, and therefore if the return value is bound to a temporary-extending construct, then it would extend the correct temporary. That's a language-level solution though.

Presently (if the compiler supports it), you can make it so that normalize cannot be called on a temporary:

struct Vector {
    double x, y, z;
    // ...
    Vector &normalize() & {
        double s = 1./sqrt(x*x+y*y+z*z);
        x *= s; y *= s; z *= s;
        return *this;
    }
    Vector &normalize() && = delete;
};

This will cause Vector{1., 2., 3.}.normalize() to give a compile error, while v.normalize() will work fine. Obviously you won't be able to do correct things like this:

Vector t = Vector{1., 2., 3.}.normalize();

But you also won't be able to do incorrect things.
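One way to check the effect of the deleted overload at compile time is a small detection trait. This is only a sketch: can_normalize is a hypothetical helper, not a standard facility, and it relies on the fact that naming a deleted function inside decltype is a substitution failure rather than a hard error.

```cpp
#include <cmath>
#include <type_traits>
#include <utility>

struct Vector {
    double x, y, z;
    Vector &normalize() & {
        double s = 1. / std::sqrt(x*x + y*y + z*z);
        x *= s; y *= s; z *= s;
        return *this;
    }
    Vector &normalize() && = delete;
};

// Hypothetical detection trait: true when std::declval<T>().normalize() compiles.
template <class T, class = void>
struct can_normalize : std::false_type {};

template <class T>
struct can_normalize<T, decltype(void(std::declval<T>().normalize()))>
    : std::true_type {};

static_assert(can_normalize<Vector &>::value, "lvalues may be normalized");
static_assert(!can_normalize<Vector>::value, "temporaries are rejected");
```

Both static_asserts pass, which mirrors the behaviour described above: v.normalize() compiles, while Vector{1., 2., 3.}.normalize() does not.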

Alternatively, as suggested in the comments, you can make the rvalue reference version return a value rather than a reference:

struct Vector {
    double x, y, z;
    // ...
    Vector &normalize() & {
        double s = 1./sqrt(x*x+y*y+z*z);
        x *= s; y *= s; z *= s;
        return *this;
    }
    Vector normalize() && {
        Vector ret = *this;
        ret.normalize();
        return ret;
    }
};

If Vector were a type with actual resources to move, you could use Vector ret = std::move(*this); instead. Named return value optimization makes this reasonably optimal in terms of performance.
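To illustrate the std::move variant, here is a sketch with a hypothetical resource-owning type, Samples, standing in for Vector. The type, its scale() member and sum_scaled() are assumptions made for illustration only.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical resource-owning type standing in for Vector.
struct Samples {
    std::vector<double> data;

    std::vector<double>::const_iterator begin() const { return data.begin(); }
    std::vector<double>::const_iterator end() const { return data.end(); }

    // Lvalue overload: modify in place and return a reference.
    Samples &scale(double k) & {
        for (double &d : data) d *= k;
        return *this;
    }
    // Rvalue overload: steal the buffer, then return the result by value.
    Samples scale(double k) && {
        Samples ret = std::move(*this);
        ret.scale(k);  // ret is an lvalue, so this calls the & overload
        return ret;
    }
};

// Safe: the loop iterates over the returned object, not the dead temporary.
double sum_scaled() {
    double total = 0.0;
    for (double x : Samples{{1., 2., 3.}}.scale(2.)) total += x;
    return total;
}

// Example use.
void demo() { assert(sum_scaled() == 12.0); }
```

Moving from *this is acceptable in the && overload because the object is a temporary that is about to be destroyed anyway.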

IMHO, the second example is already flawed. That the modifying operators return *this is convenient in the way you mentioned: it allows chaining of modifiers. It can be used for simply handing on the result of the modification, but doing this is error-prone because it can easily be overlooked. If I see something like

Vector v{1., 2., 3.};
auto foo = somefunction1(v, 17);
auto bar = somefunction2(true, v, 2, foo);
auto baz = somefunction3(bar.quun(v), 93.2, v.qwarv(foo));

I wouldn't automatically suspect that the functions modify v as a side effect. Of course, they could, but it would be confusing. So if I were to write something like this, I would make sure that v stays constant. For your example, I would add free functions

auto normalized(Vector v) -> Vector {return v.normalize();}
auto negated(Vector v) -> Vector {return v.negate();}

and then write the loops

for( double x : negated(normalized(v)) ) { ... }

and

for( double x : normalized(Vector{1., 2., 3.}) ) { ... }

That is IMO more readable, and it's safer. Of course, it requires an extra copy; for heap-allocated data, however, this could likely be done with a cheap C++11 move operation.
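A small sketch of how these free functions behave, using the Vector from the question and a hypothetical demo() function. It only checks that the argument is left untouched, since iterating would also need begin() and end(), which the question does not show.

```cpp
#include <cassert>
#include <cmath>

struct Vector {
    double x, y, z;
    Vector &negate() { x = -x; y = -y; z = -z; return *this; }
    Vector &normalize() {
        double s = 1. / std::sqrt(x*x + y*y + z*z);
        x *= s; y *= s; z *= s;
        return *this;
    }
};

// Taken by value: the parameter is a private copy, so the caller's object
// is never modified, and the result is returned as a new Vector.
auto normalized(Vector v) -> Vector { return v.normalize(); }
auto negated(Vector v) -> Vector { return v.negate(); }

// Example use.
void demo() {
    Vector v{3., 0., 4.};
    Vector r = negated(normalized(v));
    assert(v.x == 3. && v.y == 0. && v.z == 4.);  // original unchanged
    assert(std::fabs(r.x + 0.6) < 1e-12);         // r is roughly (-0.6, -0, -0.8)
    assert(std::fabs(r.z + 0.8) < 1e-12);
}
```

Because each function returns a Vector by value, the result of negated(normalized(v)) is a prvalue that a range-based for loop can safely bind to.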