Is a pointer with the right address and type still always a valid pointer since C++17?

(In reference to this question and answer.)

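Before the C++17 standard, [basic.compound]/3 contained the following sentence:

If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

But since C++17, this sentence has been removed.

For example, I believe this sentence made the following example code defined, and that since C++17 it is undefined behavior: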

 alignas(int) unsigned char buffer[2*sizeof(int)];
auto p1=new(buffer) int{};
auto p2=new(p1+1) int{};
*(p1+1)=10;

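Before C++17, p1+1 holds the address of *p2 and has the right type, so p1+1 is a pointer to *p2. In C++17, p1+1 is a pointer past-the-end, so it is not a pointer to an object, and I believe it is not dereferenceable.

Is this interpretation of this modification of the standard right, or are there other rules that compensate for the deletion of the cited sentence?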


Is this interpretation of this modification of the standard right or are there other rules that compensate the deletion of this cited sentence?

Yes, this interpretation is correct. A pointer past the end isn't simply convertible to another pointer value that happens to point to that address.

The new [basic.compound]/3 says:

Every value of pointer type is one of the following:
(3.1) a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) a pointer past the end of an object ([expr.add]), or
(3.3) the null pointer value ([conv.ptr]) for that type, or
(3.4) an invalid pointer value.

Those are mutually exclusive. p1+1 is a pointer past the end, not a pointer to an object. p1+1 points to a hypothetical x[1] of a size-1 array at p1, not to p2. Those two objects are not pointer-interconvertible.

We also have the non-normative note:

[ Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address. [...]

which clarifies the intent.
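Under this reading, the second int can still be reached through pointers whose origin is not in doubt. The following is a minimal sketch, not a definitive recipe: `write_second_int` is a hypothetical helper name, and the use of `std::launder` (C++17, `<new>`) assumes its preconditions hold here, namely that an int object within its lifetime is located at that address inside `buffer`.

```cpp
#include <cassert>
#include <new> // placement new, std::launder (C++17)

// Hypothetical helper: writes to the second int without dereferencing p1 + 1.
int write_second_int() {
    alignas(int) unsigned char buffer[2 * sizeof(int)];
    int* p1 = new (buffer) int{};
    int* p2 = new (buffer + sizeof(int)) int{};
    (void)p1; // only the second object is of interest here

    // Option 1: use the pointer returned by the new-expression that created the object.
    *p2 = 10;

    // Option 2 (assumption: launder's preconditions are met): recover a pointer
    // to the int from the address of the storage that holds it.
    int* q = std::launder(reinterpret_cast<int*>(buffer + sizeof(int)));
    assert(q == p2);
    return *q;
}
```

Both options avoid relying on p1 + 1 pointing to *p2, which is exactly the step the removed sentence used to justify.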


As T.C. points out in numerous comments (notably this one), this is really a special case of the problem that comes with trying to implement std::vector: [v.data(), v.data() + v.size()) needs to be a valid range, and yet vector doesn't create an array object, so the only defined pointer arithmetic would be going from any given object in the vector to past-the-end of its hypothetical one-element array. For more resources, see CWG 2182, this std discussion, and two revisions of a paper on the subject: P0593R0 and P0593R1 (section 1.3 specifically).
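To make the std::vector point concrete, here is a small sketch of the kind of user code that depends on [v.data(), v.data() + v.size()) being a valid range. `sum_via_data` is a hypothetical name used only for illustration; the code itself is ordinary and is expected to work, the open question in the linked material is how a vector implementation can provide this guarantee within the wording.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the elements by walking the raw pointer range.
int sum_via_data(const std::vector<int>& v) {
    const int* first = v.data();
    const int* last = first + v.size(); // required to be a valid past-the-end pointer
    assert(last - first == static_cast<std::ptrdiff_t>(v.size()));

    int sum = 0;
    for (const int* p = first; p != last; ++p) {
        sum += *p; // p walks across all elements, not just a single object
    }
    return sum;
}
```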

In your example, *(p1 + 1) = 10; would normally be UB, because p1 + 1 is one past the end of an array of size 1. But we are in a very special case here, because the object was dynamically constructed in a larger char array.

Dynamic object creation is described in 4.5 The C++ object model [intro.object], §3 of the n4659 draft of the C++ standard:

3 If a complete object is created (8.3.4) in storage associated with another object e of type “array of N unsigned char” or of type “array of N std::byte” (21.2.1), that array provides storage for the created object if:
(3.1) — the lifetime of e has begun and not ended, and
(3.2) — the storage for the new object fits entirely within e, and
(3.3) — there is no smaller array object that satisfies these constraints.

Item 3.3 seems rather unclear, but the example below makes the intent clearer:

struct A { unsigned char a[32]; };
struct B { unsigned char b[16]; };
A a;
B *b = new (a.a + 8) B; // a.a provides storage for *b
int *p = new (b->b + 4) int; // b->b provides storage for *p
// a.a does not provide storage for *p (directly),
// but *p is nested within a (see below)

So in the example, the buffer array provides storage for both *p1 and *p2.

The following paragraphs prove that the complete object for both *p1 and *p2 is buffer:

4 An object a is nested within another object b if:
(4.1) — a is a subobject of b, or
(4.2) — b provides storage for a, or
(4.3) — there exists an object c where a is nested within c, and c is nested within b.

5 For every object x, there is some object called the complete object of x, determined as follows:
(5.1) — If x is a complete object, then the complete object of x is itself.
(5.2) — Otherwise, the complete object of x is the complete object of the (unique) object that contains x.

Once this is established, the other relevant part of draft n4659 for C++17 is [basic.compound] §3 (emphasis mine):

3 ... Every value of pointer type is one of the following:
(3.1) — a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) — a pointer past the end of an object (8.7), or
(3.3) — the null pointer value (7.11) for that type, or
(3.4) — an invalid pointer value.

A value of a pointer type that is a pointer to or past the end of an object represents the address of the first byte in memory (4.4) occupied by the object or the first byte in memory after the end of the storage occupied by the object, respectively. [ Note: A pointer past the end of an object (8.7) is not considered to point to an unrelated object of the object’s type that might be located at that address. A pointer value becomes invalid when the storage it denotes reaches the end of its storage duration; see 6.7. —end note ] For purposes of pointer arithmetic (8.7) and comparison (8.9, 8.10), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical element x[n]. The value representation of pointer types is implementation-defined. Pointers to layout-compatible types shall have the same value representation and alignment requirements (6.11)...

The note "A pointer past the end..." does not apply here because the objects pointed to by p1 and p2 are not unrelated: they are nested within the same complete object. Pointer arithmetic therefore makes sense inside the object that provides storage: p2 - p1 is defined and is (&buffer[sizeof(int)] - buffer) / sizeof(int), that is 1.

So p1 + 1 is a pointer to *p2, and *(p1 + 1) = 10; has defined behaviour and sets the value of *p2.


I have also read Annex C.4 on the compatibility between C++14 and the current (C++17) standard. Removing the possibility of using pointer arithmetic between objects dynamically created in a single character array would be an important change that IMHO should be listed there, because it is a commonly used feature. As nothing about it exists in the compatibility pages, I think this confirms that it was not the intent of the standard to forbid it.

In particular, it would defeat this common idiom for dynamically constructing an array of objects of a class with no default constructor:

class T {
    ...
public:
    T(U initialization) {
        ...
    }
};
...
unsigned char *mem = new unsigned char[N * sizeof(T)];
T *arr = reinterpret_cast<T*>(mem); // View the storage as an array of N T
for (std::size_t i = 0; i < N; i++) {
    U u(...);
    new (arr + i) T(u); // Construct the i-th element in place
}

arr can then be used as a pointer to the first element of an array...
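For readers who want a variant of this idiom that does not depend on how the question above is resolved, the following is a hedged sketch. `Item`, `item_at`, and `sum_items` are hypothetical names, and the approach (byte arithmetic on the unsigned char array plus `std::launder` from C++17) is one possible way to write it, not something prescribed by the answers here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new> // placement new, std::launder (C++17)

struct Item { // hypothetical type with no default constructor
    explicit Item(int v) : value(v) {}
    int value;
};

// Hypothetical helper: pointer to the i-th Item stored in `mem`.
// All arithmetic is done on the unsigned char array, which really is an array.
Item* item_at(unsigned char* mem, std::size_t i) {
    assert(mem != nullptr);
    return std::launder(reinterpret_cast<Item*>(mem + i * sizeof(Item)));
}

int sum_items(std::size_t n) {
    // A new-expression for an unsigned char array yields storage suitably
    // aligned for objects that fit in it and have fundamental alignment.
    std::unique_ptr<unsigned char[]> mem(new unsigned char[n * sizeof(Item)]);

    for (std::size_t i = 0; i < n; ++i) {
        new (mem.get() + i * sizeof(Item)) Item(static_cast<int>(i) + 1);
    }

    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += item_at(mem.get(), i)->value;
    }

    for (std::size_t i = 0; i < n; ++i) {
        item_at(mem.get(), i)->~Item(); // destroy before the storage is released
    }
    return sum;
}
```

The design choice is to never step from one Item to the next with Item* arithmetic; each element is located from the start of the byte array instead.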

To expand on the answers given, here is an example of what I believe the revised wording is excluding:

Warning: Undefined Behaviour

#include <iostream>

int main() {
    int A[1]{7};
    int B[1]{10};
    bool same{(B) == (A + 1)};

    std::cout << B << ' ' << A << ' ' << sizeof(*A) << '\n';
    std::cout << (same ? "same" : "not same") << '\n';
    std::cout << *(A + 1) << '\n'; // !!!!!
    return 0;
}

For entirely implementation-dependent (and fragile) reasons, a possible output of this program is:

0x7fff1e4f2a64 0x7fff1e4f2a60 4
same
10

That output shows that the two arrays (in that case) happen to be stored in memory such that 'one past the end' of A happens to hold the value of the address of the first element of B.

The revised specification ensures that A+1 is never a valid pointer to B, regardless of where the two arrays happen to be placed. The old phrase 'regardless of how the value is obtained' says that if 'A+1' happens to point to 'B[0]' then it's a valid pointer to 'B[0]'. That can't be good and was surely never the intention.
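By contrast, forming A+1 and comparing against it remains fine; only dereferencing it is excluded. A minimal sketch of that distinction, with `sum_with_sentinel` as a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: uses the past-the-end pointer only as a loop sentinel.
int sum_with_sentinel() {
    int A[1]{7};
    int* const end = A + 1; // may be formed and compared, never dereferenced
    assert(end - A == 1);

    int sum = 0;
    for (int* p = A; p != end; ++p) {
        sum += *p; // p always points to an element of A here
    }
    return sum;
}
```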