Why would the behavior of std::memcpy be undefined for objects that are not TriviallyCopyable?

From http://en.cppreference.com/w/cpp/string/byte/memcpy:

If the objects are not TriviallyCopyable (e.g. scalars, arrays, C-compatible structs), the behavior is undefined.

At my work, we have used std::memcpy for a long time to bitwise swap objects that are not TriviallyCopyable using:

void swapMemory(Entity* ePtr1, Entity* ePtr2)
{
    static const int size = sizeof(Entity);
    char swapBuffer[size];

    memcpy(swapBuffer, ePtr1, size);
    memcpy(ePtr1, ePtr2, size);
    memcpy(ePtr2, swapBuffer, size);
}

and never had any issues.

I understand that it is trivial to abuse std::memcpy with non-TriviallyCopyable objects and cause undefined behavior downstream. However, my question:

Why would the behavior of std::memcpy itself be undefined when used with non-TriviallyCopyable objects? Why does the standard deem it necessary to specify that?

UPDATE

The contents of http://en.cppreference.com/w/cpp/string/byte/memcpy have been modified in response to this post and the answers to the post. The current description says:

If the objects are not TriviallyCopyable (e.g. scalars, arrays, C-compatible structs), the behavior is undefined unless the program does not depend on the effects of the destructor of the target object (which is not run by memcpy) and the lifetime of the target object (which is ended, but not started by memcpy) is started by some other means, such as placement-new.

PS

Comment by @Cubbi:

@RSahu if something guarantees UB downstream, it renders the entire program undefined. But I agree that it appears to be possible to skirt around UB in this case and modified cppreference accordingly.


Because the standard says so.

Compilers may assume that non-TriviallyCopyable types are only copied via their copy/move constructors/assignment operators. This could be for optimization purposes (if some data is private, it could defer setting it until a copy / move occurs).

The compiler is even free to take your memcpy call and have it do nothing, or format your hard drive. Why? Because the standard says so. And doing nothing is definitely faster than moving bits around, so why not optimize your memcpy to an equally-valid faster program?

Now, in practice, there are many problems that can occur when you just blit around bits in types that don't expect it. Virtual function tables might not be set up right. Instrumentation used to detect leaks may not be set up right. Objects whose identity includes their location get completely messed up by your code.

The really funny part is that using std::swap; swap(*ePtr1, *ePtr2); should be able to be compiled down to a memcpy for trivially copyable types by the compiler, and for other types be defined behavior. If the compiler can prove that copy is just bits being copied, it is free to change it to memcpy. And if you can write a more optimal swap, you can do so in the namespace of the object in question.
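A minimal sketch of that idiom, assuming a hypothetical type `game::Entity` (not the asker's real class) that keeps a pointer to one of its own members. The custom `swap` lives in the type's namespace and is found through argument-dependent lookup; `swapObjects` is likewise an illustrative name.

```cpp
#include <cassert>
#include <utility>

namespace game {

// Hypothetical stand-in for the asker's Entity. The user-provided copy
// operations make it non-trivially-copyable.
struct Entity {
    int hp;
    int* hpView;  // invariant: always points to this object's own hp

    explicit Entity(int h) : hp(h), hpView(&hp) {}
    Entity(Entity const& other) : hp(other.hp), hpView(&hp) {}
    Entity& operator=(Entity const& other) { hp = other.hp; return *this; }
};

// Custom swap, found via argument-dependent lookup. Only the value is
// exchanged, so each hpView keeps pointing into its own object.
inline void swap(Entity& a, Entity& b) noexcept {
    std::swap(a.hp, b.hp);
    assert(a.hpView == &a.hp && b.hpView == &b.hp);
}

}  // namespace game

// Generic code: the unqualified call picks game::swap for Entity and
// falls back to std::swap for types without their own overload.
template <typename T>
void swapObjects(T& a, T& b) {
    using std::swap;
    swap(a, b);
}
```

Calling `swapObjects(e1, e2)` on two `game::Entity` objects exchanges their `hp` values while leaving both self-pointers valid, which a bitwise swap would not.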

Why would the behavior of std::memcpy itself be undefined when used with non-TriviallyCopyable objects?

It's not! However, once you copy the underlying bytes of one object of a non-trivially copyable type into another object of that type, the target object is not alive. We destroyed it by reusing its storage, and haven't revitalized it by a constructor call.

Using the target object - calling its member functions, accessing its data members - is clearly undefined [basic.life]/6, and so is a subsequent, implicit destructor call [basic.life]/4 for target objects having automatic storage duration. Note how undefined behavior is retrospective. [intro.execution]/5:

However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

If an implementation spots how an object is dead and necessarily subject to further operations that are undefined, ... it may react by altering your program's semantics, from the memcpy call onward. And this consideration gets very practical once we think of optimizers and certain assumptions that they make.
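For contrast, here is a sketch of moving an object to new storage in a way that does start and end lifetimes properly: a constructor call creates the new object and a destructor call ends the old one. The helper name `relocate` is hypothetical, and the sketch assumes `dst` points to raw storage that is large enough and suitably aligned for `T`.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Moves *src into the raw storage at dst and ends the lifetime of *src.
// Assumption: dst is suitably sized and aligned for T and holds no live
// object. Afterwards dst holds a live T; src's storage holds no object.
template <typename T>
T* relocate(void* dst, T* src) {
    assert(dst != nullptr && src != nullptr);
    T* result = ::new (dst) T(std::move(*src));  // starts the new object's lifetime
    src->~T();                                   // ends the old object's lifetime
    return result;
}
```

Unlike the memcpy version, this runs the type's own move constructor, so a class that stores a pointer into itself gets the chance to fix that pointer up.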

It should be noted that standard libraries are able and allowed to optimize certain standard library algorithms for trivially copyable types, though. std::copy on pointers to trivially copyable types usually calls memcpy on the underlying bytes. So does swap.
So simply stick to using normal generic algorithms and let the compiler do any appropriate low-level optimizations - this is partly what the idea of a trivially copyable type was invented for in the first place: Determining the legality of certain optimizations. Also, this avoids hurting your brain by having to worry about contradictory and underspecified parts of the language.
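As a rough illustration of "stick to the generic algorithms", the sketch below uses the same `std::copy` call for a trivially copyable element type and for `std::string`. The names `Pod` and `copyInto` are made up for this example; whether the library actually lowers the first case to a bulk byte copy is an implementation detail.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

struct Pod { int id; double weight; };  // hypothetical trivially copyable type

static_assert(std::is_trivially_copyable<Pod>::value,
              "Pod may be copied bytewise");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string needs its own copy operations");

// The same generic call is well defined for both kinds of element type.
// For Pod the library is allowed to turn it into a bulk byte copy; for
// std::string it runs the copy assignment operator per element.
template <typename T>
void copyInto(std::vector<T> const& src, std::vector<T>& dst) {
    assert(dst.size() >= src.size());
    std::copy(src.data(), src.data() + src.size(), dst.data());
}
```

The caller never has to decide whether a byte copy is legal; the type trait makes that decision inside the library.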

It is easy enough to construct a class where that memcpy-based swap breaks:

struct X {
int x;
int* px; // invariant: always points to x
X() : x(), px(&x) {}
X(X const& b) : x(b.x), px(&x) {}
X& operator=(X const& b) { x = b.x; return *this; }
};

memcpy'ing such an object breaks that invariant.

GNU C++11 std::string does exactly that with short strings.

This is similar to how the standard file and string streams are implemented. The streams eventually derive from std::basic_ios, which contains a pointer to std::basic_streambuf. The streams also contain the specific buffer as a member (or base class sub-object), to which that pointer in std::basic_ios points.
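The broken invariant of `X` can be observed without stepping into undefined behavior by using a trivially copyable stand-in. The sketch below assumes a hypothetical `SelfRef` with the same two members but no user-provided copy operations, so `std::memcpy` on it is permitted and the effect on the pointer is well defined.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable variant of X: same members, but no
// user-provided copy operations, so std::memcpy on it is well defined.
struct SelfRef {
    int x;
    int* px;  // intended invariant: points to this object's own x
};

static_assert(std::is_trivially_copyable<SelfRef>::value,
              "memcpy is permitted for SelfRef");

// Returns true if the invariant "px points to own x" holds.
inline bool invariantHolds(SelfRef const& s) { return s.px == &s.x; }

// Copies the bytes of src over dst, as one step of the memcpy-based swap would.
inline void bitwiseCopy(SelfRef& dst, SelfRef const& src) {
    assert(&dst != &src);
    std::memcpy(&dst, &src, sizeof(SelfRef));
}
```

After `bitwiseCopy(b, a)`, `b.px` holds the address of `a.x`, so `invariantHolds(b)` is false. For the real `X`, the copy constructor exists precisely to prevent this.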

Many of these answers mention that memcpy could break invariants in the class, which would cause undefined behaviour later (and which in most cases should be reason enough not to risk it), but that doesn't seem to be what you're really asking.

One reason for why the memcpy call itself is deemed to be undefined behaviour is to give as much room as possible to the compiler to make optimizations based on the target platform. By having the call itself be UB, the compiler is allowed to do weird, platform-dependent things.

Consider this (very contrived and hypothetical) example: For a particular hardware platform, there might be several different kinds of memory, with some being faster than others for different operations. There might, for instance, be a kind of special memory that allows extra fast memory copies. A compiler for this (imaginary) platform is therefore allowed to place all TriviallyCopyable types in this special memory, and implement memcpy to use special hardware instructions that only work on this memory.

If you were to use memcpy on non-TriviallyCopyable objects on this platform, there might be some low-level INVALID OPCODE crash in the memcpy call itself.

Not the most convincing of arguments, perhaps, but the point is that the standard doesn't forbid it, which is only possible through making the memcpy call UB.

Another reason that memcpy is UB (apart from what has been mentioned in the other answers - it might break invariants later on) is that it is very hard for the standard to say exactly what would happen.

For non-trivial types, the standard says very little about how the object is laid out in memory, in which order the members are placed, where the vtable pointer is, what the padding should be, etc. The compiler has huge amounts of freedom in deciding this.

As a result, even if the standard wanted to allow memcpy in these "safe" situations, it would be impossible to state what situations are safe and which aren't, or when exactly the real UB would be triggered for unsafe cases.

I suppose that you could argue that the effects should be implementation-defined or unspecified, but I'd personally feel that would be both digging a bit too deep into platform specifics and giving a little bit too much legitimacy to something that in the general case is rather unsafe.

memcpy will copy all the bytes, or in your case swap all the bytes, just fine. An overzealous compiler could take the "undefined behaviour" as an excuse to do all kinds of mischief, but most compilers won't do that. Still, it is possible.

However, after these bytes are copied, the object that you copied them to may not be a valid object anymore. A simple case is a string implementation where large strings allocate memory, but small strings just use a part of the string object to hold characters, and keep a pointer to that. After the copy, the pointer will obviously still point into the other object, so things will be wrong. Another example I have seen was a class with data that was used in very few instances only, so that data was kept in a database with the address of the object as a key.

Now if your instances contain a mutex for example, I would think that moving that around could be a major problem.

C++ does not guarantee for all types that their objects occupy contiguous bytes of storage [intro.object]/5

An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.

And indeed, through virtual base classes, you can create non-contiguous objects in major implementations. I have tried to build an example where a base class subobject of an object x is located before x's starting address. To visualize this, consider the following graph/table, where the horizontal axis is address space, and the vertical axis is the level of inheritance (level 1 inherits from level 0). Fields marked by dm are occupied by direct data members of the class.

L | 00 08 16
--+---------
1 |    dm
0 | dm

This is a usual memory layout when using inheritance. However, the location of a virtual base class subobject is not fixed, since it can be relocated by child classes that also inherit from the same base class virtually. This can lead to the situation that the level 1 (base class sub)object reports that it begins at address 8 and is 16 bytes large. If we naively add those two numbers, we'd think it occupies the address space [8, 24) even though it actually occupies [0, 16).

If we can create such a level 1 object, then we cannot use memcpy to copy it: memcpy would access memory that does not belong to this object (addresses 16 to 24). In my demo, this is caught as a stack-buffer-overflow by clang++'s address sanitizer.

How to construct such an object? By using multiple virtual inheritance, I came up with an object that has the following memory layout (virtual table pointers are marked as vp). It is composed through four layers of inheritance:

L  00 08 16 24 32 40 48
3        dm
2  vp dm
1              vp dm
0           dm

The issue described above will arise for the level 1 base class subobject. Its starting address is 32, and it is 24 bytes large (vptr, its own data members and level 0's data members).

Here's the code for such a memory layout under clang++ and g++ @ coliru:

struct l0 {
std::int64_t dummy;
};


struct l1 : virtual l0 {
std::int64_t dummy;
};


struct l2 : virtual l0, virtual l1 {
std::int64_t dummy;
};


struct l3 : l2, virtual l1 {
std::int64_t dummy;
};

We can produce a stack-buffer-overflow as follows:

l3  o;
l1& so = o;


l1 t;
std::memcpy(&t, &so, sizeof(t));

Here's a complete demo that also prints some info about the memory layout:

#include <cstdint>
#include <cstring>
#include <iomanip>
#include <iostream>


#define PRINT_LOCATION() \
std::cout << std::setw(22) << __PRETTY_FUNCTION__                   \
<< " at offset " << std::setw(2)                                  \
<< (reinterpret_cast<char const*>(this) - addr)                 \
<< " ; data is at offset " << std::setw(2)                        \
<< (reinterpret_cast<char const*>(&dummy) - addr)               \
<< " ; naively to offset "                                        \
<< (reinterpret_cast<char const*>(this) - addr + sizeof(*this)) \
<< "\n"


struct l0 {
std::int64_t dummy;


void report(char const* addr) { PRINT_LOCATION(); }
};


struct l1 : virtual l0 {
std::int64_t dummy;


void report(char const* addr) { PRINT_LOCATION(); l0::report(addr); }
};


struct l2 : virtual l0, virtual l1 {
std::int64_t dummy;


void report(char const* addr) { PRINT_LOCATION(); l1::report(addr); }
};


struct l3 : l2, virtual l1 {
std::int64_t dummy;


void report(char const* addr) { PRINT_LOCATION(); l2::report(addr); }
};


void print_range(void const* b, std::size_t sz)
{
std::cout << "[" << (void const*)b << ", "
<< (void*)(reinterpret_cast<char const*>(b) + sz) << ")";
}


void my_memcpy(void* dst, void const* src, std::size_t sz)
{
std::cout << "copying from ";
print_range(src, sz);
std::cout << " to ";
print_range(dst, sz);
std::cout << "\n";
}


int main()
{
l3 o{};
o.report(reinterpret_cast<char const*>(&o));


std::cout << "the complete object occupies ";
print_range(&o, sizeof(o));
std::cout << "\n";


l1& so = o;
l1 t;
my_memcpy(&t, &so, sizeof(t));
}


Sample output (abbreviated to avoid vertical scrolling):

l3::report at offset  0 ; data is at offset 16 ; naively to offset 48
l2::report at offset  0 ; data is at offset  8 ; naively to offset 40
l1::report at offset 32 ; data is at offset 40 ; naively to offset 56
l0::report at offset 24 ; data is at offset 24 ; naively to offset 32
the complete object occupies [0x9f0, 0xa20)
copying from [0xa10, 0xa28) to [0xa20, 0xa38)

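Note the two end offsets: the l1 subobject naively extends to offset 56, while the complete object ends at offset 48. Accordingly, the copy reads from [0xa10, 0xa28), which runs past the end of the complete object at 0xa20.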

What I can perceive here is that -- for some practical applications -- the C++ Standard may be too restrictive, or rather, not permissive enough.

As shown in other answers, memcpy breaks down quickly for "complicated" types, but IMHO, it actually should work for Standard Layout Types as long as the memcpy doesn't break what the defined copy-operations and destructor of the Standard Layout type do. (Note that even a TC class is allowed to have a non-trivial constructor.) The standard only explicitly calls out TC types wrt. this, however.

A recent draft quote (N3797):

3.9 Types

...

2 For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value. [ Example:

#define N sizeof(T)
char buf[N];
T obj;                     // obj initialized to its original value
std::memcpy(buf, &obj, N); // between these two calls to std::memcpy,
                           // obj might be modified
std::memcpy(&obj, buf, N); // at this point, each subobject of obj of scalar type
                           // holds its original value

—end example ]

3 For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes (1.7) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1. [ Example:

T* t1p;
T* t2p;
// provided that t2p points to an initialized object ...
std::memcpy(t1p, t2p, sizeof(T));
// at this point, every subobject of trivially copyable type in *t1p contains
// the same value as the corresponding subobject in *t2p

—end example ]
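The two quoted examples leave `T` abstract. A runnable sketch of both guarantees, assuming a hypothetical trivially copyable `Sample` in place of `T` (the function names are also illustrative), might look like this:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Sample { int a; char b; double c; };  // hypothetical stand-in for T

static_assert(std::is_trivially_copyable<Sample>::value,
              "3.9/2 and 3.9/3 only apply to trivially copyable types");

// 3.9/2: save the bytes, modify the object, restore the bytes.
inline Sample roundTrip(Sample obj) {
    unsigned char buf[sizeof(Sample)];
    std::memcpy(buf, &obj, sizeof(Sample));
    obj.a = -1;   // obj may be modified between the two calls
    obj.c = 0.0;
    std::memcpy(&obj, buf, sizeof(Sample));
    return obj;   // holds its original value again
}

// 3.9/3: copy the bytes of one object into another, distinct object.
inline void copyBytes(Sample& dst, Sample const& src) {
    assert(&dst != &src);
    std::memcpy(&dst, &src, sizeof(Sample));
}
```

Replacing `Sample` with a type that has a user-provided copy constructor or destructor makes the `static_assert` fire, which is exactly the boundary the quoted paragraphs draw.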

The standard here talks about trivially copyable types, but as was observed by @dyp above, there are also standard layout types that do not, as far as I can see, necessarily overlap with Trivially Copyable types.

The standard says:

1.8 The C++ object model

(...)

5 (...) An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.

So what I see here is that:

  • The standard says nothing about non Trivially Copyable types wrt. memcpy. (as already mentioned several times here)
  • The standard has a separate concept for Standard Layout types that occupy contiguous storage.
  • The standard does not explicitly allow nor disallow using memcpy on objects of Standard Layout that are not Trivially Copyable.
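That the two categories really are independent can be checked with the standard type traits. In this sketch, `LayoutOnly`, `CopyOnly` and `memcpyIsSanctioned` are names invented for illustration.

```cpp
#include <type_traits>

// Standard-layout but NOT trivially copyable: a user-provided copy
// constructor does not affect layout, only copy semantics.
struct LayoutOnly {
    int x;
    LayoutOnly() : x(0) {}
    LayoutOnly(LayoutOnly const& o) : x(o.x) {}
};

// Trivially copyable but NOT standard-layout: the non-static data
// members have different access control.
struct CopyOnly {
    int a;
private:
    int b;
};

// True exactly for the types the standard explicitly covers for memcpy.
template <typename T>
constexpr bool memcpyIsSanctioned() {
    return std::is_trivially_copyable<T>::value;
}

static_assert(std::is_standard_layout<LayoutOnly>::value, "contiguous, C-like layout");
static_assert(!memcpyIsSanctioned<LayoutOnly>(), "but memcpy is not covered");
static_assert(memcpyIsSanctioned<CopyOnly>(), "memcpy is covered");
static_assert(!std::is_standard_layout<CopyOnly>::value, "but layout is not standard");
```

`LayoutOnly` is the kind of type the third bullet is about: contiguous and laid out like a C struct, yet outside the wording that sanctions memcpy.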

So it does not seem to be explicitly called out as UB, but it certainly also isn't what is referred to as unspecified behavior, so one could conclude what @underscore_d did in the comment to the accepted answer:

(...) You can't just say "well, it wasn't explicitly called out as UB, therefore it's defined behaviour!", which is what this thread seems to amount to. N3797 3.9 points 2~3 do not define what memcpy does for non-trivially-copyable objects, so (...) [t]hat's pretty much functionally equivalent to UB in my eyes as both are useless for writing reliable, i.e. portable code

I personally would conclude that it amounts to UB as far as portability goes (oh, those optimizers), but I think that with some hedging and knowledge of the concrete implementation, one can get away with it. (Just make sure it's worth the trouble.)


Side Note: I also think that the standard really should explicitly incorporate Standard Layout type semantics into the whole memcpy mess, because it's a valid and useful use case to do a bitwise copy of non-Trivially-Copyable objects, but that's beside the point here.

Link: Can I use memcpy to write to multiple adjacent Standard Layout sub-objects?

First, note that it is unquestionable that all memory for mutable C/C++ objects has to be un-typed, un-specialized, usable for any mutable object. (I guess the memory for global const variables could hypothetically be typed, there is just no point in such hyper-complication for such a tiny corner case.) Unlike Java, C++ has no typed allocation of a dynamic object: new Class(args) in Java is a typed object creation: creating an object of a well-defined type, that might live in typed memory. On the other hand, the C++ expression new Class(args) is just a thin typing wrapper around type-less memory allocation, equivalent to new (operator new(sizeof(Class))) Class(args): the object is created in "neutral memory". Changing that would mean changing a very big part of C++.

Forbidding the bit copy operation (whether done by memcpy or the equivalent user-defined byte-by-byte copy) on some types gives a lot of freedom to the implementation for polymorphic classes (those with virtual functions), and other so-called "virtual classes" (not a standard term), that is, the classes that use the virtual keyword.

The implementation of polymorphic classes could use a global associative map of addresses which associates the address of a polymorphic object with its virtual functions. I believe that was an option seriously considered during the design of the first iterations of the C++ language (or even "C with classes"). That map of polymorphic objects might use special CPU features and special associative memory (such features aren't exposed to the C++ user).

Of course we know that all practical implementations of virtual functions use vtables (a constant record describing all dynamic aspects of a class) and put a vptr (vtable pointer) in each polymorphic base class subobject, as that approach is extremely simple to implement (at least for the simplest cases) and very efficient. There is no global registry of polymorphic objects in any real world implementation except possibly in debug mode (I don't know such debug mode).

The C++ standard made the lack of global registry somewhat official by saying that you can skip the destructor call when you reuse the memory of an object, as long as you don't depend on the "side effects" of that destructor call. (I believe that means that the "side effects" are user created, that is the body of the destructor, not implementation created, as automatically done to the destructor by the implementation.)

In practice, in all implementations, the compiler just uses hidden vptr members (pointers to vtables), and these hidden members will be copied properly by memcpy, as if you did a plain member-wise copy of the C struct representing the polymorphic class (with all its hidden members). Bit-wise copies, or complete member-wise copies of the C struct (the complete C struct includes hidden members), will behave exactly like a constructor call (as done by placement new), so all you have to do is let the compiler think you might have called placement new. If you do a strongly external function call (a call to a function that cannot be inlined and whose implementation cannot be examined by the compiler, like a call to a function defined in a dynamically loaded code unit, or a system call), then the compiler will just assume that such constructors could have been called by the code it cannot examine. Thus the behavior of memcpy here is defined not by the language standard, but by the compiler ABI (Application Binary Interface). The behavior of a strongly external function call is defined by the ABI, not just by the language standard. A call to a potentially inlinable function is defined by the language, as its definition can be seen (either during compilation or during link-time global optimization).

So in practice, given appropriate "compiler fences" (such as a call to an external function, or just asm("")), you can memcpy classes that only use virtual functions.

Of course, you have to be allowed by the language semantic to do such placement new when you do a memcpy: you cannot willy-nilly redefine the dynamic type of an existing object and pretend you have not simply wrecked the old object. If you have a non const global, static, automatic, member subobject, array subobject, you can overwrite it and put another, unrelated object there; but if the dynamic type is different, you cannot pretend that it's still the same object or subobject:

struct A { virtual void f(); };
struct B : A { };


void test() {
A a;
if (sizeof(A) != sizeof(B)) return;
new (&a) B; // OK (assuming alignment is OK)
a.f(); // undefined
}

The change of polymorphic type of an existing object is simply not allowed: the new object has no relation with a except for the region of memory: the continuous bytes starting at &a. They have different types.

[The standard is strongly divided on whether *&a can be used (in typical flat memory machines) or (A&)(char&)a (in any case) to refer to the new object. Compiler writers are not divided: you should not do it. This is a deep defect in C++, perhaps the deepest and most troubling.]

But you cannot, in portable code, perform a bitwise copy of classes that use virtual inheritance, as some implementations implement those classes with pointers to the virtual base subobjects: these pointers, which were properly initialized by the constructor of the most derived object, would have their value copied by memcpy (like a plain member-wise copy of the C struct representing the class with all its hidden members) and wouldn't point to the subobject of the derived object!

Other ABIs use address offsets to locate these base subobjects; the offsets depend only on the type of the most derived object, like final overriders and typeid, and thus can be stored in the vtable. On these implementations, memcpy will work as guaranteed by the ABI (with the above limitation on changing the type of an existing object).

In either case, it is entirely an object representation issue, that is, an ABI issue.

OK, let's try your code with a little example:

#include <iostream>
#include <string>
#include <string.h>


void swapMemory(std::string* ePtr1, std::string* ePtr2) {
static const int size = sizeof(*ePtr1);
char swapBuffer[size];


memcpy(swapBuffer, ePtr1, size);
memcpy(ePtr1, ePtr2, size);
memcpy(ePtr2, swapBuffer, size);
}


int main() {
std::string foo = "foo", bar = "bar";
std::cout << "foo = " << foo << ", bar = " << bar << std::endl;
swapMemory(&foo, &bar);
std::cout << "foo = " << foo << ", bar = " << bar << std::endl;
return 0;
}

On my machine, this prints the following before crashing:

foo = foo, bar = bar
foo = foo, bar = bar

Weird, eh? The swap does not seem to be performed at all. Well, the memory was swapped, but std::string uses the small-string-optimization on my machine: It stores short strings within a buffer that's part of the std::string object itself, and just points its internal data pointer at that buffer.

When swapMemory() swaps the bytes, it swaps both the pointers and the buffers. So, the pointer in the foo object now points at the storage in the bar object, which now contains the string "foo". Two levels of swap make no swap.

When std::string's destructor subsequently tries to clean up, more evil happens: The data pointer does not point at the std::string's own internal buffer anymore, so the destructor deduces that that memory must have been allocated on the heap, and tries to delete it. The result on my machine is a simple crash of the program, but the C++ standard would not care if pink elephants were to appear. The behavior is totally undefined.


And that is the fundamental reason why you should not be using memcpy() on non-trivially copyable objects: You do not know whether the object contains pointers/references to its own data members, or depends on its own location in memory in any other way. If you memcpy() such an object, the basic assumption that the object cannot move around in memory is violated, and some classes like std::string do rely on this assumption. The C++ standard draws the line at the distinction between (non-)trivially copyable objects to avoid going into further, unnecessary detail about pointers and references. It only makes an exception for trivially copyable objects and says: Well, in this case you are safe. But do not blame me for the consequences should you try to memcpy() any other objects.
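One way to keep a bytewise swap while staying inside that exception is to let the compiler enforce it. The following is a sketch, not the asker's code: a template variant of swapMemory that refuses non-trivially copyable types at compile time, with `Vec3` and `swapStrings` as illustrative names.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

// Bytewise swap, restricted at compile time to the types for which the
// standard sanctions memcpy.
template <typename T>
void swapMemory(T* p1, T* p2) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "swapMemory requires a trivially copyable type");
    assert(p1 != nullptr && p2 != nullptr);
    if (p1 == p2) return;  // memcpy must not be given overlapping ranges
    unsigned char buffer[sizeof(T)];
    std::memcpy(buffer, p1, sizeof(T));
    std::memcpy(p1, p2, sizeof(T));
    std::memcpy(p2, buffer, sizeof(T));
}

struct Vec3 { float x, y, z; };  // hypothetical trivially copyable payload

// swapMemory<std::string> would be rejected by the static_assert;
// std::swap is the well-defined route for such types.
inline void swapStrings(std::string& a, std::string& b) {
    using std::swap;
    swap(a, b);
}
```

With this guard, the std::string experiment above no longer compiles as a memcpy swap, and `swapStrings(foo, bar)` leaves `foo == "bar"` and `bar == "foo"` with both destructors running normally.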