malloc vs. new: different padding

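I'm reviewing someone else's C++ code for our project, which uses MPI for high-performance computing (10^5 to 10^6 cores). The code is intended to allow communication between (potentially) different machines on different architectures. He has written a comment that says something along the lines of:

We'd normally use new and delete, but here I'm using malloc and free. This is necessary because some compilers will pad the data differently when new is used, leading to errors when transferring data between different platforms. This wouldn't happen with malloc.

This does not fit with anything I know from the standard new vs. malloc questions.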

What is the difference between new/delete and malloc/free? hints at the idea that the compiler could calculate the size of an object differently (but then why does that differ from using sizeof?).

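Malloc & placement new vs. new is a fairly popular question, but it only covers new calling constructors where malloc does not, which isn't relevant here.

How does malloc understand alignment? says that memory is guaranteed to be properly aligned with either new or malloc, which is what I had previously thought.

My guess is that he misdiagnosed a bug of his own at some point in the past and deduced that new and malloc give different amounts of padding, which I think probably isn't true. But I can't find the answer with Google or in any previous question.

Help me, StackOverflow, you're my only hope!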

I think you are right. Padding is done by the compiler, not by new or malloc. Padding considerations would apply even if you declared an array or struct without using new or malloc at all. In any case, while I can see how different implementations of new and malloc could cause problems when porting code between platforms, I completely fail to see how they could cause problems transferring data between platforms.

Your colleague may have had new[]/delete[]'s magic cookie in mind (this is the information the implementation uses when deleting an array). However, this would not have been a problem if the allocation beginning at the address returned by new[] were used (as opposed to the allocator's).

Packing seems more probable. Variations in ABIs could (for example) result in a different number of trailing bytes added at the end of a structure (this is influenced by alignment; also consider arrays). With malloc, the position of a structure could be specified and thus more easily ported to a foreign ABI. These variations are normally prevented by specifying the alignment and packing of transfer structures.

The layout of an object can't depend on whether it was allocated using malloc or new. They both return the same kind of pointer, and when you pass this pointer to other functions they won't know how the object was allocated. sizeof *ptr is just dependent on the declaration of ptr, not how it was assigned.
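To make the point concrete, here is a minimal sketch that compares an object living in malloc'd storage with one created by new. The Record type and both helper functions are hypothetical names made up for this illustration; nothing here comes from the code under review.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new

// Hypothetical record type, used only for illustration.
struct Record {
    char tag;
    double value;
    int count;
};

// Byte distance from the start of a live object to its `value` member.
std::size_t value_offset(const Record* r) {
    const char* base = reinterpret_cast<const char*>(r);
    const char* member = reinterpret_cast<const char*>(&r->value);
    return static_cast<std::size_t>(member - base);
}

// Returns true if an object in malloc'd storage and an object created by
// new have the same size and the same member offset.
bool same_layout_for_malloc_and_new() {
    void* raw = std::malloc(sizeof(Record));
    if (raw == nullptr) return false;
    Record* from_malloc = new (raw) Record{};   // construct inside the malloc'd block
    Record* from_new = new Record{};

    bool same = sizeof(*from_malloc) == sizeof(*from_new)
             && value_offset(from_malloc) == value_offset(from_new)
             && value_offset(from_new) == offsetof(Record, value);

    delete from_new;
    std::free(raw);   // Record is trivially destructible, so no destructor call is needed
    return same;
}
```

The offsets come from the type definition and the ABI, so the comparison should hold on any conforming compiler regardless of which allocator supplied the bytes.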

IIRC there's one picky point. malloc is guaranteed to return an address aligned for any standard type. ::operator new(n) is only guaranteed to return an address aligned for any standard type no larger than n, and if T isn't a character type then new T[n] is only required to return an address aligned for T.

But this is only relevant when you're playing implementation-specific tricks like using the bottom few bits of a pointer to store flags, or otherwise relying on the address to have more alignment than it strictly needs.

It doesn't affect padding within the object, which necessarily has exactly the same layout regardless of how you allocated the memory it occupies. So it's hard to see how the difference could result in errors transferring data.
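If you want to see the alignment guarantee for yourself, a small check along these lines should do. The helper names are hypothetical, and the pointer-to-integer conversion is implementation-defined, so treat this as a sketch for mainstream platforms rather than a portable proof.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t
#include <cstdlib>   // std::malloc, std::free

// True if the address is a multiple of `alignment` when viewed as an integer.
// Converting a pointer to std::uintptr_t is implementation-defined, but on
// mainstream platforms it yields the numeric address.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocates room for four doubles both ways and checks that each address
// is suitably aligned for double.
bool both_allocators_align_double() {
    double* from_new = new double[4];
    void* from_malloc = std::malloc(4 * sizeof(double));
    bool ok = from_malloc != nullptr
           && is_aligned(from_new, alignof(double))
           && is_aligned(from_malloc, alignof(double));
    std::free(from_malloc);   // free(nullptr) is a harmless no-op
    delete[] from_new;
    return ok;
}
```

Either way, this only concerns where the block starts. It says nothing about the bytes inside the object.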

Is there any sign what the author of that comment thinks about objects on the stack or in globals, whether in his opinion they're "padded like malloc" or "padded like new"? That might give clues to where the idea came from.

Maybe he's confused, but maybe the code he's talking about is more than a straight difference between malloc(sizeof(Foo) * n) vs new Foo[n]. Maybe it's more like:

malloc((sizeof(int) + sizeof(char)) * n);

vs.

struct Foo { int a; char b; };
new Foo[n];

That is, maybe he's saying "I use malloc", but means "I manually pack the data into unaligned locations instead of using a struct". Actually malloc is not needed in order to manually pack the struct, but failing to realize that is a lesser degree of confusion. It is necessary to define the data layout sent over the wire. Different implementations will pad the data differently when the struct is used.
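As a sketch of what "manually packing" might look like without malloc being involved at all, the following copies the members of Foo back to back into a caller-supplied byte buffer. The names kPackedFooSize, pack_foo and unpack_foo are hypothetical, and the sketch deliberately ignores byte order and differing sizes of int across platforms.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy

struct Foo { int a; char b; };

// Bytes occupied by one record when the members are written back to back.
constexpr std::size_t kPackedFooSize = sizeof(int) + sizeof(char);

// A struct can be no smaller than its members; on common ABIs it is larger
// because of trailing padding.
static_assert(sizeof(Foo) >= kPackedFooSize,
              "a struct cannot be smaller than the sum of its members");

// Copies the members of `f` into `out` with no padding between them.
// `out` must have room for kPackedFooSize bytes.
void pack_foo(const Foo& f, unsigned char* out) {
    std::memcpy(out, &f.a, sizeof f.a);
    std::memcpy(out + sizeof f.a, &f.b, sizeof f.b);
}

// Rebuilds a Foo from kPackedFooSize bytes written by pack_foo.
Foo unpack_foo(const unsigned char* in) {
    Foo f{};
    std::memcpy(&f.a, in, sizeof f.a);
    std::memcpy(&f.b, in + sizeof f.a, sizeof f.b);
    return f;
}
```

The buffer here can be a stack array, a std::vector, or malloc'd memory; the packing is done by the memcpy calls, not by the allocator.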

This is my wild guess of where this thing is coming from. As you mentioned, the problem is with data transmission over MPI.

Personally, for my complicated data structures that I want to send/receive over MPI, I always implement serialization/deserialization methods that pack/unpack the whole thing into/from an array of chars. Now, due to padding, we know that the size of the structure could be larger than the sum of the sizes of its members, and thus one also needs to calculate the unpadded size of the data structure so that we know how many bytes are being sent/received.

For instance if you want to send/receive std::vector<Foo> A over MPI with the said technique, it is wrong to assume the size of resulting array of chars is A.size()*sizeof(Foo) in general. In other words, each class that implements serialize/deserialize methods, should also implement a method that reports the size of the array (or better yet store the array in a container). This might become the reason behind a bug. One way or another, however, that has nothing to do with new vs malloc as pointed out in this thread.
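A minimal sketch of that approach, assuming a Foo with an int and a char as in the earlier answer. The method and function names (packed_size, serialize, deserialize, serialize_all, deserialize_all) are my own for illustration, and byte order is not handled.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy
#include <vector>

struct Foo {
    int a;
    char b;

    // Number of bytes one Foo occupies in the packed buffer (no padding).
    static constexpr std::size_t packed_size() { return sizeof(int) + sizeof(char); }

    // Writes the members back to back; `out` needs packed_size() bytes.
    void serialize(char* out) const {
        std::memcpy(out, &a, sizeof a);
        std::memcpy(out + sizeof a, &b, sizeof b);
    }

    // Reads one Foo from packed_size() bytes written by serialize().
    static Foo deserialize(const char* in) {
        Foo f{};
        std::memcpy(&f.a, in, sizeof f.a);
        std::memcpy(&f.b, in + sizeof f.a, sizeof f.b);
        return f;
    }
};

// Packs every element; the container carries the byte count with it.
std::vector<char> serialize_all(const std::vector<Foo>& items) {
    std::vector<char> buffer(items.size() * Foo::packed_size());
    char* cursor = buffer.data();
    for (const Foo& f : items) {
        f.serialize(cursor);
        cursor += Foo::packed_size();
    }
    return buffer;
}

// Inverse of serialize_all; the buffer must hold a whole number of records.
std::vector<Foo> deserialize_all(const std::vector<char>& buffer) {
    assert(buffer.size() % Foo::packed_size() == 0);
    std::vector<Foo> items;
    items.reserve(buffer.size() / Foo::packed_size());
    for (std::size_t pos = 0; pos < buffer.size(); pos += Foo::packed_size()) {
        items.push_back(Foo::deserialize(buffer.data() + pos));
    }
    return items;
}
```

Because the buffer is sized from packed_size() rather than sizeof(Foo), the number of bytes to send is buffer.size(), which is generally not A.size()*sizeof(Foo).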

When I want to control the layout of my plain old data structures, with MS Visual C++ compilers I use #pragma pack(1). Such a pragma is supported by most compilers, gcc for example.

This has the consequence of aligning all fields of the structures one behind the other, without empty spaces.

If the platform on the other end does the same (i.e. compiles its data exchange structure with a packing of 1), then the data retrieved on both sides fits together. Thus I have never had to play with malloc in C++.

At worst I would have considered overloading the new operator so that it performs some tricky things, rather than using malloc directly in C++.
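For illustration, here is roughly what that looks like. The struct names are hypothetical, and I use the push/pop form of the pragma so that the packing does not leak into declarations that follow; MSVC, gcc and clang all accept it.

```cpp
#include <cassert>
#include <cstdint>   // std::int32_t

#pragma pack(push, 1)    // no padding between or after members
struct PackedMessage {   // hypothetical transfer struct
    std::int32_t id;
    char flag;
    std::int32_t payload;
};
#pragma pack(pop)        // restore the default packing

// Same members, default packing, for comparison.
struct DefaultMessage {
    std::int32_t id;
    char flag;
    std::int32_t payload;
};

// With a packing of 1 the size is exactly the sum of the members.
static_assert(sizeof(PackedMessage) == 2 * sizeof(std::int32_t) + sizeof(char),
              "packed struct should contain no padding");
// The default layout may add padding after `flag`, so it is at least as large.
static_assert(sizeof(DefaultMessage) >= sizeof(PackedMessage),
              "default layout cannot be smaller than the packed one");
```

Note that packing only removes padding. Byte order still has to match on both ends, and fixed-width types such as std::int32_t are used here so that the member sizes do not vary between platforms.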

In C++, the new keyword allocates the memory needed for a particular data type. For example, you have defined some class or structure and you want to allocate memory for an object of it.

myclass *my = new myclass();

or

int *i = new int(2);

But in all cases you need a defined data type (class, struct, union, int, char, etc.), and only the number of bytes required for that object or variable is allocated (i.e. a multiple of the size of that data type).

With malloc(), on the other hand, you can allocate any number of bytes, and you don't always need to specify a data type. Here are a few of the ways malloc() can be called:

void *v = malloc(23);

or

void *x = malloc(sizeof(int) * 23);

or

char *c = (char*)malloc(sizeof(char)*35);

malloc is a function and new is an operator in C++. In C++, the result of malloc must be cast to the target pointer type, otherwise the compiler reports an error; with new, no cast is needed because the expression already has the right pointer type.
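A short sketch of that difference, with hypothetical function names. Note that neither version changes how an int (or any other type) is laid out in memory.

```cpp
#include <cassert>
#include <cstdlib>   // std::malloc, std::free
#include <new>       // std::bad_alloc

// Allocates one int with malloc, stores `value`, reads it back and frees it.
int roundtrip_with_malloc(int value) {
    // malloc returns void*, which C++ does not convert implicitly,
    // so the cast is required.
    int* p = static_cast<int*>(std::malloc(sizeof(int)));
    if (p == nullptr) throw std::bad_alloc();   // malloc reports failure with a null pointer
    *p = value;
    int result = *p;
    std::free(p);
    return result;
}

// Same round trip with new: the expression already has type int*.
int roundtrip_with_new(int value) {
    int* p = new int(value);   // throws std::bad_alloc on failure
    int result = *p;
    delete p;
    return result;
}
```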