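Is it safe to return a struct in C or C++?

My understanding is that this shouldn't be done, but I believe I have seen examples that do something like this (note that the code is not necessarily syntactically correct, but the idea is there):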

typedef struct{
    int a,b;
}mystruct;

And then here's a function:

mystruct func(int c, int d){
    mystruct retval;
    retval.a = c;
    retval.b = d;
    return retval;
}

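I understood that we should always return a pointer to a malloc'ed struct if we wanted to do something like this, but I'm positive I've seen examples that do it this way. Is this correct? Personally I always either return a pointer to a malloc'ed struct or just do a pass by reference to the function and modify the values there. (Because my understanding is that once the scope of the function is over, whatever stack was used to allocate the structure can be overwritten.)

Let's add a second part to the question: Does this vary by compiler? If it does, then what is the behavior for the latest versions of compilers for desktops: gcc, g++ and Visual Studio?

Thoughts?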


It's perfectly safe, and it's not wrong to do so. Also: it does not vary by compiler.

When your struct is not too big, as in your example, I would argue that this approach is even better than returning a malloc'ed structure, because malloc is an expensive operation.

It's perfectly safe.

You're returning by value. What would lead to undefined behavior is returning a reference to the local variable.

//safe
mystruct func(int c, int d){
    mystruct retval;
    retval.a = c;
    retval.b = d;
    return retval;
}

//undefined behavior
mystruct& func(int c, int d){
    mystruct retval;
    retval.a = c;
    retval.b = d;
    return retval;
}

The behavior of your snippet is perfectly valid and defined. It doesn't vary by compiler. It's ok!

Personally I always either return a pointer to a malloc'ed struct

You shouldn't. You should avoid dynamically allocated memory when possible.

or just do a pass by reference to the function and modify the values there.

This option is perfectly valid. It's a matter of choice. In general, you do this if you want to return something else from the function, while modifying the original struct.

Because my understanding is that once the scope of the function is over, whatever stack was used to allocate the structure can be overwritten

This is wrong. Or rather, it's sort of correct, but you return a copy of the structure you create inside the function. That is the theory. In practice, RVO can and probably will occur; read up on return value optimization. This means that although retval appears to go out of scope when the function ends, it might actually be built in the calling context to avoid the extra copy. This is an optimization the compiler is free to implement.

The lifetime of the mystruct object in your function does indeed end when you leave the function. However, you are returning the object by value in the return statement. This means that the object is copied out of the function into the calling function. The original object is gone, but the copy lives on.
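A minimal sketch of that point, reusing the mystruct and func from the question. The helpers clobber_stack and copy_example are hypothetical names added only for illustration:

```cpp
#include <cassert>

struct mystruct {
    int a, b;
};

// Same shape as the function in the question: returns a local by value.
mystruct func(int c, int d) {
    mystruct retval;
    retval.a = c;
    retval.b = d;
    return retval;
}

// Hypothetical helper: uses some stack space of its own, so whatever
// storage func's local occupied is likely to be reused and overwritten.
int clobber_stack() {
    volatile char scratch[256];
    for (int i = 0; i < 256; ++i) scratch[i] = static_cast<char>(0xFF);
    return scratch[0];
}

// Hypothetical usage: the caller's object is unaffected by later calls.
void copy_example() {
    mystruct result = func(1, 2);  // result belongs to the caller
    clobber_stack();               // func's frame is gone; result is not part of it
    assert(result.a == 1);
    assert(result.b == 2);
}
```

The values survive because result is a separate object in the caller, not a view into func's stack frame.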

It's perfectly legal, but with large structs there are two factors that need to be taken into consideration: speed and stack size.
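For a struct that really is large, one possible alternative is to let the caller provide the storage. The sketch below assumes a hypothetical BigTable type of roughly 4 MB (with 4-byte int) and hypothetical helper names; it is an illustration, not a recommendation for small structs like the one in the question:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical large struct: about 4 MB when int is 4 bytes.
struct BigTable {
    int cells[1024 * 1024];
};

// Fills a caller-provided object instead of returning one by value,
// so no BigTable-sized local or temporary is needed inside the function.
void fill_table(BigTable& out, int value) {
    const std::size_t count = sizeof(out.cells) / sizeof(out.cells[0]);
    for (std::size_t i = 0; i < count; ++i) {
        out.cells[i] = value;
    }
}

// Hypothetical usage: the caller decides where the storage lives (here, the heap).
int last_cell_after_fill() {
    std::unique_ptr<BigTable> table(new BigTable);
    fill_table(*table, 7);
    assert(table->cells[0] == 7);
    return table->cells[1024 * 1024 - 1];
}
```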

I also agree with sftrabbit: the object's lifetime does indeed end and its stack area is released, but the compiler makes sure the data reaches the caller, in registers or some other way.

A simple example for confirmation is given below (taken from the assembly generated by the MinGW compiler).

_func:
    push    ebp
    mov     ebp, esp
    sub     esp, 16
    mov     eax, DWORD PTR [ebp+8]
    mov     DWORD PTR [ebp-8], eax
    mov     eax, DWORD PTR [ebp+12]
    mov     DWORD PTR [ebp-4], eax
    mov     eax, DWORD PTR [ebp-8]
    mov     edx, DWORD PTR [ebp-4]
    leave
    ret

You can see that the value of b is passed back through edx, while eax, the default return register, holds the value of a.

A structure type can be the return type of a function. It is safe because the compiler creates a copy of the struct and returns that copy, not the local struct inside the function.

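#include <cstdlib>
#include <iostream>
using namespace std;

typedef struct{
    int a,b;
}mystruct;

mystruct func(int c, int d){
    mystruct retval;
    cout << "func:" << &retval << endl;  // address of the local inside func
    retval.a = c;
    retval.b = d;
    return retval;
}

int main()
{
    // Taking the address of the temporary, &(func(1,2)), is ill-formed in C++,
    // so the result is stored in a variable first.
    mystruct result = func(1,2);
    cout << "main:" << &result << endl;  // may equal func's address if the copy is elided

    system("pause");  // Windows-only: keeps the console window open
}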

It is perfectly safe to return a struct as you have done.

Regarding this statement, however: Because my understanding is that once the scope of the function is over, whatever stack was used to allocate the structure can be overwritten. The only scenario I can imagine where this causes trouble is one where a member of the structure is dynamically allocated (malloc'ed or new'ed) and the struct has a destructor that frees it while copies only duplicate the pointer. In that case, without RVO, the dynamically allocated member is destroyed together with the local object, and the returned copy has a member pointing to garbage.

The safety depends on how the struct itself was implemented. I just stumbled on this question while implementing something similar, and here is the potential problem.

When returning the value, the compiler performs a few operations (possibly among others):

  1. Calls the copy constructor mystruct(const mystruct&) to initialize a temporary variable outside the function func, allocated by the compiler itself
  2. Calls the destructor ~mystruct on the variable that was allocated inside func
  3. Calls mystruct::operator= if the returned value is assigned to something else with =
  4. Calls the destructor ~mystruct on the temporary variable used by the compiler

Now, if mystruct is as simple as the one described here, all is fine. But if it has a pointer member (like char*) or more complicated memory management, then everything depends on how mystruct::operator=, mystruct(const mystruct&), and ~mystruct are implemented. Therefore, I suggest caution when returning complex data structures by value.
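To make the pointer case concrete, here is a hedged sketch of a struct that owns a char* buffer and defines the copy constructor, copy assignment operator, and destructor consistently. Name, make_name, and name_example are hypothetical names for illustration; in real code a std::string member would avoid all of this.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical struct that owns a heap buffer through a raw pointer.
struct Name {
    char* text;

    explicit Name(const char* s) : text(new char[std::strlen(s) + 1]) {
        std::strcpy(text, s);
    }
    // Deep copy: the new object gets its own buffer.
    Name(const Name& other) : text(new char[std::strlen(other.text) + 1]) {
        std::strcpy(text, other.text);
    }
    // Copy assignment: build the copy first, then swap buffers with it.
    Name& operator=(const Name& other) {
        if (this != &other) {
            Name copy(other);
            char* old = text;
            text = copy.text;
            copy.text = old;  // copy's destructor frees the old buffer
        }
        return *this;
    }
    ~Name() { delete[] text; }
};

// Safe to return by value because two Name objects never share a buffer.
Name make_name(const char* s) {
    Name local(s);
    return local;
}

void name_example() {
    Name a = make_name("first");
    Name b("second");
    b = a;            // b now holds its own copy of "first"
    a.text[0] = 'F';  // changing a must not affect b
    assert(std::strcmp(a.text, "First") == 0);
    assert(std::strcmp(b.text, "first") == 0);
}
```

If the copy constructor were left out while the destructor stayed, the compiler-generated copy would duplicate only the pointer, and two destructors would end up freeing the same buffer.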

It is not safe to return a structure. I love to do it myself, but if someone later adds a copy constructor to the returned structure, that copy constructor will be called. This might be unexpected and can break the code, and such a bug is very difficult to find.

I had a more elaborate answer, but a moderator did not like it. So, at your expense, my tip is short.

Let's add a second part to the question: Does this vary by compiler?

Indeed it does, as I discovered to my pain: http://sourceforge.net/p/mingw-w64/mailman/message/33176880/

I was using gcc on win32 (MinGW) to call COM interfaces that returned structs. Turns out that MS does it differently to GNU and so my (gcc) program crashed with a smashed stack.

It could be that MS might have the higher ground here - but all I care about is ABI compatibility between MS and GNU for building on Windows.

If it does, then what is the behavior for the latest versions of compilers for desktops: gcc, g++ and Visual Studio

You can find some messages on a Wine mailing list about how MS seems to do it.

Not only is it safe to return a struct in C (or a class in C++, where structs are actually classes with default public: members), but a lot of software does it.

Of course, when returning a class in C++, the language specifies that a destructor or move constructor may be called, but in many cases the compiler can optimize this away.

In addition, the Linux x86-64 ABI specifies that a struct with two scalar fields (e.g. pointers or long) is returned through registers (%rax & %rdx), so it is very fast and efficient. For that particular case it is probably faster to return such a two-scalar struct than to do anything else (e.g. storing the values through a pointer passed as an argument).

Returning such a two-scalar field struct is then a lot faster than malloc-ing it and returning a pointer.
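As an illustration of such a two-scalar struct, the following sketch returns a quotient and a remainder by value. DivResult, divide, and divide_example are hypothetical names; whether the result actually travels in registers depends on the platform ABI and is not something the code itself can show.

```cpp
#include <cassert>

// Two long fields: the kind of struct the ABI rule above is about.
struct DivResult {
    long quotient;
    long remainder;
};

// Hypothetical helper returning both results of an integer division by value.
// Assumes denominator is not zero.
DivResult divide(long numerator, long denominator) {
    DivResult r;
    r.quotient = numerator / denominator;
    r.remainder = numerator % denominator;
    return r;
}

void divide_example() {
    DivResult r = divide(17, 5);
    assert(r.quotient == 3);
    assert(r.remainder == 2);
}
```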

Note: this answer only applies to C++11 onward. There is no such thing as "C/C++"; they are different languages.

No, there is no danger in returning a local object by value, and it is recommended to do so. However, I think there is an important point that is missing from all answers here. Many others have said that the struct is being either copied or directly placed using RVO. However, this is not completely correct. I will try to explain exactly which things can happen when returning a local object.

Move semantics

Since C++11, we have had rvalue references, which are references to temporary objects that can safely be stolen from. As an example, std::vector has a move constructor as well as a move assignment operator. Moving a vector simply copies the pointer to the data of the vector being moved from instead of copying the elements. I won't go into more detail about move semantics here.

Because an object created locally within a function goes out of scope when the function returns, the return statement treats it as an rvalue from C++11 onward. The returned object is therefore moved rather than copied whenever its type has a move constructor (or the move is skipped entirely, as explained later); only a type without a usable move constructor falls back to copying. This means that if you return an object with an expensive copy constructor but an inexpensive move constructor, like a big vector, only the ownership of the data is transferred from the local object to the returned object, which is cheap.

Note that in your specific example, there is no difference between copying and moving the object. The default move and copy constructors of your struct result in the same operations: copying two integers. However, this is at least as fast as any other solution because the whole struct fits in a 64-bit CPU register (correct me if I'm wrong, I don't know much about CPU registers).
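A small sketch of the vector case, assuming a hypothetical Samples struct and a global pointer used purely to observe where the buffer lives. If the vector were copied on return, the caller would see a different buffer address:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct with an expensive-to-copy member.
struct Samples {
    std::vector<int> values;
};

// Records where the vector's buffer lived inside the function (illustration only).
const int* g_buffer_inside = nullptr;

Samples make_samples(std::size_t count) {
    Samples local;
    local.values.assign(count, 42);
    g_buffer_inside = local.values.data();
    return local;  // moved or elided, so the elements are not duplicated
}

void samples_example() {
    Samples s = make_samples(1000);
    assert(s.values.size() == 1000);
    // Same heap buffer as inside the function: ownership was transferred
    // (or the object was built in place), not copied element by element.
    assert(s.values.data() == g_buffer_inside);
}
```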

RVO and NRVO

RVO means Return Value Optimization and is one of the very few optimizations compilers perform that can change observable side effects. Since C++17, RVO is required when returning an unnamed object: it is constructed directly in place, in the storage of the object the caller initializes with the returned value. Neither the copy constructor nor the move constructor is called. Without RVO, the unnamed object would first be constructed locally, then move-constructed into the caller's storage, and then the local unnamed object would be destroyed.

Example where RVO is required (C++17) or likely (before C++17):

struct MyStruct { int a, b; };  // assumed definition, not given in the original

auto function(int a, int b) -> MyStruct {
    // ...
    return MyStruct{a, b};
}

NRVO means Named Return Value Optimization and is the same thing as RVO, except that it applies to a named object local to the called function. It is still not guaranteed by the standard (as of C++20), but many compilers do it anyway. Note that even without NRVO, a named local object with a move constructor is at worst moved when returned.
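One way to observe this is a type that counts its own copies and moves. The sketch below uses a hypothetical Tracked type; the exact move count depends on the language version and on whether the compiler applies NRVO, so only the copy count is asserted:

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied or moved.
struct Tracked {
    static int copies;
    static int moves;
    int value;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked(Tracked&& other) noexcept : value(other.value) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Unnamed return value: elision is guaranteed from C++17 on.
Tracked make_unnamed(int v) {
    return Tracked(v);
}

// Named return value: NRVO is allowed but optional; the fallback is a move.
Tracked make_named(int v) {
    Tracked local(v);
    local.value += 1;
    return local;
}

void elision_example() {
    Tracked::copies = 0;
    Tracked::moves = 0;
    Tracked a = make_unnamed(1);
    Tracked b = make_named(1);
    assert(a.value == 1);
    assert(b.value == 2);
    // The copy constructor is never needed for either return.
    assert(Tracked::copies == 0);
    // Under C++17, Tracked::moves is 0 if NRVO was applied to make_named,
    // and 1 if it was not.
}
```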

Conclusion

The only case where you should consider not returning by value is when you have a named, very large object (large in terms of its own size on the stack, not of any heap data it owns). This is because NRVO is not yet guaranteed (as of C++20) and even moving such an object would be slow. My recommendation, and the recommendation in the C++ Core Guidelines, is to always prefer returning objects by value (for multiple return values, use a struct or a tuple), with the only exception being objects that are expensive to move. In that case, use a non-const reference parameter.

It is NEVER a good idea to return a resource that has to be manually released from a function in C++. Never do that. At least use a std::unique_ptr, or make your own non-local or local struct with a destructor that releases its resource (RAII) and return an instance of that. It would then also be a good idea to define the move constructor and move assignment operator if the resource does not have its own move semantics (and to delete the copy constructor and copy assignment operator).
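For the multiple-return-values case, a minimal sketch with a hypothetical Range struct and find_range function could look like this. It returns both results by value instead of using two output parameters:

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type bundling two return values.
struct Range {
    int lowest;
    int highest;
};

// Returns both results by value. Assumes values is non-empty.
Range find_range(const std::vector<int>& values) {
    Range r{values.front(), values.front()};
    for (int v : values) {
        if (v < r.lowest) r.lowest = v;
        if (v > r.highest) r.highest = v;
    }
    return r;
}

void range_example() {
    const std::vector<int> values{4, -2, 9, 0};
    const Range r = find_range(values);
    assert(r.lowest == -2);
    assert(r.highest == 9);
}
```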
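If the object really has to live on the heap, the malloc-and-return-a-pointer pattern from the question can be sketched with std::unique_ptr instead. make_on_heap and heap_example are hypothetical names used for illustration:

```cpp
#include <cassert>
#include <memory>

struct mystruct {
    int a, b;
};

// Returning a std::unique_ptr makes ownership explicit and releases
// the memory automatically when the caller is done with it.
std::unique_ptr<mystruct> make_on_heap(int c, int d) {
    std::unique_ptr<mystruct> p(new mystruct);
    p->a = c;
    p->b = d;
    return p;  // ownership moves to the caller
}

void heap_example() {
    std::unique_ptr<mystruct> p = make_on_heap(1, 2);
    assert(p->a == 1);
    assert(p->b == 2);
}  // p goes out of scope here and the mystruct is deleted; no manual free
```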