有效使用 goto 在 C 中进行错误管理?

这个问题实际上是前段时间 programming.reddit.com 有趣的讨论的结果。它基本上可以归结为以下代码:

int foo(int bar)
{
int return_value = 0;
if (!do_something( bar )) {
goto error_1;
}
if (!init_stuff( bar )) {
goto error_2;
}
if (!prepare_stuff( bar )) {
goto error_3;
}
return_value = do_the_thing( bar );
error_3:
cleanup_3();
error_2:
cleanup_2();
error_1:
cleanup_1();
return return_value;
}

在这里使用 goto似乎是最好的方法,产生了所有可能性中最干净和最有效的代码,或者至少在我看来是这样。引用史蒂夫 · 麦康奈尔在 代码完成中的话:

Goto 在日常工作中很有用 分配资源,执行 这些资源的运作,以及 然后释放资源 Goto 你可以在一个区域清理 的代码。后藤减少了 你忘记的可能性 释放每个地方的资源 你检测到一个错误。

这种方法的另一个支持来自 Linux 设备驱动程序书,在 这部分

你觉得怎么样?这种情况是否是 C 语言中 goto的有效用法?您是否更喜欢生成更复杂和/或效率更低但避免使用 goto的其他方法?

33541 次浏览

The problem with the goto keyword is mostly misunderstood. It is not plain-evil. You just need to be aware of the extra control paths that you create with every goto. It becomes difficult to reason about your code and hence its validity.

FWIW, if you look up developer.apple.com tutorials, they take the goto approach to error handling.

We do not use gotos. A higher importance is laid on return values. Exception handling is done via setjmp/longjmp -- whatever little you can.

There's nothing morally wrong about the goto statement any more than there is something morally wrong with (void)* pointers.

It's all in how you use the tool. In the (trivial) case you presented, a case statement can achieve the same logic, albeit with more overhead. The real question is, "what's my speed requirement?"

goto is just plain fast, especially if you're careful to make sure that it compiles to a short jump. Perfect for applications where speed is a premium. For other applications, it probably makes sense to take the overhead hit with if/else + case for maintainability.

Remember: goto doesn't kill applications, developers kill applications.

UPDATE: Here's the case example

int foo(int bar) {
int return_value = 0 ;
int failure_value = 0 ;


if (!do_something(bar)) {
failure_value = 1;
} else if (!init_stuff(bar)) {
failure_value = 2;
} else if (prepare_stuff(bar)) {
return_value = do_the_thing(bar);
cleanup_3();
}


switch (failure_value) {
case 2: cleanup_2();
case 1: cleanup_1();
default: break ;
}
}

Seems to me that cleanup_3 should do its cleanup, then call cleanup_2. Similarly, cleanup_2 should do it's cleanup, then call cleanup_1. It appears that anytime you do cleanup_[n], that cleanup_[n-1] is required, thus it should be the responsibility of the method (so that, for instance, cleanup_3 can never be called without calling cleanup_2 and possibly causing a leak.)

Given that approach, instead of gotos, you would simply call the cleanup routine, then return.

The goto approach isn't wrong or bad, though, it's just worth noting that it's not necessarily the "cleanest" approach (IMHO).

If you are looking for the optimal performance, then I suppose that the goto solution is best. I only expect it to be relevant, however, in a select few, performance critical, applications (e.g., device drivers, embedded devices, etc). Otherwise, it's a micro-optimization that has lower priority than code clarity.

As a general rule, avoiding goto is a good idea, but the abuses that were prevalent when Dijkstra first wrote 'GOTO Considered Harmful' don't even cross most people's minds as an option these days.

What you outline is a generalizable solution to the error handling problem - it is fine with me as long as it is carefully used.

Your particular example can be simplified as follows (step 1):

int foo(int bar)
{
int return_value = 0;
if (!do_something(bar)) {
goto error_1;
}
if (!init_stuff(bar)) {
goto error_2;
}
if (prepare_stuff(bar))
{
return_value = do_the_thing(bar);
cleanup_3();
}
error_2:
cleanup_2();
error_1:
cleanup_1();
return return_value;
}

Continuing the process:

int foo(int bar)
{
int return_value = 0;
if (do_something(bar))
{
if (init_stuff(bar))
{
if (prepare_stuff(bar))
{
return_value = do_the_thing(bar);
cleanup_3();
}
cleanup_2();
}
cleanup_1();
}
return return_value;
}

This is, I believe, equivalent to the original code. This looks particularly clean since the original code was itself very clean and well organized. Often, the code fragments are not as tidy as that (though I'd accept an argument that they should be); for example, there is frequently more state to pass to the initialization (setup) routines than shown, and therefore more state to pass to the cleanup routines too.

GOTO is useful. It's something your processor can do and this is why you should have access to it.

Sometimes you want to add a little something to your function and single goto let's you do that easily. It can save time..

FWIF, I find the error handling idiom you gave in the question's example to be more readable and easier to understand than any of the alternatives given in the answers so far. While goto is a bad idea in general, it can be useful for error handling when done in a simple and uniform manner. In this situation, even though it's a goto, it's being used in well-defined and more or less structured manner.

I personally am a follower of the "The Power of Ten - 10 Rules for Writing Safety Critical Code".

I will include a small snippet from that text that illustrates what I believe to be a good idea about goto.


Rule: Restrict all code to very simple control flow constructs – do not use goto statements, setjmp or longjmp constructs, and direct or indirect recursion.

Rationale: Simpler control flow translates into stronger capabilities for verification and often results in improved code clarity. The banishment of recursion is perhaps the biggest surprise here. Without recursion, though, we are guaranteed to have an acyclic function call graph, which can be exploited by code analyzers, and can directly help to prove that all executions that should be bounded are in fact bounded. (Note that this rule does not require that all functions have a single point of return – although this often also simplifies control flow. There are enough cases, though, where an early error return is the simpler solution.)


Banishing the use of goto seems bad but:

If the rules seem Draconian at first, bear in mind that they are meant to make it possible to check code where very literally your life may depend on its correctness: code that is used to control the airplane that you fly on, the nuclear power plant a few miles from where you live, or the spacecraft that carries astronauts into orbit. The rules act like the seat-belt in your car: initially they are perhaps a little uncomfortable, but after a while their use becomes second-nature and not using them becomes unimaginable.

I think that the question here is fallacious with respect to the given code.

Consider:

  1. do_something(), init_stuff() and prepare_stuff() appear to know if they have failed, since they return either false or nil in that case.
  2. Responsibility for setting up state appears to be the responsibility of those functions, since there is no state being set up directly in foo().

Therefore: do_something(), init_stuff() and prepare_stuff() should be doing their own cleanup. Having a separate cleanup_1() function that cleans up after do_something() breaks the philosophy of encapsulation. It's bad design.

If they did their own cleanup, then foo() becomes quite simple.

On the other hand. If foo() actually created its own state that needed to be torn down, then goto would be appropriate.

I'm surprised nobody has suggested this alternative, so even though the question has been around a while I'll add it in: one good way of addressing this issue is to use variables to keep track of the current state. This is a technique that can be used whether or not goto is used for arriving at the cleanup code. Like any coding technique, it has pros and cons, and won't be suitable for every situation, but if you're choosing a style it's worth considering - especially if you want to avoid goto without ending up with deeply nested ifs.

The basic idea is that, for every cleanup action that might need to be taken, there is a variable from whose value we can tell whether the cleanup needs doing or not.

I'll show the goto version first, because it is closer to the code in the original question.

int foo(int bar)
{
int return_value = 0;
int something_done = 0;
int stuff_inited = 0;
int stuff_prepared = 0;




/*
* Prepare
*/
if (do_something(bar)) {
something_done = 1;
} else {
goto cleanup;
}


if (init_stuff(bar)) {
stuff_inited = 1;
} else {
goto cleanup;
}


if (prepare_stuff(bar)) {
stufF_prepared = 1;
} else {
goto cleanup;
}


/*
* Do the thing
*/
return_value = do_the_thing(bar);


/*
* Clean up
*/
cleanup:
if (stuff_prepared) {
unprepare_stuff();
}


if (stuff_inited) {
uninit_stuff();
}


if (something_done) {
undo_something();
}


return return_value;
}

One advantage of this over some of the other techniques is that, if the order of the initialisation functions is changed, the correct cleanup will still happen - for instance, using the switch method described in another answer, if the order of initialisation changes, then the switch has to be very carefully edited to avoid trying to clean up something wasn't actually initialised in the first place.

Now, some might argue that this method adds a whole lot of extra variables - and indeed in this case that's true - but in practice often an existing variable already tracks, or can be made to track, the required state. For example, if the prepare_stuff() is actually a call to malloc(), or to open(), then the variable holding the returned pointer or file descriptor can be used - for example:

int fd = -1;


....


fd = open(...);
if (fd == -1) {
goto cleanup;
}


...


cleanup:


if (fd != -1) {
close(fd);
}

Now, if we additionally track the error status with a variable, we can avoid goto entirely, and still clean up correctly, without having indentation that gets deeper and deeper the more initialisation we need:

int foo(int bar)
{
int return_value = 0;
int something_done = 0;
int stuff_inited = 0;
int stuff_prepared = 0;
int oksofar = 1;




/*
* Prepare
*/
if (oksofar) {  /* NB This "if" statement is optional (it always executes) but included for consistency */
if (do_something(bar)) {
something_done = 1;
} else {
oksofar = 0;
}
}


if (oksofar) {
if (init_stuff(bar)) {
stuff_inited = 1;
} else {
oksofar = 0;
}
}


if (oksofar) {
if (prepare_stuff(bar)) {
stuff_prepared = 1;
} else {
oksofar = 0;
}
}


/*
* Do the thing
*/
if (oksofar) {
return_value = do_the_thing(bar);
}


/*
* Clean up
*/
if (stuff_prepared) {
unprepare_stuff();
}


if (stuff_inited) {
uninit_stuff();
}


if (something_done) {
undo_something();
}


return return_value;
}

Again, there are potential criticisms of this:

  • Don't all those "if"s hurt performance? No - because in the success case, you have to do all of the checks anyway (otherwise you're not checking all the error cases); and in the failure case most compilers will optimise the sequence of failing if (oksofar) checks to a single jump to the cleanup code (GCC certainly does) - and in any case, the error case is usually less critical for performance.
  • Isn't this adding yet another variable? In this case yes, but often the return_value variable can be used to play the role that oksofar is playing here. If you structure your functions to return errors in a consistent way, you can even avoid the second if in each case:

    int return_value = 0;
    
    
    if (!return_value) {
    return_value = do_something(bar);
    }
    
    
    if (!return_value) {
    return_value = init_stuff(bar);
    }
    
    
    if (!return_value) {
    return_value = prepare_stuff(bar);
    }
    

    One of the advantages of coding like that is that the consistency means that any place where the original programmer has forgotten to check the return value sticks out like a sore thumb, making it much easier to find (that one class of) bugs.

So - this is (yet) one more style that can be used to solve this problem. Used correctly it allows for very clean, consistent code - and like any technique, in the wrong hands it can end up producing code that is long-winded and confusing :-)

I prefer using technique described in the following example ...

struct lnode *insert(char *data, int len, struct lnode *list) {
struct lnode *p, *q;
uint8_t good;
struct {
uint8_t alloc_node : 1;
uint8_t alloc_str : 1;
} cleanup = { 0, 0 };


// allocate node.
p = (struct lnode *)malloc(sizeof(struct lnode));
good = cleanup.alloc_node = (p != NULL);


// good? then allocate str
if (good) {
p->str = (char *)malloc(sizeof(char)*len);
good = cleanup.alloc_str = (p->str != NULL);
}


// good? copy data
if(good) {
memcpy ( p->str, data, len );
}


// still good? insert in list
if(good) {
if(NULL == list) {
p->next = NULL;
list = p;
} else {
q = list;
while(q->next != NULL && good) {
// duplicate found--not good
good = (strcmp(q->str,p->str) != 0);
q = q->next;
}
if (good) {
p->next = q->next;
q->next = p;
}
}
}


// not-good? cleanup.
if(!good) {
if(cleanup.alloc_str)   free(p->str);
if(cleanup.alloc_node)  free(p);
}


// good? return list or else return NULL
return (good? list: NULL);

}

source: http://blog.staila.com/?p=114

We use Daynix CSteps library as another solution for the "goto problem" in init functions.
See here and here.

In general, I would regard the fact that a piece of code could be most clearly written using goto as a symptom that the program flow is likely more complicated than is generally desirable. Combining other program structures in weird ways to avoid the use of goto would attempt to treat the symptom, rather than the disease. Your particular example might not be overly difficult to implement without goto:

do {
.. set up thing1 that will need cleanup only in case of early exit
if (error) break;
do
{
.. set up thing2 that will need cleanup in case of early exit
if (error) break;
// ***** SEE TEXT REGARDING THIS LINE
} while(0);
.. cleanup thing2;
} while(0);
.. cleanup thing1;

but if the cleanup was only supposed to happen when the function failed, the goto case could be handled by putting a return just before the first target label. The above code would require adding a return at the line marked with *****.

In the "cleanup even in normal case" scenario, I would regard the use of goto as being clearer than the do/while(0) constructs, among other things because the target labels themselves practically cry out "LOOK AT ME" far moreso than the break and do/while(0) constructs. For the "cleanup only if error" case, the return statement ends up having to be in just about the worst possible place from a readability standpoint (return statements should generally be either at the beginning of a function, or else at what "looks like" the end); having a return just before a target label meets that qualification much more readily than having one just before the end of a "loop".

BTW, one scenario where I sometimes use goto for error-handling is within a switch statement, when the code for multiple cases shares the same error code. Even though my compiler would often be smart enough to recognize that multiple cases end with the same code, I think it's clearer to say:

REPARSE_PACKET:
switch(packet[0])
{
case PKT_THIS_OPERATION:
if (problem condition)
goto PACKET_ERROR;
... handle THIS_OPERATION
break;
case PKT_THAT_OPERATION:
if (problem condition)
goto PACKET_ERROR;
... handle THAT_OPERATION
break;
...
case PKT_PROCESS_CONDITIONALLY
if (packet_length < 9)
goto PACKET_ERROR;
if (packet_condition involving packet[4])
{
packet_length -= 5;
memmove(packet, packet+5, packet_length);
goto REPARSE_PACKET;
}
else
{
packet[0] = PKT_CONDITION_SKIPPED;
packet[4] = packet_length;
packet_length = 5;
packet_status = READY_TO_SEND;
}
break;
...
default:
{
PACKET_ERROR:
packet_error_count++;
packet_length = 4;
packet[0] = PKT_ERROR;
packet_status = READY_TO_SEND;
break;
}
}

Although one could replace the goto statements with {handle_error(); break;}, and although one could use a do/while(0) loop along with continue to process the wrapped conditional-execute packet, I don't really think that's any clearer than using a goto. Further, while it might be possible to copy out the code from PACKET_ERROR everywhere the goto PACKET_ERROR is used, and while a compiler might write out the duplicated code once and replace most occurrences with a jump to that shared copy, the using the goto makes it easier to notice places which set the packet up a little differently (e.g. if the "execute conditionally" instruction decides not to execute).

I agree that the goto cleanup in reverse order given in the question is the cleanest way of cleaning things up in most functions. But I also wanted to point out that sometimes, you want your function to clean up anyway. In these cases I use the following variant if if ( 0 ) { label: } idiom to go to the right point of the cleaning up process:

int decode ( char * path_in , char * path_out )
{
FILE * in , * out ;
code c ;
int len ;
int res = 0  ;
if ( path_in == NULL )
in = stdin ;
else
{
if ( ( in = fopen ( path_in , "r" ) ) == NULL )
goto error_open_file_in ;
}
if ( path_out == NULL )
out = stdout ;
else
{
if ( ( out = fopen ( path_out , "w" ) ) == NULL )
goto error_open_file_out ;
}


if( read_code ( in , & c , & longueur ) )
goto error_code_construction ;


if ( decode_h ( in , c , out , longueur ) )
goto error_decode ;


if ( 0 ) { error_decode: res = 1 ;}
free_code ( c ) ;
if ( 0 ) { error_code_construction: res = 1 ; }
if ( out != stdout ) fclose ( stdout ) ;
if ( 0 ) { error_open_file_out: res = 1 ; }
if ( in != stdin ) fclose ( in ) ;
if ( 0 ) { error_open_file_in: res = 1 ; }
return res ;
}

Here's what I've preferred:

bool do_something(void **ptr1, void **ptr2)
{
if (!ptr1 || !ptr2) {
err("Missing arguments");
return false;
}
bool ret = false;


//Pointers must be initialized as NULL
void *some_pointer = NULL, *another_pointer = NULL;


if (allocate_some_stuff(&some_pointer) != STUFF_OK) {
err("allocate_some_stuff step1 failed, abort");
goto out;
}
if (allocate_some_stuff(&another_pointer) != STUFF_OK) {
err("allocate_some_stuff step 2 failed, abort");
goto out;
}


void *some_temporary_malloc = malloc(1000);


//Do something with the data here
info("do_something OK");


ret = true;


// Assign outputs only on success so we don't end up with
// dangling pointers
*ptr1 = some_pointer;
*ptr2 = another_pointer;
out:
if (!ret) {
//We are returning an error, clean up everything
//deallocate_some_stuff is a NO-OP if pointer is NULL
deallocate_some_stuff(some_pointer);
deallocate_some_stuff(another_pointer);
}
//this needs to be freed every time
free(some_temporary_malloc);
return ret;
}

Old discussion, however.... what about using "arrow anti pattern" and encapsulate later every nested level in a static inline function? The code look clean, it is optimal (when optimisations are enabled), and no goto is used. In short, divide and conquer. Below an example:

static inline int foo_2(int bar)
{
int return_value = 0;
if ( prepare_stuff( bar ) ) {
return_value = do_the_thing( bar );
}
cleanup_3();
return return_value;
}


static inline int foo_1(int bar)
{
int return_value = 0;
if ( init_stuff( bar ) ) {
return_value = foo_2(bar);
}
cleanup_2();
return return_value;
}


int foo(int bar)
{
int return_value = 0;
if (do_something(bar)) {
return_value = foo_1(bar);
}
cleanup_1();
return return_value;
}

In terms of space, we are creating three times the variable in the stack, which is not good, but this disappear when compiling with -O2 removing the variable from the stack and using a register in this simple example. What I got from the above block with gcc -S -O2 test.c was the below:

    .section    __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 13
.globl  _foo                    ## -- Begin function foo
.p2align    4, 0x90
_foo:                                   ## @foo
.cfi_startproc
## %bb.0:
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq    %rsp, %rbp
.cfi_def_cfa_register %rbp
pushq   %r14
pushq   %rbx
.cfi_offset %rbx, -32
.cfi_offset %r14, -24
movl    %edi, %ebx
xorl    %r14d, %r14d
xorl    %eax, %eax
callq   _do_something
testl   %eax, %eax
je  LBB0_6
## %bb.1:
xorl    %r14d, %r14d
xorl    %eax, %eax
movl    %ebx, %edi
callq   _init_stuff
testl   %eax, %eax
je  LBB0_5
## %bb.2:
xorl    %r14d, %r14d
xorl    %eax, %eax
movl    %ebx, %edi
callq   _prepare_stuff
testl   %eax, %eax
je  LBB0_4
## %bb.3:
xorl    %eax, %eax
movl    %ebx, %edi
callq   _do_the_thing
movl    %eax, %r14d
LBB0_4:
xorl    %eax, %eax
callq   _cleanup_3
LBB0_5:
xorl    %eax, %eax
callq   _cleanup_2
LBB0_6:
xorl    %eax, %eax
callq   _cleanup_1
movl    %r14d, %eax
popq    %rbx
popq    %r14
popq    %rbp
retq
.cfi_endproc
## -- End function


.subsections_via_symbols

Yes, its valid and the best practice for exceptions in C. All error handling mechanism of any language just jumps from error to handle like goto to label. But consider placing the label after goto in execution flow and in the same scope.