Which is faster: if (bool) or if (int)?

Which value is better to use? Boolean true or integer 1?

The above question made me do some experiments with bool and int in an if condition. Out of curiosity, I wrote this program:

int f(int i)
{
    if ( i ) return 99;   // if (int)
    else     return -99;
}

int g(bool b)
{
    if ( b ) return 99;   // if (bool)
    else     return -99;
}

int main(){}

g++ intbool.cpp -S generates the following asm code for each function:

  • asm code for f(int)

    __Z1fi:
    LFB0:
        pushl   %ebp
    LCFI0:
        movl    %esp, %ebp
    LCFI1:
        cmpl    $0, 8(%ebp)
        je      L2
        movl    $99, %eax
        jmp     L3
    L2:
        movl    $-99, %eax
    L3:
        leave
    LCFI2:
        ret
    
  • asm code for g(bool)

    __Z1gb:
    LFB1:
        pushl   %ebp
    LCFI3:
        movl    %esp, %ebp
    LCFI4:
        subl    $4, %esp
    LCFI5:
        movl    8(%ebp), %eax
        movb    %al, -4(%ebp)
        cmpb    $0, -4(%ebp)
        je      L5
        movl    $99, %eax
        jmp     L6
    L5:
        movl    $-99, %eax
    L6:
        leave
    LCFI6:
        ret
    

Surprisingly, g(bool) generates more asm instructions! Does that mean that if(bool) is a little slower than if(int)? I used to think bool was especially designed to be used in conditional statements such as if, so I was expecting g(bool) to generate fewer asm instructions, thereby making g(bool) more efficient and faster.

EDIT:

I'm not using any optimization flag as of now. But even in its absence, why it generates more asm for g(bool) is the question I'm looking for a reasonable answer to. I should also tell you that the -O2 optimization flag generates exactly the same asm for both. But that isn't the question. The question is what I've asked.



Compiling with -O3 gives the following for me:

f:

        pushl   %ebp
        movl    %esp, %ebp
        cmpl    $1, 8(%ebp)
        popl    %ebp
        sbbl    %eax, %eax
        andb    $58, %al
        addl    $99, %eax
        ret

g:

        pushl   %ebp
        movl    %esp, %ebp
        cmpb    $1, 8(%ebp)
        popl    %ebp
        sbbl    %eax, %eax
        andb    $58, %al
        addl    $99, %eax
        ret

.. so it compiles to essentially the same code, except for cmpl vs cmpb. This means that the difference, if there is any, doesn't matter. Judging by unoptimized code is not fair.


Edit to clarify my point. Unoptimized code is for simple debugging, not for speed. Comparing the speed of unoptimized code is senseless.

With GCC 4.5 on Linux and Windows at least, sizeof(bool) == 1. On x86 and x86_64, you can't pass less than a general-purpose register's worth to a function (whether via the stack or in a register, depending on the calling convention, etc.).

So the code for bool, when un-optimized, actually goes to some length to extract that bool value from the argument stack (using another stack slot to save that byte). It's more complicated than just pulling a native register-sized variable.
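To make that extra work concrete, here is a rough C++ rendering of what the un-optimized __Z1gb above is effectively doing, under the assumption (taken from the asm and the ABI description above) that the caller passed the bool in a full 32-bit stack slot; the function name is mine, not something the compiler emits:

// Hypothetical C-level equivalent of the un-optimized g(bool) shown earlier.
// The caller pushed a full 32-bit slot; only its low byte carries the bool value.
int g_unoptimized_equivalent(int raw_arg_slot)          // 8(%ebp): the 32-bit argument slot
{
    unsigned char b = (unsigned char)raw_arg_slot;      // movb %al, -4(%ebp): spill only the low byte
    if (b != 0)                                         // cmpb $0, -4(%ebp)
        return 99;
    return -99;
}

The int version has no such intermediate step: the whole 32-bit slot is the value, so it can be compared in place.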

Makes sense to me. Your compiler apparently defines a bool as an 8-bit value, and your system ABI requires it to "promote" small (< 32-bit) integer arguments to 32-bit when pushing them onto the call stack. So to compare a bool, the compiler generates code to isolate the least significant byte of the 32-bit argument that g receives, and compares it with cmpb. In the first example, the int argument uses the full 32 bits that were pushed onto the stack, so it simply compares against the whole thing with cmpl.
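If you want to see that promotion from the caller's side, one (hypothetical) experiment is to compile a tiny caller with g++ -S (no linking needed) and look at how the arguments are set up; under the stack-based i386 convention described above, both calls should push one full 4-byte slot each:

// My own test snippet, not part of the original question: compile with `g++ -S`
// and inspect how the arguments to f and g are pushed.
int f(int i);
int g(bool b);

int call_both()
{
    return f(1) + g(true);   // each argument occupies a full stack slot on i386
}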

When I compile this with a sane set of options (specifically -O3), here's what I get:

For f():

        .type   _Z1fi, @function
_Z1fi:
.LFB0:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
cmpl    $1, %edi
sbbl    %eax, %eax
andb    $58, %al
addl    $99, %eax
ret
.cfi_endproc

For g():

        .type   _Z1gb, @function
_Z1gb:
.LFB1:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
cmpb    $1, %dil
sbbl    %eax, %eax
andb    $58, %al
addl    $99, %eax
ret
.cfi_endproc

They still use different instructions for the comparison (cmpb for boolean vs. cmpl for int), but otherwise the bodies are identical. A quick look at the Intel manuals tells me: ... not much of anything. There's no such thing as cmpb or cmpl in the Intel manuals. They're all cmp and I can't find the timing tables at the moment. I'm guessing, however, that there's no clock difference between comparing a byte immediate vs. comparing a long immediate, so for all practical purposes the code is identical.


edited to add the following based on your addition

The reason the code is different in the unoptimized case is that it is unoptimized. (Yes, it's circular, I know.) When the compiler walks the AST and generates code directly, it doesn't "know" anything except what's at the immediate point of the AST it's in. At that point it lacks all the contextual information needed to know that, at this specific point, it can treat the declared type bool as an int. By default a boolean is treated as a byte, and when manipulating bytes in the Intel world you have to do things like sign-extend them to certain widths to put them on the stack, etc. (You can't push a byte.)

When the optimizer views the AST and does its magic, however, it looks at surrounding context and "knows" when it can replace code with something more efficient without changing semantics. So it "knows" it can use an integer in the parameter and thereby lose the unnecessary conversions and widening.
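One way to convince yourself of this (my own sketch, not from the answer above) is to hand the compiler a parameter that is already int-sized, so even the naive, un-optimized code generator has nothing to extract and can compare the full stack slot directly, exactly as it does for f(int):

// Hypothetical variant: the caller does the bool-to-int widening explicitly,
// e.g. g_widened(static_cast<int>(b)), so the un-optimized callee compiles
// just like f(int), with a single cmpl against the whole argument slot.
int g_widened(int b_as_int)
{
    if (b_as_int) return 99;
    else          return -99;
}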

It will mostly depend on the compiler and the optimization. There's an interesting (language-agnostic) discussion here:

Does "if ([bool] == true)" require one more step than "if ([bool])"?

Also, take a look at this post: http://www.linuxquestions.org/questions/programming-9/c-compiler-handling-of-boolean-variables-290996/

At the machine level there is no such thing as bool

Very few instruction set architectures define any sort of boolean operand type, although there are often instructions that trigger an action on non-zero values. To the CPU, usually, everything is one of the scalar types or a string of them.

A given compiler and a given ABI will need to choose specific sizes for int and bool and when, like in your case, these are different sizes they may generate slightly different code, and at some levels of optimization one may be slightly faster.
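For example, you can check which sizes your particular compiler and ABI picked with a snippet like the following (my own addition; on the g++/x86 setups discussed in this thread it typically prints 1 and 4):

#include <cstdio>

int main()
{
    std::printf("sizeof(bool) = %zu\n", sizeof(bool));  // typically 1 with g++ on x86
    std::printf("sizeof(int)  = %zu\n", sizeof(int));   // typically 4 with g++ on x86
}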

Why is bool one byte on many systems?

It's safer to choose a char type for bool because someone might make a really large array of them.

Update: by "safer", I mean: for the compiler and library implementors. I'm not saying people need to reimplement the system type.
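To illustrate the "really large array" point with a hypothetical example: with a 1-byte bool, a big array of flags costs a quarter of what the same array of 32-bit ints would.

#include <cstdio>

int main()
{
    static bool flags_as_bool[1000000];
    static int  flags_as_int[1000000];
    std::printf("bool array: %zu bytes\n", sizeof(flags_as_bool)); // ~1 MB if sizeof(bool) == 1
    std::printf("int  array: %zu bytes\n", sizeof(flags_as_int));  // ~4 MB if sizeof(int) == 4
}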

Yeah, the discussion's fun. But just test it:

Test code:

#include <stdio.h>
#include <string.h>

int testi(int);
int testb(bool);

int main (int argc, char* argv[]){
    bool valb;
    int  vali;
    int  loops;
    if( argc < 2 ){
        return 2;
    }
    valb = (0 != (strcmp(argv[1], "0")));
    vali = strcmp(argv[1], "0");
    printf("Arg1: %s\n", argv[1]);
    printf("BArg1: %i\n", valb ? 1 : 0);
    printf("IArg1: %i\n", vali);
    for(loops=30000000; loops>0; loops--){
        //printf("%i: %i\n", loops, testb(valb=!valb));
        printf("%i: %i\n", loops, testi(vali=!vali));
    }
    return valb;
}

int testi(int val){
    if( val ){
        return 1;
    }
    return 0;
}

int testb(bool val){
    if( val ){
        return 1;
    }
    return 0;
}

Compiled on a 64-bit Ubuntu 10.10 laptop with: g++ -O3 -o /tmp/test_i /tmp/test_i.cpp

Integer-based comparison:

sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null


real    0m8.203s
user    0m8.170s
sys 0m0.010s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null


real    0m8.056s
user    0m8.020s
sys 0m0.000s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null


real    0m8.116s
user    0m8.100s
sys 0m0.000s

Boolean test / print uncommented (and integer commented):

sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null


real    0m8.254s
user    0m8.240s
sys 0m0.000s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null


real    0m8.028s
user    0m8.000s
sys 0m0.010s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null


real    0m7.981s
user    0m7.900s
sys 0m0.050s

They're the same with 1 assignment and 2 comparisons each loop over 30 million loops. Find something else to optimize. For example, don't use strcmp unnecessarily. ;)
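If you want to strip the printf and strcmp noise out of the loop entirely, a tighter (hypothetical) harness along these lines times only the calls themselves; the volatile sink keeps -O3 from deleting the loops:

#include <chrono>
#include <cstdio>

// Same trivial functions as in the test program above, repeated here so this
// sketch is self-contained.
int testi(int val)  { return val ? 1 : 0; }
int testb(bool val) { return val ? 1 : 0; }

int main()
{
    using namespace std::chrono;
    volatile int sink = 0;        // volatile: keeps the loops from being optimized away
    int  vali = 1;
    bool valb = true;

    auto t0 = steady_clock::now();
    for (int i = 0; i < 30000000; ++i)
        sink = sink + testi(vali = !vali);

    auto t1 = steady_clock::now();
    for (int i = 0; i < 30000000; ++i)
        sink = sink + testb(valb = !valb);

    auto t2 = steady_clock::now();

    std::printf("int  version: %lld ms\n",
                (long long)duration_cast<milliseconds>(t1 - t0).count());
    std::printf("bool version: %lld ms\n",
                (long long)duration_cast<milliseconds>(t2 - t1).count());
    return sink & 1;              // use the result so nothing is discarded
}

Expect the two timings to be indistinguishable, for the same reason as above: the optimized comparison code is essentially identical.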

Approaching your question in two different ways:

If you are specifically talking about C++, or any programming language that produces assembly code for that matter, we are bound to whatever code the compiler generates in asm. We are also bound to the representation of true and false in C++. An integer has to be stored in 32 bits, whereas I could simply use a byte to store a boolean expression. Asm snippets for the conditional statements:

For the integer:

    mov  eax, dword ptr [esp]  ;Load the integer from the stack
    cmp  eax, 0                ;Compare it to 0
    je   false                 ;If the int is 0, it's false
    ;Do what has to be done when true
false:
    ;Do what has to be done when false

For the bool:

    mov  al, 1    ;Anything that is not 0 is true
    test al, 1    ;See if the first bit is flipped
    jz   false    ;Not flipped, so it's false
    ;Do what has to be done when true
false:
    ;Do what has to be done when false

So, that's why the speed comparison is so compiler-dependent. In the case above, the bool would be slightly faster, since cmp implies a subtraction to set the flags. It also contradicts what your compiler generated.

Another approach, a much simpler one, is to look at the logic of the expression on its own and try not to worry about how the compiler will translate your code; I think this is a much healthier way of thinking. I still believe, ultimately, that the code being generated by the compiler is trying to give a truthful resolution. What I mean is: maybe if you increase the number of test cases in the if statement, sticking with a boolean on one side and an integer on the other, the compiler will make it so that the generated code executes faster with boolean expressions at the machine level.

I'm considering this a conceptual question, so I'll give a conceptual answer. This discussion reminds me of discussions I commonly have about whether code efficiency translates to fewer lines of code in assembly. It seems that this concept is generally accepted as being true. Considering that keeping track of how fast the ALU will handle each statement is not viable, the second option would be to focus on jumps and compares in assembly. When that is the case, the distinction between boolean statements and integers in the code you presented becomes rather representative. The result of an expression in C++ will return a value that will then be given a representation. In assembly, on the other hand, the jumps and comparisons will be based on numeric values regardless of what type of expression was being evaluated back at your C++ if statement. It is important in these questions to remember that purely logical statements such as these end up with a huge computational overhead, even though a single bit would be capable of the same thing.