Why does C not allow concatenating strings when using the conditional operator?

The following code compiles without problems:

#include <stdio.h>

int main() {
    printf("Hi" "Bye");
}

However, this does not compile:

int main() {
int test = 0;
printf("Hi" (test ? "Bye" : "Goodbye"));
}

Why is that?


According to the C Standard (5.1.1.2 Translation phases)

1 The precedence among the syntax rules of translation is specified by the following phases.

  6. Adjacent string literal tokens are concatenated.

And only after that

  7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

In this construction

"Hi" (test ? "Bye" : "Goodbye")

there are no adjacent string literal tokens. So this construction is invalid.
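As a minimal sketch of what phase 6 does (the names GREETING, joined_greeting and split_message are made up for illustration and do not come from the question): adjacent literals, including one produced by macro expansion in an earlier phase, are merged into a single literal before the parser sees them.

```c
/* Hypothetical macro: it is expanded to a string literal during
   preprocessing, so the literal sits next to "Bye" when phase 6 runs. */
#define GREETING "Hi"

/* Returns a pointer to the single merged literal "HiBye". */
const char *joined_greeting(void)
{
    return GREETING "Bye";
}

/* Literals separated only by white space or newlines are also adjacent. */
const char *split_message(void)
{
    return "Hi"
           " and "
           "Bye";
}
```

Nothing but white space may sit between the literals; a parenthesis, as in the question, already breaks adjacency.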

As per the C11 standard, chapter §5.1.1.2, concatenation of adjacent string literals:

Adjacent string literal tokens are concatenated.

happens in translation phase 6, before the tokens are parsed. On the other hand:

printf("Hi" (test ? "Bye" : "Goodbye"));

involves the conditional operator, whose result is the value of an expression, not a string literal token. So, during translation phase 6, there are no adjacent string literals present: "Hi" is followed by an opening parenthesis. Hence the concatenation is not possible. The construct is invalid and thus reported by your compiler.


To elaborate a bit on the why part: during translation (phase 6, right after preprocessing), adjacent string literals are concatenated and represented as a single string literal (token). The storage is allocated accordingly and the concatenated string literal is treated as a single entity (one string literal).

On the other hand, for run-time concatenation, the destination must have enough memory to hold the concatenated string; otherwise there is no way to access the expected result. String literals already have their storage allocated at compile time, and that storage cannot be extended to hold anything appended to the original content. In other words, there is no way the concatenated result could be presented as a single string literal. So, this construct is inherently incorrect.

Just FYI, for run-time string (not literals) concatenation, we have the library function strcat() which concatenates two strings. Notice, the description mentions:

char *strcat(char * restrict s1,const char * restrict s2);

The strcat() function appends a copy of the string pointed to by s2 (including the terminating null character) to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1. [...]

So, we can see, s1 must be a modifiable string (an array with enough room for the result), not a string literal. However, as the content of s2 is not altered in any way, it can very well be a string literal.
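A minimal sketch of that run-time approach, assuming a hypothetical helper build_greeting (not part of the question or of any library) and a caller-supplied buffer of at least 10 bytes:

```c
#include <string.h>

/* Hypothetical helper: writes "Hi" followed by the chosen suffix into dst.
   Assumes dst points to at least 10 writable bytes
   ("HiGoodbye" is 9 characters plus the terminating null). */
char *build_greeting(char *dst, int test)
{
    strcpy(dst, "Hi");                      /* s1 must be writable storage */
    strcat(dst, test ? "Bye" : "Goodbye");  /* s2 may be a string literal */
    return dst;
}
```

The caller owns the buffer, for example char buf[16]; build_greeting(buf, test);, which is exactly the storage that a string literal cannot provide.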

What is the reason for that?

Your code uses the ternary operator to choose between two string literals. Whether or not the condition is known at compile time, the result is an expression and not a string literal token, so it cannot take part in literal concatenation and the code does not compile. Even the statement printf("Hi" (1 ? "Bye" : "Goodbye")); wouldn't compile. The reason is explained in depth in the answers above. One way to make such a statement valid is to use a format specifier and pass the result of the ternary expression as an additional argument to printf. Even then, the output of printf() only gives the impression of concatenated strings, and that happens at run time, not before.

#include <stdio.h>

int main() {
    int test = 0;
    printf("Hi %s\n", (test ? "Bye" : "Goodbye")); // specify a format and pass the result as an argument
}

String literal concatenation is performed at compile time, in an early translation phase before the code is parsed. There is no way for this concatenation to be aware of the value of test, which is not known until the program actually executes. Therefore, these string literals cannot be concatenated.

Because the general case is that you wouldn't have a construction like this for values known at compile-time, the C standard was designed to restrict the auto-concatenation feature to the most basic case: when the literals are literally right alongside each other.

But even if it did not word this restriction in that way, or if the restriction were differently-constructed, your example would still be impossible to realise without making the concatenation a runtime process. And, for that, we have the library functions such as strcat.

Because C has no string type. String literals are compiled to char arrays, referenced by a char* pointer.

C allows adjacent literals to be combined at compile-time, as in your first example. The C compiler itself has some knowledge about strings. But this information is not present at runtime, and thus concatenation cannot happen.

During the compilation process, your first example is "translated" to:

int main() {
static const char char_ptr_1[] = {'H', 'i', 'B', 'y', 'e', '\0'};
printf(char_ptr_1);
}

Note how the two strings are combined to a single static array by the compiler, before the program ever executes.

However, your second example is "translated" to something like this:

int main() {
static const char char_ptr_1[] = {'H', 'i', '\0'};
static const char char_ptr_2[] = {'B', 'y', 'e', '\0'};
static const char char_ptr_3[] = {'G', 'o', 'o', 'd', 'b', 'y', 'e', '\0'};
int test = 0;
printf(char_ptr_1 (test ? char_ptr_2 : char_ptr_3));
}

It should be clear why this does not compile. The ternary operator ?: is evaluated at runtime, not compile-time, when the "strings" no longer exist as such, but only as simple char arrays, referenced by char* pointers. Unlike adjacent string literals, adjacent char pointers are simply a syntax error.

If you really want to have both branches produce compile-time string constants to be chosen at runtime, you'll need a macro.

#include <stdio.h>

/* Paste the prefix s onto each branch so both results are single literals. */
#define ccat(s, t, a, b) ((t) ? (s a) : (s b))

int main(int argc, char **argv) {
    printf("%s\n", ccat("hello ", argc > 2, "y'all", "you"));
    return 0;
}

In printf("Hi" "Bye"); you have two consecutive arrays of char which the compiler can make into a single array.

In printf("Hi" (test ? "Bye" : "Goodbye")); you have one array followed by a pointer to char (an array converted to a pointer to its first element). The compiler cannot merge an array and a pointer.
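The array-versus-pointer difference can be observed with sizeof. This is only a sketch, and the function names merged_literal_size and conditional_result_size are made up for illustration:

```c
#include <stddef.h>

/* Size of the merged literal: an array of 6 char
   ('H', 'i', 'B', 'y', 'e' and the terminating null). */
size_t merged_literal_size(void)
{
    return sizeof("Hi" "Bye");
}

/* Size of the conditional expression's result: both operands are
   converted to char *, so this is the size of a pointer whatever
   test is. The operand of sizeof is not evaluated here. */
size_t conditional_result_size(int test)
{
    return sizeof(test ? "Bye" : "Goodbye");
}
```

The first function reports the size of one combined array, while the second reports the size of a pointer regardless of which literal would be chosen.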

This does not compile because the parameter list for the printf function is

(const char *format, ...)

and

("Hi" (test ? "Bye" : "Goodbye"))

does not fit the parameter list.

gcc tries to make sense of it by imagining that

(test ? "Bye" : "Goodbye")

is a parameter list, and complains that "Hi" is not a function.
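To see why the compiler reads it that way, compare a sketch in which the thing in front of the parentheses really is a function. Here hi and pick are hypothetical names invented for this illustration. The token sequence has the same shape, an expression followed by a parenthesized expression, which the grammar treats as a function call.

```c
/* Hypothetical function standing in for the "Hi" token. */
const char *hi(const char *rest)
{
    return rest;
}

const char *pick(int test)
{
    /* Same shape as "Hi" (test ? "Bye" : "Goodbye"), but hi is callable,
       so this is an ordinary call with one argument. */
    return hi (test ? "Bye" : "Goodbye");
}
```

Replace hi with the literal "Hi" and the call syntax stays the same, only now the called object is an array of char, which is what the compiler objects to.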

To answer the question, I would go to the definition of printf. The function printf expects a const char* as its first argument. A string literal such as "Hi" is an array of char that converts to a pointer to its first element, so it fits. An expression such as (test) ? "str1" : "str2" also yields a char* and would be accepted on its own; what fails is placing "Hi" directly in front of it, because the two are not adjacent string literals and so are never merged into a single argument, which duly causes the compiler to complain. On the other hand, this works perfectly well: printf("hi %s", test ? "yes" : "no")