是什么使得指针的这种使用不可预测?

我目前正在学习指针,我的教授提供了这段代码作为例子:

//We cannot predict the behavior of this program!


#include <iostream>
using namespace std;


int main()
{
char * s = "My String";
char s2[] = {'a', 'b', 'c', '\0'};


cout << s2 << endl;


return 0;
}

他在评论中写道,我们无法预测程序的行为。但是到底是什么让它变得不可预测呢?我觉得没什么不对。

7109 次浏览

The only slightly wrong thing that I see with this program is that you're not supposed to assign a string literal to a mutable char pointer, though this is often accepted as a compiler extension.

Otherwise, this program appears well-defined to me:

  • The rules that dictate how character arrays become character pointers when passed as parameters (such as with cout << s2) are well-defined.
  • The array is null-terminated, which is a condition for operator<< with a char* (or a const char*).
  • #include <iostream> includes <ostream>, which in turn defines operator<<(ostream&, const char*), so everything appears to be in place.

The behaviour of the program is non-existent, because it is ill-formed.

char* s = "My String";

This is illegal. Prior to 2011, it had been deprecated for 12 years.

The correct line is:

const char* s = "My String";

Other than that, the program is fine. Your professor should drink less whiskey!

The answer is: it depends on what C++ standard you're compiling against. All the code is perfectly well-formed across all standards‡ with the exception of this line:

char * s = "My String";

Now, the string literal has type const char[10] and we're trying to initialize a non-const pointer to it. For all other types other than the char family of string literals, such an initialization was always illegal. For example:

const int arr[] = {1};
int *p = arr; // nope!

However, in pre-C++11, for string literals, there was an exception in §4.2/2:

A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; [...]. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ]

So in C++03, the code is perfectly fine (though deprecated), and has clear, predictable behavior.

In C++11, that block does not exist - there is no such exception for string literals converted to char*, and so the code is just as ill-formed as the int* example I just provided. The compiler is obligated to issue a diagnostic, and ideally in cases such as this that are clear violations of the C++ type system, we would expect a good compiler to not just be conforming in this regard (e.g. by issuing a warning) but to fail outright.

The code should ideally not compile - but does on both gcc and clang (I assume because there's probably lots of code out there that would be broken with little gain, despite this type system hole being deprecated for over a decade). The code is ill-formed, and thus it does not make sense to reason about what the behavior of the code might be. But considering this specific case and the history of it being previously allowed, I do not believe it to be an unreasonable stretch to interpret the resulting code as if it were an implicit const_cast, something like:

const int arr[] = {1};
int *p = const_cast<int*>(arr); // OK, technically

With that, the rest of the program is perfectly fine, as you never actually touch s again. Reading a created-const object via a non-const pointer is perfectly OK. Writing a created-const object via such a pointer is undefined behavior:

std::cout << *p; // fine, prints 1
*p = 5;          // will compile, but undefined behavior, which
// certainly qualifies as "unpredictable"

As there is no modification via s anywhere in your code, the program is fine in C++03, should fail to compile in C++11 but does anyway - and given that the compilers allow it, there's still no undefined behavior in it†. With allowances that the compilers are still [incorrectly] interpreting the C++03 rules, I see nothing that would lead to "unpredictable" behavior. Write to s though, and all bets are off. In both C++03 and C++11.


†Though, again, by definition ill-formed code yields no expectation of reasonable behavior
‡Except not, see Matt McNabb's answer

As others have noted, the code is illegitimate under C++11, although it was valid under earlier versions. Consequently, a compiler for C++11 is required to issue at least one diagnostic, but behavior of the compiler or the remainder of the build system is unspecified beyond that. Nothing in the Standard would forbid a compiler from exiting abruptly in response to an error, leaving a partially-written object file which a linker might think was valid, yielding a broken executable.

Although a good compiler should always ensure before it exits that any object file it is expected to have produced will be either valid, non-existent, or recognizable as invalid, such issues fall outside the jurisdiction of the Standard. While there have historically been (and may still be) some platforms where a failed compilation can result in legitimate-appearing executable files that crash in arbitrary fashion when loaded (and I've had to work with systems where link errors often had such behavior), I would not say that the consequences of syntax errors are generally unpredictable. On a good system, an attempted build will generally either produce an executable with a compiler's best effort at code generation, or won't produce an executable at all. Some systems will leave behind the old executable after a failed build, since in some cases being able to run the last successful build may be useful, but that can also lead to confusion.

My personal preference would be for disk-based systems to to rename the output file, to allow for the rare occasions when that executable would be useful while avoiding the confusion that can result from mistakenly believing one is running new code, and for embedded-programming systems to allow a programmer to specify for each project a program that should be loaded if a valid executable is not available under the normal name [ideally something which which safely indicates the lack of a useable program]. An embedded-systems tool-set would generally have no way of knowing what such a program should do, but in many cases someone writing "real" code for a system will have access to some hardware-test code that could easily be adapted to the purpose. I don't know that I've seen the renaming behavior, however, and I know that I haven't seen the indicated programming behavior.

Other answers have covered that this program is ill-formed in C++11 due to the assignment of a const char array to a char *.

However the program was ill-formed prior to C++11 also.

The operator<< overloads are in <ostream>. The requirement for iostream to include ostream was added in C++11.

Historically, most implementations had iostream include ostream anyway, perhaps for ease of implementation or perhaps in order to provide a better QoI.

But it would be conforming for iostream to only define the ostream class without defining the operator<< overloads.

You can't predict the behaviour of the compiler, for reasons noted above. (It should fail to compile, but may not.)

If compilation succeeds, then the behaviour is well-defined. You certainly can predict the behaviour of the program.

If it fails to compile, there is no program. In a compiled language, the program is the executable, not the source code. If you don't have an executable, you don't have a program, and you can't talk about behaviour of something that doesn't exist.

So I'd say your prof's statement is wrong. You can't predict the behaviour of the compiler when faced with this code, but that's distinct from the behaviour of the program. So if he's going to pick nits, he'd better make sure he's right. Or, of course, you might have misquoted him and the mistake is in your translation of what he said.