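Why is it impossible to build a compiler that can determine whether a C++ function will change the value of a particular variable?

I read this line in a book:

It is provably impossible to build a compiler that can actually determine whether or not a C++ function will change the value of a particular variable.

The paragraph was discussing why the compiler is conservative when checking for const-ness.

Why is it impossible to build such a compiler?

The compiler can always check whether a variable is reassigned, whether a non-const function is being invoked on it, or whether it is being passed in as a non-const parameter...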


Why is it impossible to build such a compiler?

For the same reason that you can't write a program that will determine whether any given program will terminate. This is known as the halting problem, and it's one of those things that's not computable.

To be clear, you can write a compiler that can determine that a function does change the variable in some cases, but you can't write one that reliably tells you that the function will or won't change the variable (or halt) for every possible function.

Here's an easy example:

void foo() {
    if (bar() == 0) this->a = 1;
}

How can a compiler determine, just from looking at that code, whether foo will ever change a? Whether it does or doesn't depends on conditions external to the function, namely the implementation of bar. There's more than that to the proof that the halting problem isn't computable, but it's already nicely explained in the Wikipedia article on the halting problem (and in every computation theory textbook), so I won't attempt a full explanation here.

I think the key word in "whether or not a C++ function will change the value of a particular variable" is "will". It is certainly possible to build a compiler that checks whether or not a C++ function is allowed to change the value of a particular variable, but you cannot say with certainty that the change is going to happen:

#include <iostream>
#include <string>
using namespace std;

void maybe(int& val) {
    cout << "Should I change value? [Y/N] >";
    string reply;
    cin >> reply;
    if (reply == "Y") {
        val = 42;
    }
}
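To illustrate the "allowed to change" half of that distinction, here is a minimal sketch (the names `may_write` and `read_only` are made up for this example). The type system decides at compile time whether a write is permitted; it says nothing about whether a permitted write will actually happen.

```cpp
#include <cassert>
#include <type_traits>

// Assigning through a reference to const is rejected at compile time...
static_assert(!std::is_assignable<const int&, int>::value,
              "a const int& cannot be assigned through");
// ...while assigning through a plain reference is permitted.
static_assert(std::is_assignable<int&, int>::value,
              "an int& can be assigned through");

// Hypothetical helper: permitted to write val, but only does so when asked.
void may_write(int& val, bool do_it) {
    if (do_it) {
        val = 42;
    }
}

// Hypothetical helper: not permitted to write val at all.
int read_only(const int& val) {
    // val = 42;  // would not compile: val is a reference to const
    return val;
}

void example() {
    int v = 0;
    may_write(v, false);  // allowed to change v, but does not
    assert(v == 0);
    may_write(v, true);   // allowed to change v, and does
    assert(v == 42);
    assert(read_only(v) == 42);
}
```

Whether `may_write` changes `v` depends on a run-time argument, which is exactly the part the compiler cannot settle in general.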

There are multiple avenues to explaining this, one of which is the Halting Problem:

In computability theory, the halting problem can be stated as follows: "Given a description of an arbitrary computer program, decide whether the program finishes running or continues to run forever". This is equivalent to the problem of deciding, given a program and an input, whether the program will eventually halt when run with that input, or will run forever.

Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.

If I write a program that looks like this:

do tons of complex stuff
if (condition on result of complex stuff)
{
    change value of x
}
else
{
    do not change value of x
}

Does the value of x change? To determine this, you would first have to determine whether the do tons of complex stuff part causes the condition to fire, or, even more fundamentally, whether it halts. That's something the compiler can't do.
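One concrete stand-in for "do tons of complex stuff" is the Collatz iteration, used here purely as an illustration (the function name `touch_if_collatz_halts` is invented). Whether that loop terminates for every starting value is a well-known open question, so a compiler that could always say whether `x` gets written would be settling it. One caveat: C++ lets an implementation assume that a loop without observable side effects terminates, so this sketch counts steps through a volatile variable to keep the loop observable.

```cpp
#include <cassert>
#include <cstdint>

// Writes x only after the Collatz iteration starting at n reaches 1.
// Assumes n >= 1 and that intermediate values fit in 64 bits.
void touch_if_collatz_halts(std::uint64_t n, int& x) {
    volatile std::uint64_t steps = 0;  // volatile access keeps the loop observable
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps = steps + 1;
    }
    x = 1;  // reached only if the loop above terminates
}

void example() {
    int x = 0;
    touch_if_collatz_halts(27, x);  // 27 is known to reach 1
    assert(x == 1);
}
```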

As soon as a function calls another function that the compiler doesn't "see" the source of, it either has to assume that the variable is changed, or things may well go wrong further below. For example, say we have this in "foo.cpp":

#include <fstream>
using namespace std;

void foo(int& x)
{
    ifstream f("f.dat", ifstream::binary);
    f.read(reinterpret_cast<char*>(&x), sizeof(x));
}

and we have this in "bar.cpp":

void foo(int& x);  // defined in foo.cpp

void bar(int& x)
{
    foo(x);
}

How can the compiler "know" that x is not changing (or IS changing, more appropriately) in bar?

I'm sure we can come up with something more complex, if this isn't complex enough.

It is impossible in general for the compiler to determine whether the variable will be changed, as has been pointed out.

When checking const-ness, the question of interest seems to be whether the variable can be changed by a function. Even this is hard in languages that support pointers. You can't control what other code does with a pointer; it could even be read from an external source (though that is unlikely). In languages that restrict access to memory, these kinds of guarantees are possible and allow for more aggressive optimization than C++ does.
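As a sketch of why pointers make even the "can it be changed" question hard, the hypothetical function below promises const in its signature and writes anyway. This is well-defined only because the object it is handed was not itself declared const.

```cpp
#include <cassert>

// Hypothetical function: the signature suggests read-only access...
void sneaky(const int* p) {
    // ...but const_cast removes the qualifier. Writing is legal only if the
    // pointed-to object was not originally declared const.
    *const_cast<int*>(p) = 5;
}

void example() {
    int value = 1;  // not a const object, so the write in sneaky is defined
    sneaky(&value);
    assert(value == 5);
}
```

If `sneaky` lives in another translation unit, the caller's compiler sees only the `const int*` signature and still cannot assume `value` is untouched.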

Imagine such a compiler exists. Let's also assume that, for convenience, it provides a library function that returns 1 if the passed function modifies a given variable and 0 if it doesn't. Then what should this program print?

#include <stdio.h>

int variable = 0;

/* modifies_variable() is the hypothetical library function described above. */
void f() {
    if (modifies_variable(f, variable)) {
        /* do nothing */
    } else {
        /* modify variable */
        variable = 1;
    }
}

int main(int argc, char **argv) {
    if (modifies_variable(f, variable)) {
        printf("Modifies variable\n");
    } else {
        printf("Does not modify variable\n");
    }

    return 0;
}
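The paradox can be made runnable by plugging in the two answers the oracle could possibly give. In this sketch (the parameter `oracle_answer` and the helper `oracle_was_right` are invented for illustration), `f` receives the claimed answer instead of calling a real `modifies_variable`, and both possible answers turn out to be wrong.

```cpp
#include <cassert>

int variable = 0;

// Stand-in for the hypothetical modifies_variable(f, variable): whatever the
// analysis does internally, for this particular f it must answer true or false.
// f then does the opposite of the answer it receives.
void f(bool oracle_answer) {
    if (oracle_answer) {
        /* oracle said "modifies": do nothing */
    } else {
        variable = 1;  // oracle said "does not modify": modify it
    }
}

// Returns true if the oracle's answer matched what f actually did.
bool oracle_was_right(bool oracle_answer) {
    variable = 0;
    f(oracle_answer);
    bool actually_modified = (variable != 0);
    return oracle_answer == actually_modified;
}

void example() {
    assert(!oracle_was_right(true));   // claimed "modifies", f left it alone
    assert(!oracle_was_right(false));  // claimed "does not modify", f wrote it
}
```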

I don't think it's necessary to invoke the halting problem to explain that you can't algorithmically know at compile time whether a given function will modify a certain variable or not.

Instead, it's sufficient to point out that a function's behavior often depends on run-time conditions, which the compiler can't know about in advance. E.g.

int y;

int main(int argc, char *argv[]) {
    if (argc > 2) y++;
}

How could the compiler predict with certainty whether y will be modified?

Even if a variable is declared const, that doesn't mean some badly written code can't overwrite it.

//   g++ -o foo foo.cc

#include <iostream>

void const_func(const int& a, int* b)
{
    b[0] = 2;
    b[1] = 2;  // out of bounds: undefined behavior, may happen to land on a
}

int main() {
    int a = 1;
    int b = 3;

    std::cout << a << std::endl;
    const_func(a, &b);
    std::cout << a << std::endl;
}

Possible output (the out-of-bounds write is undefined behavior, so the result is not guaranteed):

1
2

I'm really surprised that there isn't an answer that uses the halting problem directly! There's a very straightforward reduction from the halting problem to this problem.

Imagine that the compiler could tell whether or not a function changed the value of a variable. Then it would certainly be able to tell whether the following function changes the value of y or not, assuming that the value of x can be tracked in all the calls throughout the rest of the program:

void foo(int x){
    if(x)
        y=1;
}

Now, for any program we like, let's rewrite it as:

int y;
int main(){
    int x = 1;  // nonzero, so foo(x) always writes y
    // ...
    // run the program normally
    // ...
    foo(x);
}

Notice that our program changes the value of y if and only if it terminates: foo() is the last thing it does before exiting. This means that a compiler able to answer the question for y would have solved the halting problem!

What the above reduction shows us is that the problem of determining whether a variable's value changes is at least as hard as the halting problem. The halting problem is known to be incomputable, so this one must be also.
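The reduction can be sketched as a wrapper, under the assumption that the wrapped program never touches `y` itself. The names `run_then_mark` and `program` are placeholders: any decider for "does this call change `y`?" would also decide whether `program` halts.

```cpp
#include <cassert>
#include <functional>

int y = 0;

// Runs an arbitrary computation, then writes y. Assuming program() does not
// touch y, the write happens if and only if program() returns.
void run_then_mark(const std::function<void()>& program) {
    program();
    y = 1;
}

void example() {
    int sum = 0;
    run_then_mark([&sum] {
        for (int i = 1; i <= 10; ++i) sum += i;  // a program that halts
    });
    assert(sum == 55);
    assert(y == 1);  // y changed because the program halted
}
```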

To make the question more specific, I suggest the following set of constraints may be what the author of the book had in mind:

  1. Assume the compiler is examining the behavior of a specific function with respect to the const-ness of a variable. For correctness, the compiler has to assume (because of aliasing, as explained below) that the variable is changed whenever the function calls another function, so assumption #1 only applies to code fragments that don't make function calls.
  2. Assume the variable isn't modified by an asynchronous or concurrent activity.
  3. Assume the compiler is only determining whether the variable can be modified, not whether it will be modified. In other words, the compiler is only performing static analysis.
  4. Assume the compiler is only considering correctly functioning code (not considering array overruns/underruns, bad pointers, etc.).

In the context of compiler design, I think assumptions 1, 3, and 4 make perfect sense from a compiler writer's point of view, where the concerns are code-generation correctness and/or code optimization. Assumption 2 makes sense in the absence of the volatile keyword. These assumptions also focus the question enough to make judging a proposed answer much more definitive :-)

Given those assumptions, a key reason why const-ness can't be assumed is due to variable aliasing. The compiler can't know whether another variable points to the const variable. Aliasing could be due to another function in the same compilation unit, in which case the compiler could look across functions and use a call tree to statically determine that aliasing could occur. But if the aliasing is due to a library or other foreign code, then the compiler has no way to know upon function entry whether variables are aliased.
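A small sketch of the aliasing problem (the function name `delta_seen` is made up): inside the function, `a` is read-only, yet its value changes when the caller passes the same object for both parameters. The compiler therefore has to re-read `a` after the write to `b` unless it can prove the two never alias.

```cpp
#include <cassert>

// Returns how much the value seen through `a` changed across a write to `b`.
int delta_seen(const int& a, int& b) {
    int before = a;
    b += 1;             // may or may not be the same object as a
    return a - before;  // a must be re-read here
}

void example() {
    int p = 10, q = 20;
    assert(delta_seen(p, q) == 0);  // distinct objects: a is unchanged
    assert(delta_seen(p, p) == 1);  // aliased: a changed via b
}
```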

You could argue that if a variable/argument is marked const then it shouldn't be subject to change via aliasing, but for a compiler writer that's pretty risky. It can even be risky for a human programmer to declare a variable const as part of, say, a large project where they don't know the behavior of the whole system, the OS, or a library well enough to really know that a variable won't change.

Don't confuse "will or will not modify a variable given these inputs" for "has an execution path which modifies a variable."

The former is called opaque predicate determination, and is trivially impossible to decide: aside from the reduction from the halting problem, you could just point out that the inputs might come from an unknown source (e.g. the user). This is true of all languages, not just C++.

The latter statement, however, can be determined by looking at the parse tree, which is something that all optimizing compilers do. The reason they do is that pure functions (and referentially transparent functions, for some definition of referentially transparent) have all sorts of nice optimizations that can be applied, like being easily inlinable or having their values determined at compile-time; but to know if a function is pure, we need to know if it can ever modify a variable.
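The "has an execution path" check can be sketched as a walk over a toy syntax tree. Everything here (the `Stmt` type, its three kinds, `may_modify`) is invented for illustration and is far simpler than what a real compiler does. In particular, it treats any call as possibly modifying anything, which is the conservative choice when the callee's source is not visible.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy statement tree: an assignment to a named variable, a call to unknown
// code, or a block/branch whose children may or may not execute.
struct Stmt {
    enum class Kind { Assign, Call, Block } kind;
    std::string target;                       // used by Assign only
    std::vector<std::shared_ptr<Stmt>> body;  // used by Block only
};

std::shared_ptr<Stmt> assign(const std::string& var) {
    return std::make_shared<Stmt>(Stmt{Stmt::Kind::Assign, var, {}});
}
std::shared_ptr<Stmt> call() {
    return std::make_shared<Stmt>(Stmt{Stmt::Kind::Call, "", {}});
}
std::shared_ptr<Stmt> block(std::vector<std::shared_ptr<Stmt>> children) {
    return std::make_shared<Stmt>(
        Stmt{Stmt::Kind::Block, "", std::move(children)});
}

// True if some path through `s` could write `var`. Branch conditions are
// ignored, so the answer means "can", never "will".
bool may_modify(const Stmt& s, const std::string& var) {
    switch (s.kind) {
        case Stmt::Kind::Assign:
            return s.target == var;
        case Stmt::Kind::Call:
            return true;  // unknown code: assume the worst
        case Stmt::Kind::Block:
            for (const auto& child : s.body) {
                if (may_modify(*child, var)) return true;
            }
            return false;
    }
    return true;  // not reached; keeps compilers from warning
}

void example() {
    auto branchy = block({assign("x"), block({assign("y")})});
    assert(may_modify(*branchy, "y"));
    assert(!may_modify(*branchy, "z"));

    auto with_call = block({assign("x"), call()});
    assert(may_modify(*with_call, "z"));  // conservative because of the call
}
```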

So, what appears to be a surprising statement about C++ is actually a trivial statement about all languages.

It can be done, and compilers do it all the time for some functions. It is, for instance, a trivial optimisation for simple inline accessors and many pure functions.
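For the easy cases, the compiler's ability to see through a function is visible directly in the language. A minimal sketch, with an invented `Point` accessor and `square` function: because both are constexpr and free of side effects, the compiler can evaluate the call entirely at compile time.

```cpp
#include <cassert>

struct Point {
    int x_;
    // Simple inline accessor: const-qualified, so it cannot modify *this
    // (absent mutable members or const_cast).
    constexpr int x() const { return x_; }
};

// A pure function: the result depends only on the argument.
constexpr int square(int v) { return v * v; }

// Both calls are evaluated during compilation.
static_assert(square(Point{4}.x()) == 16, "evaluated at compile time");

void example() {
    Point p{3};
    assert(square(p.x()) == 9);  // the same functions also work at run time
}
```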

What is impossible is to know it in the general case.

Whenever there is a system call, a call to a function from another module, or a call to a potentially overridden method, anything could happen, including a hostile takeover in which some hacker uses a stack overflow to change an unrelated variable.

However, you should use const, avoid globals, prefer references to pointers, avoid reusing variables for unrelated tasks, and so on. All of that makes the compiler's life easier when performing aggressive optimisations.

To expand on my comments, that book's text is unclear, which obfuscates the issue.

As I commented, that book is trying to say, "let's get an infinite number of monkeys to write every conceivable C++ function which could ever be written. There will be cases where if we pick a variable that (some particular function the monkeys wrote) uses, we can't work out whether the function will change that variable."

Of course for some (even many) functions in any given application, this can be determined by the compiler, and very easily. But not for all (or necessarily most).

This function can be easily so analysed:

static int global;

void foo()
{
}

"foo" clearly does not modify "global". It doesn't modify anything at all, and a compiler can work this out very easily.

This function cannot be so analysed:

#include <cstdlib>

static int global;

int foo()
{
    if ((rand() % 100) > 50)
    {
        global = 1;
    }
    return 1;
}

Since "foo"'s actions depend on a value that can change at runtime, it patently cannot be determined at compile time whether it will modify "global".

This whole concept is far simpler to understand than computer scientists make it out to be. If the function can do something different based on things that can change at runtime, then you can't work out what it'll do until it runs, and each time it runs it may do something different. Whether it's provably impossible or not, it's obviously impossible.