Why can't I define a function inside another function?

This is not a lambda function question; I know that I can assign a lambda to a variable.

What's the point of allowing us to declare, but not define a function inside code?

For example:

#include <iostream>


int main()
{
// This is illegal
// int one(int bar) { return 13 + bar; }


// This is legal, but why would I want this?
int two(int bar);


// This gets the job done but man it's complicated
class three{
int m_iBar;
public:
three(int bar):m_iBar(13 + bar){}
operator int(){return m_iBar;}
};


std::cout << three(42) << '\n';
return 0;
}

So what I want to know is: why would C++ allow two, which seems useless, and three, which seems far more complicated, but disallow one?

EDIT:

From the answers it seems that in-code declarations may be able to prevent namespace pollution. What I was hoping to hear, though, is why the ability to declare functions has been allowed but the ability to define functions has been disallowed.


Well, the answer is "historical reasons". In C you could have function declarations at block scope, and the C++ designers did not see the benefit in removing that option.

An example usage would be:

#include <iostream>


int main()
{
int func();
func();
}


int func()
{
std::cout << "Hello\n";
return 0; // a non-void function must return a value
}

IMO this is a bad idea because it is easy to make a mistake by providing a declaration that does not match the function's real definition, leading to undefined behaviour which will not be diagnosed by the compiler.

This language feature was inherited from C, where it served some purpose in C's early days (function declaration scoping maybe?). I don't know if this feature is used much by modern C programmers and I sincerely doubt it.

So, to sum up the answer:

there is no purpose for this feature in modern C++ (that I know of, at least), it is here because of C++-to-C backward compatibility (I suppose :) ).


Thanks to the comment below:

A function prototype is scoped to the function it is declared in, so one can keep the global namespace tidier by referring to external functions/symbols without #include.

In the example you give, int two(int) is being declared as an external function, with that declaration only being valid within the scope of the main function.

That's reasonable if you only wish to make the name two available within main() so as to avoid polluting the global namespace within the current compilation unit.

Example in response to comments:

main.cpp:

int main() {
int foo();
return foo();
}

foo.cpp:

int foo() {
return 0;
}

There is no need for header files. Compile and link with

c++ main.cpp foo.cpp

It will compile and run, and the program will return 0 as expected.

The first one is a function definition, and that is not allowed. After all, what would be the use of putting a function definition inside another function?

The other two, however, are just declarations. Imagine you need to call int two(int bar); inside main(), but it is defined below main(). Declaring the function inside main() lets you call it before its definition appears.

The same applies to the third. Declaring a class inside the function allows you to use that class inside the function without providing a separate header or reference.

int main()
{
// This is legal, but why would I want this?
int two(int bar);


//Call two
int x = two(7);


class three {
int m_iBar;
public:
three(int bar):m_iBar(13 + bar) {}
operator int() {return m_iBar;}
};


//Use class (three has no default constructor, so pass an argument)
three threeObj(7);


return 0;
}

Actually, there is one use case which is conceivably useful. If you want to make sure that a certain function is called (and your code compiles), no matter what the surrounding code declares, you can open your own block and declare the function prototype in it. (The inspiration is originally from Johannes Schaub, https://stackoverflow.com/a/929902/3150802, via TeKa, https://stackoverflow.com/a/8821992/3150802).

This may be particularly useful if you have to include headers which you don't control, or if you have a multi-line macro which may be used in unknown code.

The key is that a local declaration supersedes previous declarations in the innermost enclosing block. While that can introduce subtle bugs (and, I think, is forbidden in C#), it can be used consciously. Consider:

// somebody's header
void f();


// your code
{   int i;
int f(); // your different f()!
i = f();
// ...
}

Linking may be interesting because chances are the headers belong to a library, but I guess you can adjust the linker arguments so that f() is resolved to your function by the time that library is considered. Or you tell it to ignore duplicate symbols. Or you don't link against the library.

Specifically answering this question:

From the answers it seems that in-code declarations may be able to prevent namespace pollution. What I was hoping to hear, though, is why the ability to declare functions has been allowed but the ability to define functions has been disallowed.

Because consider this code:

int main()
{
int foo() {


// Do something
return 0;
}
return 0;
}

Questions for language designers:

  1. Should foo() be available to other functions?
  2. If so, what should be its name? int main(void)::foo()?
  3. (Note that 2 would not be possible in C, the originator of C++)
  4. If we want a local function, we already have a way - make it a static member of a locally-defined class. So should we add another syntactic method of achieving the same result? Why do that? Wouldn't it increase the maintenance burden of C++ compiler developers?
  5. And so on...

You can do these things, largely because they're actually not all that difficult to do.

From the viewpoint of the compiler, having a function declaration inside another function is pretty trivial to implement. The compiler needs a mechanism to allow declarations inside of functions to handle other declarations (e.g., int x;) inside a function anyway.

It will typically have a general mechanism for parsing a declaration. For the guy writing the compiler, it doesn't really matter at all whether that mechanism is invoked when parsing code inside or outside of another function--it's just a declaration, so when it sees enough to know that what's there is a declaration, it invokes the part of the compiler that deals with declarations.

In fact, prohibiting these particular declarations inside a function would probably add extra complexity, because the compiler would then need an entirely gratuitous check to see if it's already looking at code inside a function definition and based on that decide whether to allow or prohibit this particular declaration.

That leaves the question of how a nested function is different. A nested function is different because of how it affects code generation. In languages that allow nested functions (e.g., Pascal) you normally expect that the code in the nested function has direct access to the variables of the function in which it's nested. For example:

int foo() {
int x;


int bar() {
x = 1; // Should assign to the `x` defined in `foo`.
}
}

Without local functions, the code to access local variables is fairly simple. In a typical implementation, when execution enters the function, some block of space for local variables is allocated on the stack. All the local variables are allocated in that single block, and each variable is treated as simply an offset from the beginning (or end) of the block. For example, let's consider a function something like this:

int f() {
int x;
int y;
x = 1;
y = x;
return y;
}

A compiler (assuming it didn't optimize away the extra code) might generate code for this roughly equivalent to this:

stack_pointer -= 2 * sizeof(int);      // allocate space for local variables
x_offset = 0;
y_offset = sizeof(int);


stack_pointer[x_offset] = 1;                           // x = 1;
stack_pointer[y_offset] = stack_pointer[x_offset];     // y = x;
return_location = stack_pointer[y_offset];             // return y;
stack_pointer += 2 * sizeof(int);

In particular, it has one location pointing to the beginning of the block of local variables, and all access to the local variables is as offsets from that location.

With nested functions, that's no longer the case--instead, a function has access not only to its own local variables, but to the variables local to all the functions in which it's nested. Instead of just having one "stack_pointer" from which it computes an offset, it needs to walk back up the stack to find the stack_pointers local to the functions in which it's nested.

Now, in a trivial case that's not all that terrible either--if bar is nested inside of foo, then bar can just look up the stack at the previous stack pointer to access foo's variables. Right?

Wrong! Well, there are cases where this can be true, but it's not necessarily the case. In particular, bar could be recursive, in which case a given invocation of bar might have to look some nearly arbitrary number of levels back up the stack to find the variables of the surrounding function.

Generally speaking, you need to do one of two things: either you put some extra data on the stack, so it can search back up the stack at run-time to find its surrounding function's stack frame, or else you effectively pass a pointer to the surrounding function's stack frame as a hidden parameter to the nested function.

Oh, but there's not necessarily just one surrounding function either--if you can nest functions, you can probably nest them (more or less) arbitrarily deep, so you need to be ready to pass an arbitrary number of hidden parameters. That means you typically end up with something like a linked list of stack frames to surrounding functions, and access to variables of surrounding functions is done by walking that linked list to find its stack pointer, then accessing an offset from that stack pointer.

That, however, means that access to a "local" variable may not be a trivial matter. Finding the correct stack frame to access the variable can be non-trivial, so access to variables of surrounding functions is also (at least usually) slower than access to truly local variables. And, of course, the compiler has to generate code to find the right stack frames, access variables via any of an arbitrary number of stack frames, and so on.

This is the complexity that C was avoiding by prohibiting nested functions. Now, it's certainly true that a current C++ compiler is a rather different sort of beast from a 1970s-vintage C compiler. With things like multiple, virtual inheritance, a C++ compiler has to deal with things of this same general nature in any case (i.e., finding the location of a base-class variable in such cases can be non-trivial as well). On a percentage basis, supporting nested functions wouldn't add much complexity to a current C++ compiler (and gcc already supports them as an extension when compiling C, though not C++).

At the same time, it rarely adds much utility either. In particular, if you want to define something that acts like a function inside of a function, you can use a lambda expression. What this actually creates is an object (i.e., an instance of some class) that overloads the function call operator (operator()) but it still gives function-like capabilities. It makes capturing (or not) data from the surrounding context more explicit though, which allows it to use existing mechanisms rather than inventing a whole new mechanism and set of rules for its use.

Bottom line: even though it might initially seem like nested declarations are hard and nested functions are trivial, more or less the opposite is true: nested functions are actually much more complex to support than nested declarations.

It is not obvious why one is not allowed; nested functions were proposed a long time ago in N0295 which says:

We discuss the introduction of nested functions into C++. Nested functions are well understood and their introduction requires little effort from either compiler vendors, programmers, or the committee. Nested functions offer significant advantages, [...]

Obviously this proposal was rejected, but since we don't have meeting minutes available online for 1993 we don't have a possible source for the rationale for this rejection.

In fact this proposal is noted in Lambda expressions and closures for C++ as a possible alternative:

One article [Bre88] and proposal N0295 to the C++ committee [SH93] suggest adding nested functions to C++. Nested functions are similar to lambda expressions, but are defined as statements within a function body, and the resulting closure cannot be used unless that function is active. These proposals also do not include adding a new type for each lambda expression, but instead implementing them more like normal functions, including allowing a special kind of function pointer to refer to them. Both of these proposals predate the addition of templates to C++, and so do not mention the use of nested functions in combination with generic algorithms. Also, these proposals have no way to copy local variables into a closure, and so the nested functions they produce are completely unusable outside their enclosing function.

Considering we do now have lambdas we are unlikely to see nested functions since, as the paper outlines, they are alternatives for the same problem and nested functions have several limitations relative to lambdas.

As for this part of your question:

// This is legal, but why would I want this?
int two(int bar);

There are cases where this would be a useful way to call the function you want. The draft C++ standard section 3.4.1 [basic.lookup.unqual] gives us one interesting example:

namespace NS {
class T { };
void f(T);
void g(T, int);
}


NS::T parm;
void g(NS::T, float);


int main() {
f(parm); // OK: calls NS::f
extern void g(NS::T, float);
g(parm, 1); // OK: calls g(NS::T, float)
}

This is not an answer to the OP question, but rather a reply to several comments.

I disagree with these points in the comments and answers: 1 that nested declarations are allegedly harmless, and 2 that nested definitions are useless.

1 The prime counterexample for the alleged harmlessness of nested function declarations is the infamous Most Vexing Parse. IMO the spread of confusion caused by it is enough to warrant an extra rule forbidding nested declarations.

2 The first counterexample to the alleged uselessness of nested function definitions is the frequent need to perform the same operation in several places inside exactly one function. There is an obvious workaround for this:

private:
inline void bar(int abc)
{
// Do the repeating operation
}


public:
void foo()
{
int a, b, c;
bar(a);
bar(b);
bar(c);
}

However, this solution often enough contaminates the class definition with numerous private functions, each of which is used in exactly one caller. A nested function definition would be much cleaner.

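Just wanted to point out that GCC allows you to define functions inside functions as an extension (when compiling C; the extension is not supported for C++). Read more about it in the GCC documentation on nested functions. Also, with the introduction of lambdas to C++, this question is a bit obsolete now.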


The ability to declare function headers inside other functions, I found useful in the following case:

#include <cstdlib> // std::abs

void do_something(int&);


int main() {
int my_number = 10 * 10 * 10;
do_something(my_number);


return 0;
}


void do_something(int& num) {
void do_something_helper(int&); // declare helper here
do_something_helper(num);


// Do something else
}


void do_something_helper(int& num) {
num += std::abs(num - 1337);
}

What do we have here? Basically, you have a function that is supposed to be called from main, so what you do is that you forward declare it like normal. But then you realize, this function also needs another function to help it with what it's doing. So rather than declaring that helper function above main, you declare it inside the function that needs it and then it can be called from that function and that function only.

My point is, declaring function headers inside functions can be an indirect method of function encapsulation, which allows a function to hide some parts of what it's doing by delegating to some other function that only it is aware of, almost giving an illusion of a nested function.

Nested function declarations are allowed probably for 1. Forward references 2. To be able to declare a pointer to function(s) and pass around other function(s) in a limited scope.

Nested function definitions are not allowed probably due to issues like 1. Optimization 2. Recursion (enclosing and nested defined function(s)) 3. Re-entrancy 4. Concurrency and other multithread access issues.

From my limited understanding :)