函数指针、闭包和 Lambda

我现在正在学习函数指针,当我在阅读 K & R 章节时,第一个想到的是,“嘿,这有点像一个结束。”我知道这个假设在某种程度上是根本错误的,在网上搜索之后,我没有找到任何关于这种比较的分析。

那么,为什么 C 风格的函数指针与闭包或 lambdas 有着根本的不同呢?据我所知,这与函数指针仍然指向一个已定义(已命名)函数,而不是匿名定义函数的做法有关。

为什么将一个函数传递给一个函数在第二种情况下(没有命名)比在第一种情况下(只是正在传递的普通的日常函数)看起来更强大?

请告诉我,我如此近距离地比较这两者是错误的,原因是什么。

37486 次浏览

In C you can't define the function inline, so you can't really create a closure. All you're doing is passing around a reference to some pre-defined method. In languages that support anonymous methods/closures, the definition of the methods are a lot more flexible.

In the simplest terms, function pointers have no scope associated with them (unless you count the global scope), whereas closures include the scope of the method that's defining them. With lambdas, you can write a method that writes a method. Closures allow you to bind "some arguments to a function and getting a lower-arity function as a result." (taken from Thomas's comment). You can't do that in C.

EDIT: Adding an example (I'm going to use Actionscript-ish syntax cause that's what's on my mind right now):

Say you have some method that takes another method as its argument, but doesn't provide a way to pass any parameters to that method when it's called? Like, say, some method that causes a delay before running the method you passed it (stupid example, but I want to keep it simple).

function runLater(f:Function):Void {
sleep(100);
f();
}

Now say you want to user runLater() to delay some processing of an object:

function objectProcessor(o:Object):Void {
/* Do something cool with the object! */
}


function process(o:Object):Void {
runLater(function() { objectProcessor(o); });
}

The function you're passing to process() isn't some staticly defined function anymore. It's dynamically generated, and is able to include references to variables that were in scope when the method was defined. So, it can access 'o' and 'objectProcessor', even though those aren't in the global scope.

I hope that made sense.

Closure = logic + environment.

For instance, consider this C# 3 method:

public Person FindPerson(IEnumerable<Person> people, string name)
{
return people.Where(person => person.Name == name);
}

The lambda expression not only encapsulates the logic ("compare the name") but also the environment, including the parameter (i.e. local variable) "name".

For more on this, have a look at my article on closures which takes you through C# 1, 2 and 3, showing how closures make things easier.

The main difference arises from the lack of lexical scoping in C.

A function pointer is just that, a pointer to a block of code. Any non-stack variable that it references is global, static or similar.

A closure, OTOH, has its own state in the form of 'outer variables', or 'upvalues'. they can be as private or shared as you want, using lexical scoping. You can create lots of closures with the same function code, but different variables instances.

A few closures can share some variables, and so can be the interface of an object (in the OOP sense). to make that in C you have to associate a structure with a table of function pointers (that's what C++ does, with a class vtable).

in short, a closure is a function pointer PLUS some state. it's a higher-level construct

A lambda is an anonymous, dynamically defined function. You just cannot do that in C... as for closures (or the convination of the two), the typical lisp example would look something along the lines of:

(defun get-counter (n-start +-number)
"Returns a function that returns a number incremented
by +-number every time it is called"
(lambda () (setf n-start (+ +-number n-start))))

In C terms, you could say that the lexical environment (the stack) of get-counter is being captured by the anonymous function, and modified internally as the following example shows:

[1]> (defun get-counter (n-start +-number)
"Returns a function that returns a number incremented
by +-number every time it is called"
(lambda () (setf n-start (+ +-number n-start))))
GET-COUNTER
[2]> (defvar x (get-counter 2 3))
X
[3]> (funcall x)
5
[4]> (funcall x)
8
[5]> (funcall x)
11
[6]> (funcall x)
14
[7]> (funcall x)
17
[8]> (funcall x)
20
[9]>

A lambda (or closure) encapsulates both the function pointer and variables. This is why, in C#, you can do:

int lessThan = 100;
Func<int, bool> lessThanTest = delegate(int i) {
return i < lessThan;
};

I used an anonymous delegate there as a closure (it's syntax is a little clearer and closer to C than the lambda equivalent), which captured lessThan (a stack variable) into the closure. When the closure is evaluated, lessThan (whose stack frame may have been destroyed) will continue to be referenced. If I change lessThan, then I change the comparison:

int lessThan = 100;
Func<int, bool> lessThanTest = delegate(int i) {
return i < lessThan;
};


lessThanTest(99); // returns true
lessThan = 10;
lessThanTest(99); // returns false

In C, this would be illegal:

BOOL (*lessThanTest)(int);
int lessThan = 100;


lessThanTest = &LessThan;


BOOL LessThan(int i) {
return i < lessThan; // compile error - lessThan is not in scope
}

though I could define a function pointer that takes 2 arguments:

int lessThan = 100;
BOOL (*lessThanTest)(int, int);


lessThanTest = &LessThan;
lessThanTest(99, lessThan); // returns true
lessThan = 10;
lessThanTest(100, lessThan); // returns false


BOOL LessThan(int i, int lessThan) {
return i < lessThan;
}

But, now I have to pass the 2 arguments when I evaluate it. If I wished to pass this function pointer to another function where lessThan was not in scope, I would either have to manually keep it alive by passing it to each function in the chain, or by promoting it to a global.

Though most mainstream languages that support closures use anonymous functions, there is no requirement for that. You can have closures without anonymous functions, and anonymous functions without closures.

Summary: a closure is a combination of function pointer + captured variables.

In C, function pointers can be passed as arguments to functions and returned as values from functions, but functions exist only at top level: you cannot nest function definitions within each other. Think about what it would take for C to support nested functions that can access the variables of the outer function, while still being able to send function pointers up and down the call stack. (To follow this explanation, you should know the basics of how function calls are implemented in C and most similar languages: browse through the call stack entry on Wikipedia.)

What kind of object is a pointer to a nested function? It cannot just be the address of the code, because if you call it, how does it access the variables of the outer function? (Remember that because of recursion, there may be several different calls of the outer function active at one time.) This is called the funarg problem, and there are two subproblems: the downward funargs problem and the upwards funargs problem.

The downwards funargs problem, i.e., sending a function pointer "down the stack" as an argument to a function you call, is actually not incompatible with C, and GCC supports nested functions as downward funargs. In GCC, when you create a pointer to a nested function, you really get a pointer to a trampoline, a dynamically constructed piece of code that sets up the static link pointer and then calls the real function, which uses the static link pointer to access the variables of the outer function.

The upwards funargs problem is more difficult. GCC does not prevent you from letting a trampoline pointer exist after the outer function is no longer active (has no record on the call stack), and then the static link pointer could point to garbage. Activation records can no longer be allocated on a stack. The usual solution is to allocate them on the heap, and let a function object representing a nested function just point to the activation record of the outer function. Such an object is called a closure. Then the language will typically have to support garbage collection so that the records can be freed once there are no more pointers pointing to them.

Lambdas (anonymous functions) are really a separate issue, but usually a language that lets you define anonymous functions on the fly will also let you return them as function values, so they end up being closures.

Most of the responses indicate that closures require function pointers, possibly to anonymous functions, but as Mark wrote closures can exist with named functions. Here's an example in Perl:

{
my $count;
sub increment { return $count++ }
}

The closure is the environment that defines the $count variable. It is only available to the increment subroutine and persists between calls.

As someone who has written compilers for languages both with and without 'real' closures, I respectfully disagree with some of the answers above. A Lisp, Scheme, ML, or Haskell closure does not create a new function dynamically. Instead it reuses an existing function but does so with new free variables. The collection of free variables is often called the environment, at least by programming-language theorists.

A closure is just an aggregate containing a function and an environment. In the Standard ML of New Jersey compiler, we represented one as a record; one field contained a pointer to the code, and the other fields contained the values of the free variables. The compiler created a new closure (not function) dynamically by allocating a new record containing a pointer to the same code, but with different values for the free variables.

You can simulate all this in C, but it is a pain in the ass. Two techniques are popular:

  1. Pass a pointer to the function (the code) and a separate pointer to the free variables, so that the closure is split across two C variables.

  2. Pass a pointer to a struct, where the struct contains the values of the free variables and also a pointer to the code.

Technique #1 is ideal when you are trying to simulate some kind of polymorphism in C and you don't want to reveal the type of the environment---you use a void* pointer to represent the environment. For examples, look at Dave Hanson's C Interfaces and Implementations. Technique #2, which more closely resembles what happens in native-code compilers for functional languages, also resembles another familiar technique... C++ objects with virtual member functions. The implementations are almost identical.

This observation led to a wisecrack from Henry Baker:

People in the Algol/Fortran world complained for years that they didn't understand what possible use function closures would have in efficient programming of the future. Then the `object oriented programming' revolution happened, and now everyone programs using function closures, except that they still refuse to to call them that.

Closures imply some variable from the point of function definition is bound together with the function logic, like being able to declare a mini-object on the fly.

One important problem with C and closures is variables allocated on the stack will be destroyed on leaving the current scope, regardless of if a closure was pointing to them. This would lead to the kind of bugs people get when they carelessly return pointers to local variables. Closures basically imply all relevant variables are either ref-counted or garbage-collected items on a heap.

I'm not comfortable equating lambda with closure because I'm not sure that lambdas in all languages are closures, at times I think lambdas have just been locally defined anonymous functions without the binding of variables (Python pre 2.1?).

In GCC it is possible to simulate lambda functions using the following macro:

#define lambda(l_ret_type, l_arguments, l_body)       \
({                                                    \
l_ret_type l_anonymous_functions_name l_arguments \
l_body                                            \
&l_anonymous_functions_name;                      \
})

Example from source:

qsort (array, sizeof (array) / sizeof (array[0]), sizeof (array[0]),
lambda (int, (const void *a, const void *b),
{
dump ();
printf ("Comparison %d: %d and %d\n",
++ comparison, *(const int *) a, *(const int *) b);
return *(const int *) a - *(const int *) b;
}));

Using this technique of course removes the possibility of your application working with other compilers and is apparently "undefined" behavior so YMMV.

The closure captures the free variables in an environment. The environment will still exist, even though the surrounding code may no longer be active.

An example in Common Lisp, where MAKE-ADDER returns a new closure.

CL-USER 53 > (defun make-adder (start delta) (lambda () (incf start delta)))
MAKE-ADDER


CL-USER 54 > (compile *)
MAKE-ADDER
NIL
NIL

Using the above function:

CL-USER 55 > (let ((adder1 (make-adder 0 10))
(adder2 (make-adder 17 20)))
(print (funcall adder1))
(print (funcall adder1))
(print (funcall adder1))
(print (funcall adder1))
(print (funcall adder2))
(print (funcall adder2))
(print (funcall adder2))
(print (funcall adder1))
(print (funcall adder1))
(describe adder1)
(describe adder2)
(values))


10
20
30
40
37
57
77
50
60
#<Closure 1 subfunction of MAKE-ADDER 4060001ED4> is a CLOSURE
Function         #<Function 1 subfunction of MAKE-ADDER 4060001CAC>
Environment      #(60 10)
#<Closure 1 subfunction of MAKE-ADDER 4060001EFC> is a CLOSURE
Function         #<Function 1 subfunction of MAKE-ADDER 4060001CAC>
Environment      #(77 20)

Note that the DESCRIBE function shows that the function objects for both closures are the same, but the environment is different.

Common Lisp makes both closures and pure function objects (those without an environment) both to be functions and one can call both in the same way, here using FUNCALL.

The big question is: what is a closure and/or lambda behind the scenes? what does it become a closure and/or a lambda or even more, a delegate? the very possible answer is that these become in a sort of function pointer as assembly code, thus, closures, delegates, lambdas, anonymous functions are in essence a kind of sugar syntax to declare/define function pointers in a high level way but this declarations includes some mechanism to define some scope to environment variables.