Does Haskell require a garbage collector?

I'm curious as to why Haskell implementations use a GC.

I can't think of a case where GC would be necessary in a pure language. Is it just an optimization to reduce copying, or is it actually necessary?

I'm looking for example code that would leak if a GC wasn't present.

A garbage collector is never necessary, provided you have sufficient memory. However, in reality we don't have infinite memory, so we need some method to reclaim memory that is no longer needed. In impure languages like C, you can explicitly state that you're done with some memory and free it - but freeing is a mutating operation (the memory you just freed is no longer safe to read), so you can't use this approach in a pure language. That leaves three options: somehow statically analyze where the memory can be freed (probably impossible in the general case), leak memory like a sieve (works great until you run out), or use a GC.

The standard implementation techniques applied to Haskell actually require a GC more so than most other languages, since they never mutate previous values, instead creating new, modified values based on the previous ones. Because the program is constantly allocating more memory this way, a large number of the values will be discarded as time goes on.

This is why GHC programs tend to have such high total allocation figures (from gigabytes to terabytes): they're constantly allocating memory, and it's only thanks to the efficient GC that they reclaim it before running out.
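A tiny illustration of that allocation style (my own sketch, not from the answer): an immutable "update" builds an entirely new structure and abandons the old one to the GC.

```haskell
-- Sketch: "updating" an immutable list allocates a new one.
-- The original is untouched and becomes garbage once unreferenced.
bump :: [Int] -> [Int]
bump = map (+ 1)   -- builds fresh cons cells; never mutates the input

main :: IO ()
main = do
  let xs = [1, 2, 3]
  print (bump xs)  -- the new list
  print xs         -- the old list is still intact
```

Every call to `bump` allocates; the old list is reclaimed only when the GC sees it is unreachable.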

If a language (any language) allows you to allocate objects dynamically, then there are three practical ways to deal with the management of memory:

  1. The language can only allow you to allocate memory on the stack, or at startup. But these restrictions severely limit the kinds of computations that a program can perform. (In practice. In theory, you can emulate dynamic data structures in (say) Fortran by representing them in a big array. It is HORRIBLE ... and not relevant to this discussion.)

  2. The language can provide an explicit free or dispose mechanism. But this relies on the programmer to get it right. Any mistake in the storage management can result in a memory leak ... or worse.

  3. The language (or more strictly, the language implementation) can provide an automatic storage manager for the dynamically allocated storage; i.e. some form of garbage collector.

The only other option is to never reclaim dynamically allocated storage. This is not a practical solution, except for small programs performing small computations.

Applying this to Haskell, the language doesn't have the limitation of 1., and there is no manual deallocation operation as per 2. Therefore, in order to be useable for non-trivial things, a Haskell implementation needs to include a garbage collector.

I can't think of a case where GC would be necessary in a pure language.

Presumably you mean a pure functional language.

The answer is that a GC is required under the hood to reclaim the heap objects that the language MUST create. For example:

  • A pure function needs to create heap objects because in some cases it has to return them. That means that they can't be allocated on the stack.

  • The fact that there can be cycles (resulting from a let rec for example) means that a reference counting approach won't work for heap objects.

  • Then there are function closures ... which also can't be allocated on the stack because they have a lifetime that is (typically) independent of the stack frame in which they were created.
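Both of those points fit in a few lines (a sketch of mine, not from the answer):

```haskell
-- Sketch: heap objects that defeat stack allocation and refcounting.

-- A cyclic list from a recursive binding: its cons cell refers back to
-- itself, so a naive reference count on it never reaches zero.
ones :: [Int]
ones = 1 : ones

-- A closure: it captures n, so it must outlive the stack frame of the
-- makeAdder call that created it.
makeAdder :: Int -> (Int -> Int)
makeAdder n = \x -> x + n

main :: IO ()
main = do
  print (take 3 ones)
  print (makeAdder 10 5)
```

A tracing GC reclaims `ones` once it is unreachable; reference counting never would.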

I'm looking for example code that would leak if a GC wasn't present.

Just about any example that involved closures or graph-shaped data structures would leak under those conditions.
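For instance (my sketch, under the assumption of an allocate-only runtime with no free and no GC): every iteration below allocates a fresh list that is immediately dead, so memory would grow without bound.

```haskell
-- Sketch: each call allocates a list, consumes it, and drops it.
-- With a GC this runs fine; with allocate-only memory management a
-- naive implementation leaks one list per call.
step :: Int -> Int
step n = sum [1 .. n]   -- the list [1..n] becomes garbage immediately

main :: IO ()
main = print (sum (map step [1 .. 1000]))
```

(GHC may optimize some of these lists away via fusion, but the general pattern - allocate, consume, discard - is pervasive.)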

Extensive work was done in the 1990s and early 2000s on region inference for the strict functional language ML. Mads Tofte, Lars Birkedal, Martin Elsman, and Niels Hallenberg have written a quite readable retrospective on their work on region inference, much of which they integrated into the MLKit compiler. They experimented with purely region-based memory management (i.e. no garbage collector) as well as hybrid region-based/garbage-collected memory management, and reported that their test programs ran "between 10 times faster and 4 times slower" than pure garbage-collected versions.

Let's take a trivial example. Given this

f (x, y)

you need to allocate the pair (x, y) somewhere before calling f. When can you deallocate that pair? You have no idea. It cannot be deallocated when f returns, because f might have put the pair in a data structure (e.g., f p = [p]), so the lifetime of the pair might have to be longer than the return from f. Now, say the pair was put in a list; can whoever takes the list apart deallocate the pair? No, because the pair might be shared (e.g., let p = (x, y) in (f p, p)). So it's really difficult to tell when the pair can be deallocated.
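The answer's fragments assemble into a runnable program (the wrapper `g` and the concrete types are my additions):

```haskell
-- From the answer's fragments: the pair p escapes f and is shared.
f :: (Int, Int) -> [(Int, Int)]
f p = [p]                           -- f stores the pair: it outlives the call

g :: Int -> Int -> ([(Int, Int)], (Int, Int))
g x y = let p = (x, y) in (f p, p)  -- p is shared by both components

main :: IO ()
main = print (g 1 2)
```

Neither the caller of `f` nor the consumer of the list may free `p`, because the other still holds a reference - exactly the situation a GC resolves.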

The same holds for almost all allocations in Haskell. That said, it's possible to have an analysis (region analysis) that gives an upper bound on the lifetime. This works reasonably well in strict languages, but less so in lazy languages (lazy languages tend to do a lot more mutation than strict languages in the implementation).

So I'd like to turn the question around. Why do you think Haskell does not need a GC? How would you suggest memory allocation be done?

So, if you implement your language lazily using thunks, you have deferred all reasoning about object lifetimes until the last moment, which is runtime. Since you now know nothing about lifetimes, the only thing you can reasonably do is garbage collect...
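A sketch of that point (my example): whether a thunk is ever forced - and hence how long it must live - can depend on runtime data, so its lifetime is unknowable statically.

```haskell
-- Sketch: the second argument is a thunk; only the True branch forces it.
pick :: Bool -> Int -> Int
pick True  x = x   -- forces the thunk
pick False _ = 0   -- the unforced thunk becomes garbage here

main :: IO ()
main = do
  print (pick True (sum [1 .. 100]))          -- forced: prints 5050
  print (pick False (error "never forced"))   -- thunk discarded: prints 0
```

The `error` thunk is never evaluated, so the program does not crash - but something at runtime still has to notice the dead thunk and reclaim it.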

One alternative is linear types, where every value is consumed exactly once, so its single consumer can safely free it. But in general, purely linear programming is too difficult to be useful, so we settle for GC.

GC is a "must have" in pure FP languages. Why? The operations alloc and free are impure! The second reason is that immutable recursive data structures need a GC to exist, because back-linking creates structures that are abstruse and unmaintainable for the human mind. Of course, back-linking is a blessing, because copying structures that use it is very cheap.
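That "cheap copying via sharing" point, as a sketch of mine:

```haskell
-- Sketch: prepending shares the old spine instead of copying it.
xs, ys :: [Int]
xs = [1, 2, 3]
ys = 0 : xs        -- O(1): ys's tail IS xs, not a copy

main :: IO ()
main = print (xs, ys)
```

Because `xs` and `ys` share cells, no single owner can free them; only a GC can tell when both lists are dead.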

Anyway, if you don't believe me, just try to implement an FP language and you will see that I'm right.

EDIT: I forgot. Laziness is HELL without GC. Don't believe me? Just try it without GC in, for example, C++. You will see ... things