实现类型推理实现类型推理

我在这里看到了一些关于静态类型和动态类型的有趣讨论。 我通常更喜欢静态类型,因为编译类型检查,更好的文档化代码,等等。但是,我确实同意,如果像 Java 那样做,它们确实会使代码变得混乱,例如。

因此,我即将开始构建自己的函数式样式语言,类型推理是我想要实现的事情之一。我明白这是一个大课题,我不是要创造一些以前没有做过的东西,只是基本的推断..。

有什么可以帮到我的建议吗?最好是更务实/实用的东西,而不是更多的理论范畴理论/类型理论文本。如果有一个带有数据结构/算法的实现讨论文本,那就太好了。

12681 次浏览

Lambda the Ultimate, starting here.

Types and Programming Languages by Benjamin C. Pierce

I found the following resources helpful for understanding type inference, in order of increasing difficulty:

  1. Chapter 30 (Type Inference) of the freely available book PLAI, Programming Languages: Application and Interpretation, sketches unification-based type inference.
  2. The summer course Interpreting types as abstract values presents elegant evaluators, type checkers, type reconstructors and inferencers using Haskell as a metalanguage.
  3. Chapter 7 (Types) of the book EOPL, Essentials of Programming Languages.
  4. Chapter 22 (Type Reconstruction) of the book TAPL, Types and Programming Languages, and the corresponding OCaml implementations recon and fullrecon.
  5. Chapter 13 (Type Reconstruction) of the new book DCPL, Design Concepts in Programming Languages.
  6. Selection of academic papers.
  7. Closure compiler's TypeInference is an example of the data-flow analysis approach to type inference, which is better suited to dynamic languages that the Hindler Milner approach.

However, since the best way to learn is to do, I strongly suggest implementing type inference for a toy functional language by working through a homework assignment of a programming languages course.

I recommend these two accessible homeworks in ML, which you can both complete in less than a day:

  1. PCF Interpreter (a solution) to warm up.
  2. PCF Type Inference (a solution) to implement algorithm W for Hindley-Milner type inference.

These assignments are from a more advanced course:

  1. Implementing MiniML

  2. Polymorphic, Existential, Recursive Types (PDF)

  3. Bi-Directional Typechecking (PDF)

  4. Subtyping and Objects (PDF)

It's unfortunate that much of the literature on the subject is very dense. I too was in your shoes. I got my first introduction to the subject from Programming Languages: Applications and Interpretation

http://www.plai.org/

I'll try to summarize the abstract idea followed by details that I did not find immediately obvious. First, type inference can be thought of generating and then solving constraints. To generate constraints, you recurse through the syntax tree and generate one or more constraints on each node. For example, if the node is a + operator, the operands and the results must all be numbers. A node that applies a function has the same type as the result of the function, and so on.

For a language without let, you can blindly solve the above constraints by substitution. For example:

(if (= 1 2)
1
2)

here, we can say that the condition of the if statement must be Boolean, and that the type of the if statement is the same as the type of its then and else clauses. Since we know 1 and 2 are numbers, by substitution, we know the if statement is a number.

Where things get nasty, and what I couldn't understand for a while, is dealing with let:

(let ((id (lambda (x) x)))
(id id))

Here, we've bound id to a function that returns whatever you've passed in, otherwise known as the identity function. The problem is the type of the function's parameter x is different on each usage of id. The second id is a function of type a -> a, where a can be anything. The first is of type (a -> a) -> (a -> a). This is known as let-polymorphism. The key is to solve constraints in a particular order: first solve constraints for the definition of id. This will be a -> a. Then fresh, separate copies of the type of id can be substituted into the constraints for each place id is used, for example x1 and x2.

That wasn't readily explained in online resources. They'll mention algorithm W or M but not how they work in terms of solving constraints, or why it doesn't barf on let-polymorphism: each of those algorithms enforce an ordering on solving the constraints.

I found this resource extremely helpful to tie Algorithm W, M, and the general concept of constraint generation and solving all together. It's a little dense, but better than many:

http://www.cs.uu.nl/research/techreps/repo/CS-2002/2002-031.pdf

Many of the other papers there are nice too:

http://people.cs.uu.nl/bastiaan/papers.html

I hope that helps clarify a somewhat murky world.

In addition to Hindley Milner for functional languages, another popular approach to type inference for dynamic language is abstract interpretation.

The idea of abstract interpretation is to write a special interpreter for the language, instead of keep an environment of concrete values(1, false, closure), it works on abstract values, aka types(int, bool, etc). Since it's interpreting the program on abstract values, that's why it's called abstract interpretation.

Pysonar2 is an elegant implementation of abstract interpretation for Python. It is used by Google to analyze Python projects. Basically it uses visitor pattern to dispatch evaluation call to relevant AST node. The visitor function transform accepts the context in which current node will be evaluated, and returns the type of current node.