在哈斯克尔写一个 Haskell 翻译

一个经典的编程练习是用 Lisp/Scheme 编写一个 Lisp/Scheme 解释器。可以利用完整语言的强大功能为语言的一个子集生成解释器。

Haskell 有类似的练习吗?我想使用 Haskell 作为引擎来实现 Haskell 的一个子集。当然,它 可以做,但有没有任何在线资源可查看?


背景故事是这样的。

我正在探索使用 Haskell 作为一种语言的想法,以探索我正在教授的 离散结构课程中的一些概念。这个学期我选定了 米兰达,一种启发了 Haskell 的小型语言。米兰达做了我想做的90% 但哈斯克尔做了2000% 。:)

因此,我的想法是创建一种语言,它具有 Haskell 的特性,我喜欢这些特性,并且不允许其他任何特性。随着学生的进步,一旦他们掌握了基础知识,我就可以有选择地“打开”各种功能。

教学“语言水平”已被成功地用于教学 爪哇咖啡计划。通过限制他们的能力,你可以防止他们在掌握你要教的语法和概念时搬起石头砸自己的脚。您还可以提供更好的错误消息。

15920 次浏览

You might look at Happy (a yacc-like parser in Haskell) which has a Haskell parser.

Do you want to build your interpreter from scratch? Begin with implementing an easier functional language like the lambda calculus or a lisp variant. For the latter there is a quite nice wikibook called Write yourself a Scheme in 48 hours giving a cool and pragmatic introduction into parsing and interpretation techniques.

Interpreting Haskell by hand will be much more complex since you'll have to deal with highly complex features like typeclasses, an extremely powerful type system (type-inference!) and lazy-evaluation (reduction techniques).

So you should define a quite little subset of Haskell to work with and then maybe start by extending the Scheme-example step by step.

Addition:

Note that in Haskell, you have full access to the interpreters API (at least under GHC) including parsers, compilers and of course interpreters.

The package to use is hint (Language.Haskell.*). I have unfortunately neither found online tutorials on this nor tried it out by myself but it looks quite promising.

create a language that has exactly the features of Haskell that I'd like and disallows everything else. As the students progress, I can selectively "turn on" various features once they've mastered the basics.

I suggest a simpler (as in less work involved) solution to this problem. Instead of creating a Haskell implementation where you can turn features off, wrap a Haskell compiler with a program that first checks that the code doesn't use any feature you disallow, and then uses the ready-made compiler to compile it.

That would be similar to HLint (and also kind of its opposite):

HLint (formerly Dr. Haskell) reads Haskell programs and suggests changes that hopefully make them easier to read. HLint also makes it easy to disable unwanted suggestions, and to add your own custom suggestions.

  • Implement your own HLint "suggestions" to not use the features you don't allow
  • Disable all the standard HLint suggestions.
  • Make your wrapper run your modified HLint as a first step
  • Treat HLint suggestions as errors. That is, if HLint "complained" then the program doesn't proceed to compilation stage

see if helium would make a better base to build upon than standard haskell.

Don't you think it would be easier to take the GHC sources and strip out what you don't want, than it would be to write your own Haskell interpreter from scratch? Generally speaking, there should be a lot less effort involved in removing features as opposed to creating/adding features.

GHC is written in Haskell anyway, so technically that stays with your question of a Haskell interpreter written in Haskell.

It probably wouldn't be too hard to make the whole thing statically linked and then only distribute your customized GHCi, so that the students can't load other Haskell source modules. As to how much work it would take to prevent them from loading other Haskell object files, I have no idea. You might want to disable FFI too, if you have a bunch of cheaters in your classes :)

I love your goal, but it's a big job. A couple of hints:

  • I've worked on GHC, and you don't want any part of the sources. Hugs is a much simpler, cleaner implementation but unfortunately it's in C.

  • It's a small piece of the puzzle, but Mark Jones wrote a beautiful paper called Typing Haskell in Haskell which would be a great starting point for your front end.

Good luck! Identifying language levels for Haskell, with supporting evidence from the classroom, would be of great benefit to the community and definitely a publishable result!

Baskell is a teaching implementation, http://hackage.haskell.org/package/baskell

You might start by picking just, say, the type system to implement. That's about as complicated as an interpreter for Scheme, http://hackage.haskell.org/package/thih

Uhc/Ehc is a series of compilers enabling/disabling various Haskell features. http://www.cs.uu.nl/wiki/Ehc/WebHome#What_is_UHC_And_EHC

This might be a good idea - make a tiny version of NetLogo in Haskell. Here is the tiny interpreter.

The EHC series of compilers is probably the best bet: it's actively developed and seems to be exactly what you want - a series of small lambda calculi compilers/interpreters culminating in Haskell '98.

But you could also look at the various languages developed in Pierce's Types and Programming Languages, or the Helium interpreter (a crippled Haskell intended for students http://en.wikipedia.org/wiki/Helium_(Haskell)).

There is a complete Haskell parser: http://hackage.haskell.org/package/haskell-src-exts

Once you've parsed it, stripping out or disallowing certain things is easy. I did this for tryhaskell.org to disallow import statements, to support top-level definitions, etc.

Just parse the module:

parseModule :: String -> ParseResult Module

Then you have an AST for a module:

Module SrcLoc ModuleName [ModulePragma] (Maybe WarningText) (Maybe [ExportSpec]) [ImportDecl] [Decl]

The Decl type is extensive: http://hackage.haskell.org/packages/archive/haskell-src-exts/1.9.0/doc/html/Language-Haskell-Exts-Syntax.html#t%3ADecl

All you need to do is define a white-list -- of what declarations, imports, symbols, syntax is available, then walk the AST and throw a "parse error" on anything you don't want them to be aware of yet. You can use the SrcLoc value attached to every node in the AST:

data SrcLoc = SrcLoc
{ srcFilename :: String
, srcLine :: Int
, srcColumn :: Int
}

There's no need to re-implement Haskell. If you want to provide more friendly compile errors, just parse the code, filter it, send it to the compiler, and parse the compiler output. If it's a "couldn't match expected type a against inferred a -> b" then you know it's probably too few arguments to a function.

Unless you really really want to spend time implementing Haskell from scratch or messing with the internals of Hugs, or some dumb implementation, I think you should just filter what gets passed to GHC. That way, if your students want to take their code-base and take it to the next step and write some real fully fledged Haskell code, the transition is transparent.

If you're looking for a subset of Haskell that's easy to implement, you can do away with type classes and type checking. Without type classes, you don't need type inference to evaluate Haskell code.

I wrote a self-compiling Haskell subset compiler for a Code Golf challenge. It takes Haskell subset code on input and produces C code on output. I'm sorry there isn't a more readable version available; I lifted nested definitions by hand in the process of making it self-compiling.

For a student interested in implementing an interpreter for a subset of Haskell, I would recommend starting with the following features:

  • Lazy evaluation. If the interpreter is in Haskell, you might not have to do anything for this.

  • Function definitions with pattern-matched arguments and guards. Only worry about variable, cons, nil, and _ patterns.

  • Simple expression syntax:

    • Integer literals

    • Character literals

    • [] (nil)

    • Function application (left associative)

    • Infix : (cons, right associative)

    • Parenthesis

    • Variable names

    • Function names

More concretely, write an interpreter that can run this:

-- tail :: [a] -> [a]
tail (_:xs) = xs


-- append :: [a] -> [a] -> [a]
append []     ys = ys
append (x:xs) ys = x : append xs ys


-- zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith f (a:as) (b:bs) = f a b : zipWith f as bs
zipWith _ _      _      = []


-- showList :: (a -> String) -> [a] -> String
showList _    []     = '[' : ']' : []
showList show (x:xs) = '[' : append (show x) (showItems show xs)


-- showItems :: (a -> String) -> [a] -> String
showItems show []     = ']' : []
showItems show (x:xs) = ',' : append (show x) (showItems show xs)


-- fibs :: [Int]
fibs = 0 : 1 : zipWith add fibs (tail fibs)


-- main :: String
main = showList showInt (take 40 fibs)

Type checking is a crucial feature of Haskell. However, going from nothing to a type-checking Haskell compiler is very difficult. If you start by writing an interpreter for the above, adding type checking to it should be less daunting.

I've been told that Idris has a fairly compact parser, not sure if it's really suitable for alteration, but it's written in Haskell.

Andrej Bauer's Programming Language Zoo has a small implementation of a purely functional programming language somewhat cheekily named "minihaskell". It is about 700 lines of OCaml, so very easy to digest.

The site also contains toy versions of ML-style, Prolog-style and OO programming languages.

The reason why there are so many LISP interpreters is that LISP is basically a predecessor of JSON: a simple format to encode data. This makes the frontend part quite easy to handle. Compared to that, Haskell, especially with Language Extensions, is not the easiest language to parse. These are some syntactical constructs that sound tricky to get right:

  • operators with configurable precedence, associativity, and fixity,
  • nested comments
  • layout rule
  • pattern syntax
  • do- blocks and desugaring to monadic code

Each of these, except maybe the operators, could be tackled by students after their Compiler Construction Course, but it would take the focus away from how Haskell actually works. In addition to that, you might not want to implement all syntactical constructs of Haskell directly, but instead implement passes to get rid of them. Which brings us to the literal core of the issue, pun fully intended.

My suggestion is to implement typechecking and an interpreter for Core instead of full Haskell. Both of these tasks are quite intricate by themselves already. This language, while still a strongly typed functional language, is way less complicated to deal with in terms of optimization and code generation. However, it is still independent from the underlying machine. Therefore, GHC uses it as an intermediary language and translates most syntaxical constructs of Haskell into it.

Additionally, you should not shy away from using GHC's (or another compiler's) frontend. I'd not consider that as cheating since custom LISPs use the host LISP system's parser (at least during bootstrapping). Cleaning up Core snippets and presenting them to students, along with the original code, should allow you to give an overview of what the frontend does, and why it is preferable to not reimplement it.

Here are a few links to the documentation of Core as used in GHC: