如何用 ANTLR4创建 AST?

关于这个我已经搜索了很多,但是我找不到任何有用的东西来帮助我构建 AST。我已经知道 ANTLR4不像 ANTLR3那样构建 AST。每个人都说: “嘿,利用访客!”,但我找不到任何例子或更详细的解释,我如何能做到这一点..。

我有一个语法必须喜欢 C,但与每个命令写在葡萄牙语(葡萄牙语编程语言)。我可以使用 ANTLR4轻松地生成解析树。我的问题是: 我现在需要做什么来创建 AST?

顺便说一下,我用 Java 和 IntelliJ..。

编辑1: 我能得到的最接近的答案是使用这个主题的答案: < a href = “//stackoverflow. com/questions/21534316/Is-there-a-simple-example-of-using-antlr4-to-create-an-AST-from-java-source-code? lq = 1”> 有没有一个使用 antlr4从 java 源代码创建 AST 并提取方法、变量和注释的简单例子? 但它只打印访问方法的名称. 。

Since the first attempt didn't work for me as I expected, I tried to use 本教程 from ANTLR3, but I couldn't figure out how to use StringTamplate instead of ST...

最终的 ANTLR 4参考文献这本书,我也找不到任何与 AST 有关的东西。

编辑2: 现在我有一个类来创建 DOT 文件,我只需要弄清楚如何正确使用访问者

50245 次浏览

Ok, let's build a simple math example. Building an AST is totally overkill for such a task but it's a nice way to show the principle.

I'll do it in C# but the Java version would be very similar.

The grammar

First, let's write a very basic math grammar to work with:

grammar Math;


compileUnit
:   expr EOF
;


expr
:   '(' expr ')'                         # parensExpr
|   op=('+'|'-') expr                    # unaryExpr
|   left=expr op=('*'|'/') right=expr    # infixExpr
|   left=expr op=('+'|'-') right=expr    # infixExpr
|   func=ID '(' expr ')'                 # funcExpr
|   value=NUM                            # numberExpr
;


OP_ADD: '+';
OP_SUB: '-';
OP_MUL: '*';
OP_DIV: '/';


NUM :   [0-9]+ ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?;
ID  :   [a-zA-Z]+;
WS  :   [ \t\r\n] -> channel(HIDDEN);

Pretty basic stuff, we have a single expr rule that handles everything (precedence rules etc).

The AST nodes

Then, let's define some AST nodes we'll use. These are totally custom and you can define them in the way you want to.

Here are the nodes we'll be using for this example:

internal abstract class ExpressionNode
{
}


internal abstract class InfixExpressionNode : ExpressionNode
{
public ExpressionNode Left { get; set; }
public ExpressionNode Right { get; set; }
}


internal class AdditionNode : InfixExpressionNode
{
}


internal class SubtractionNode : InfixExpressionNode
{
}


internal class MultiplicationNode : InfixExpressionNode
{
}


internal class DivisionNode : InfixExpressionNode
{
}


internal class NegateNode : ExpressionNode
{
public ExpressionNode InnerNode { get; set; }
}


internal class FunctionNode : ExpressionNode
{
public Func<double, double> Function { get; set; }
public ExpressionNode Argument { get; set; }
}


internal class NumberNode : ExpressionNode
{
public double Value { get; set; }
}

Converting a CST to an AST

ANTLR generated the CST nodes for us (the MathParser.*Context classes). We now have to convert these to AST nodes.

This is easily done with a visitor, and ANTLR provides us with a MathBaseVisitor<T> class, so let's work with that.

internal class BuildAstVisitor : MathBaseVisitor<ExpressionNode>
{
public override ExpressionNode VisitCompileUnit(MathParser.CompileUnitContext context)
{
return Visit(context.expr());
}


public override ExpressionNode VisitNumberExpr(MathParser.NumberExprContext context)
{
return new NumberNode
{
Value = double.Parse(context.value.Text, NumberStyles.AllowDecimalPoint | NumberStyles.AllowExponent)
};
}


public override ExpressionNode VisitParensExpr(MathParser.ParensExprContext context)
{
return Visit(context.expr());
}


public override ExpressionNode VisitInfixExpr(MathParser.InfixExprContext context)
{
InfixExpressionNode node;


switch (context.op.Type)
{
case MathLexer.OP_ADD:
node = new AdditionNode();
break;


case MathLexer.OP_SUB:
node = new SubtractionNode();
break;


case MathLexer.OP_MUL:
node = new MultiplicationNode();
break;


case MathLexer.OP_DIV:
node = new DivisionNode();
break;


default:
throw new NotSupportedException();
}


node.Left = Visit(context.left);
node.Right = Visit(context.right);


return node;
}


public override ExpressionNode VisitUnaryExpr(MathParser.UnaryExprContext context)
{
switch (context.op.Type)
{
case MathLexer.OP_ADD:
return Visit(context.expr());


case MathLexer.OP_SUB:
return new NegateNode
{
InnerNode = Visit(context.expr())
};


default:
throw new NotSupportedException();
}
}


public override ExpressionNode VisitFuncExpr(MathParser.FuncExprContext context)
{
var functionName = context.func.Text;


var func = typeof(Math)
.GetMethods(BindingFlags.Public | BindingFlags.Static)
.Where(m => m.ReturnType == typeof(double))
.Where(m => m.GetParameters().Select(p => p.ParameterType).SequenceEqual(new[] { typeof(double) }))
.FirstOrDefault(m => m.Name.Equals(functionName, StringComparison.OrdinalIgnoreCase));


if (func == null)
throw new NotSupportedException(string.Format("Function {0} is not supported", functionName));


return new FunctionNode
{
Function = (Func<double, double>)func.CreateDelegate(typeof(Func<double, double>)),
Argument = Visit(context.expr())
};
}
}

As you can see, it's just a matter of creating an AST node out of a CST node by using a visitor. The code should be pretty self-explanatory (well, maybe except for the VisitFuncExpr stuff, but it's just a quick way to wire up a delegate to a suitable method of the System.Math class).

And here you have the AST building stuff. That's all that's needed. Just extract the relevant information from the CST and keep it in the AST.

The AST visitor

Now, let's play a bit with the AST. We'll have to build an AST visitor base class to traverse it. Let's just do something similar to the AbstractParseTreeVisitor<T> provided by ANTLR.

internal abstract class AstVisitor<T>
{
public abstract T Visit(AdditionNode node);
public abstract T Visit(SubtractionNode node);
public abstract T Visit(MultiplicationNode node);
public abstract T Visit(DivisionNode node);
public abstract T Visit(NegateNode node);
public abstract T Visit(FunctionNode node);
public abstract T Visit(NumberNode node);


public T Visit(ExpressionNode node)
{
return Visit((dynamic)node);
}
}

Here, I took advantage of C#'s dynamic keyword to perform a double-dispatch in one line of code. In Java, you'll have to do the wiring yourself with a sequence of if statements like these:

if (node is AdditionNode) {
return Visit((AdditionNode)node);
} else if (node is SubtractionNode) {
return Visit((SubtractionNode)node);
} else if ...

But I just went for the shortcut for this example.

Work with the AST

So, what can we do with a math expression tree? Evaluate it, of course! Let's implement an expression evaluator:

internal class EvaluateExpressionVisitor : AstVisitor<double>
{
public override double Visit(AdditionNode node)
{
return Visit(node.Left) + Visit(node.Right);
}


public override double Visit(SubtractionNode node)
{
return Visit(node.Left) - Visit(node.Right);
}


public override double Visit(MultiplicationNode node)
{
return Visit(node.Left) * Visit(node.Right);
}


public override double Visit(DivisionNode node)
{
return Visit(node.Left) / Visit(node.Right);
}


public override double Visit(NegateNode node)
{
return -Visit(node.InnerNode);
}


public override double Visit(FunctionNode node)
{
return node.Function(Visit(node.Argument));
}


public override double Visit(NumberNode node)
{
return node.Value;
}
}

Pretty simple once we have an AST, isn't it?

Putting it all together

Last but not least, we have to actually write the main program:

internal class Program
{
private static void Main()
{
while (true)
{
Console.Write("> ");
var exprText = Console.ReadLine();


if (string.IsNullOrWhiteSpace(exprText))
break;


var inputStream = new AntlrInputStream(new StringReader(exprText));
var lexer = new MathLexer(inputStream);
var tokenStream = new CommonTokenStream(lexer);
var parser = new MathParser(tokenStream);


try
{
var cst = parser.compileUnit();
var ast = new BuildAstVisitor().VisitCompileUnit(cst);
var value = new EvaluateExpressionVisitor().Visit(ast);


Console.WriteLine("= {0}", value);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}


Console.WriteLine();
}
}
}

And now we can finally play with it:

enter image description here

I have created a small Java project that allows you to test your ANTLR grammar instantly by compiling the lexer and parser generated by ANTLR in-memory. You can just parse a string by passing it to the parser, and it will automatically generate an AST from it which can then be used in your application.

For the purpose of reducing the size of the AST, you could use a NodeFilter to which you could add the production-rule names of the non-terminals that you would like to be considered when constructing the AST.

The code and some code examples can be found at https://github.com/julianthome/inmemantlr

Hope the tool is useful ;-)

I have found two simple ways, focused on the functionality available in the TestRig.java file of antlr4.

  1. Via terminal

This is my example for parsing C++ with the corresponding CPP14.g4 grammar file java -cp .:antlr-4.9-complete.jar org.antlr.v4.gui.TestRig CPP14 translationunit -tree filename.cpp. If you omit the filename.cpp, the rig reads from stdin. "translationunit" is the start rule name of the CPP14.g4 grammar file I use.

  1. Via Java

I used parts of code found in the TestRig.java file. Let's suppose again that we have a string of C++ source code from which we want to produce the AST (you can also read directly from a file).

String source_code = "...your cpp source code...";


CodePointCharStream stream_from_string = CharStreams.fromString(source_code);
CPP14Lexer lexer = new CPP14Lexer(new ANTLRInputStream(source_code));
CommonTokenStream tokens = new CommonTokenStream(lexer);
CPP14Parser parser = new CPP14Parser(tokens);


String parserName = "CPP14Parser";
ClassLoader cl = Thread.currentThread().getContextClassLoader();
Class<? extends Parser> parserClass = null;
parserClass = cl.loadClass(parserName).asSubclass(Parser.class);


String startRuleName = "translationunit"; //as specified in my CPP14.g4 file
Method startRule = parserClass.getMethod(startRuleName);
ParserRuleContext tree = (ParserRuleContext)startRule.invoke(parser, (Object[])null);
System.out.println(tree.toStringTree(parser));

My imports are:

import java.lang.reflect.Method;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CodePointCharStream;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.Parser;

All these require that you have produced the necessary files (lexer, parser etc) with the command java -jar yournaltrfile.jar yourgrammar.g4 and that you have then compiled all the *.java files.