Handling errors in ANTLR4

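When the parser doesn't know what to do, the default behavior is to print a message to the terminal, such as:

line 1:23 missing DECIMAL at '}'

This is a good message, but in the wrong place. I'd rather receive it as an exception.

I've tried using the BailErrorStrategy, but it throws a ParseCancellationException without a message (caused by an InputMismatchException, also without a message).

Is there a way to get it to report errors via exceptions while retaining the useful information in the message?

Here's what I'm really after. I typically use actions in rules to build up an object: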

dataspec returns [DataExtractor extractor]
    @init {
        DataExtractorBuilder builder = new DataExtractorBuilder(layout);
    }
    @after {
        $extractor = builder.create();
    }
    : first=expr { builder.addAll($first.values); } (COMMA next=expr { builder.addAll($next.values); })* EOF
    ;

expr returns [List<ValueExtractor> values]
    : a=atom { $values = Arrays.asList($a.val); }
    | fields=fieldrange { $values = values($fields.fields); }
    | '%' { $values = null; }
    | ASTERISK { $values = values(layout); }
    ;

Then, when I invoke the parser, I do something like this:

public static DataExtractor create(String dataspec) {
    CharStream stream = new ANTLRInputStream(dataspec);
    DataSpecificationLexer lexer = new DataSpecificationLexer(stream);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    DataSpecificationParser parser = new DataSpecificationParser(tokens);

    return parser.dataspec().extractor;
}

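What I really want is

  • for the dataspec() call to throw an exception (ideally a checked one) when the input can't be parsed
  • for that exception to carry a useful message and provide access to the line number and position where the problem was found

I'll then let that exception bubble up the call stack to wherever is best suited to present a useful message to the user, the same way I'd handle a dropped network connection, reading a corrupt file, etc.

I did see that actions are now considered "advanced" in ANTLR4, so maybe I'm going about things in a strange way, but I haven't looked into what the "non-advanced" way to do this would be, since this way has been working well for our needs.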
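
For concreteness, the kind of exception described above might look roughly like the sketch below. DataSpecSyntaxException and its accessors are hypothetical names made up for illustration, not ANTLR classes; the message layout simply copies the default console output.

```java
// Hypothetical checked exception; the name and fields are illustrative
// assumptions and are not part of the ANTLR runtime.
class DataSpecSyntaxException extends Exception {
    private final int line;
    private final int charPositionInLine;

    DataSpecSyntaxException(int line, int charPositionInLine, String msg, Throwable cause) {
        // Same "line L:C msg" layout that ANTLR prints to the console by default.
        super("line " + line + ":" + charPositionInLine + " " + msg, cause);
        this.line = line;
        this.charPositionInLine = charPositionInLine;
    }

    int getLine() { return line; }

    int getCharPositionInLine() { return charPositionInLine; }
}

class DataSpecSyntaxExceptionDemo {
    public static void main(String[] args) {
        try {
            throw new DataSpecSyntaxException(1, 23, "missing DECIMAL at '}'", null);
        } catch (DataSpecSyntaxException ex) {
            // The caller gets both the readable text and the numeric position.
            System.out.println(ex.getMessage());
            System.out.println(ex.getLine() + "," + ex.getCharPositionInLine());
        }
    }
}
```

Because it extends Exception rather than RuntimeException, callers are forced to handle or declare it, which is the "checked" behavior asked for.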


When you use the DefaultErrorStrategy or the BailErrorStrategy, the ParserRuleContext.exception field is set for any parse tree node in the resulting parse tree where an error occurred. The documentation for this field reads (for people that don't want to click an extra link):

The exception which forced this rule to return. If the rule successfully completed, this is null.

Edit: If you use DefaultErrorStrategy, the parse context exception will not be propagated all the way out to the calling code, so you'll be able to examine the exception field directly. If you use BailErrorStrategy, the ParseCancellationException thrown by it will include a RecognitionException if you call getCause().

// pce is the ParseCancellationException caught around the parse call
if (pce.getCause() instanceof RecognitionException) {
    RecognitionException re = (RecognitionException)pce.getCause();
    ParserRuleContext context = (ParserRuleContext)re.getCtx();
}

Edit 2: Based on your other answer, it appears that you don't actually want an exception, but what you want is a different way to report the errors. In that case, you'll be more interested in the ANTLRErrorListener interface. You want to call parser.removeErrorListeners() to remove the default listener that writes to the console, and then call parser.addErrorListener(listener) for your own special listener. I often use the following listener as a starting point, as it includes the name of the source file with the messages.

public class DescriptiveErrorListener extends BaseErrorListener {
    public static final DescriptiveErrorListener INSTANCE = new DescriptiveErrorListener();

    // Assumed on/off switch for reporting; adjust as needed.
    private static final boolean REPORT_SYNTAX_ERRORS = true;

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e)
    {
        if (!REPORT_SYNTAX_ERRORS) {
            return;
        }

        // Prefix the message with the source name when one is available.
        String sourceName = recognizer.getInputStream().getSourceName();
        if (!sourceName.isEmpty()) {
            sourceName = String.format("%s:%d:%d: ", sourceName, line, charPositionInLine);
        }

        System.err.println(sourceName+"line "+line+":"+charPositionInLine+" "+msg);
    }
}

With this class available, you can use the following to use it.

lexer.removeErrorListeners();
lexer.addErrorListener(DescriptiveErrorListener.INSTANCE);
parser.removeErrorListeners();
parser.addErrorListener(DescriptiveErrorListener.INSTANCE);

A much more complicated example of an error listener that I use to identify ambiguities which render a grammar non-SLL is the SummarizingDiagnosticErrorListener class in TestPerformance.

What I've come up with so far is based on extending DefaultErrorStrategy and overriding its reportXXX methods (though it's entirely possible I'm making things more complicated than necessary):

public class ExceptionErrorStrategy extends DefaultErrorStrategy {

    @Override
    public void recover(Parser recognizer, RecognitionException e) {
        throw e;
    }

    @Override
    public void reportInputMismatch(Parser recognizer, InputMismatchException e) throws RecognitionException {
        String msg = "mismatched input " + getTokenErrorDisplay(e.getOffendingToken());
        msg += " expecting one of "+e.getExpectedTokens().toString(recognizer.getTokenNames());
        RecognitionException ex = new RecognitionException(msg, recognizer, recognizer.getInputStream(), recognizer.getContext());
        ex.initCause(e);
        throw ex;
    }

    @Override
    public void reportMissingToken(Parser recognizer) {
        beginErrorCondition(recognizer);
        Token t = recognizer.getCurrentToken();
        IntervalSet expecting = getExpectedTokens(recognizer);
        String msg = "missing "+expecting.toString(recognizer.getTokenNames()) + " at " + getTokenErrorDisplay(t);
        throw new RecognitionException(msg, recognizer, recognizer.getInputStream(), recognizer.getContext());
    }
}

This throws exceptions with useful messages, and the line and position of the problem can be gotten from either the offending token, or if that's not set, from the current token by using ((Parser) re.getRecognizer()).getCurrentToken() on the RecognitionException.

I'm fairly happy with how this is working, though having six reportX methods to override makes me think there's a better way.

Since I've had a little bit of a struggle with the two existing answers, I'd like to share the solution I ended up with.

First of all I created my own version of an ErrorListener like Sam Harwell suggested:

public class ThrowingErrorListener extends BaseErrorListener {

    public static final ThrowingErrorListener INSTANCE = new ThrowingErrorListener();

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e)
            throws ParseCancellationException {
        throw new ParseCancellationException("line " + line + ":" + charPositionInLine + " " + msg);
    }
}

Note the use of a ParseCancellationException instead of a RecognitionException since the DefaultErrorStrategy would catch the latter and it would never reach your own code.
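
The effect can be illustrated with a toy model. This is not ANTLR code: ToyRecognitionException, ToyCancellationException and ToyRule are stand-ins invented for the sketch, and the only assumption carried over from ANTLR is that rule methods catch the recognition exception type and recover from it, while other runtime exceptions pass through.

```java
// Toy model only: these classes are hypothetical stand-ins, not ANTLR types.
class ToyRecognitionException extends RuntimeException {
    ToyRecognitionException(String msg) { super(msg); }
}

class ToyCancellationException extends RuntimeException {
    ToyCancellationException(String msg) { super(msg); }
}

class ToyRule {
    // Mimics a rule method that catches recognition errors and recovers,
    // but lets every other runtime exception propagate.
    static String run(RuntimeException toThrow) {
        try {
            throw toThrow;
        } catch (ToyRecognitionException re) {
            return "recovered: " + re.getMessage();
        }
    }
}

class ToyRuleDemo {
    public static void main(String[] args) {
        // Swallowed inside the rule; the caller never sees an exception.
        System.out.println(ToyRule.run(new ToyRecognitionException("bad token")));
        try {
            ToyRule.run(new ToyCancellationException("bad token"));
        } catch (ToyCancellationException ex) {
            // Not caught by the rule, so it reaches the calling code.
            System.out.println("reached caller: " + ex.getMessage());
        }
    }
}
```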

Creating a whole new ErrorStrategy like Brad Mace suggested is not necessary since the DefaultErrorStrategy produces pretty good error messages by default.

I then use the custom ErrorListener in my parsing function:

public static String parse(String text) throws ParseCancellationException {
    MyLexer lexer = new MyLexer(new ANTLRInputStream(text));
    lexer.removeErrorListeners();
    lexer.addErrorListener(ThrowingErrorListener.INSTANCE);

    CommonTokenStream tokens = new CommonTokenStream(lexer);

    MyParser parser = new MyParser(tokens);
    parser.removeErrorListeners();
    parser.addErrorListener(ThrowingErrorListener.INSTANCE);

    ParserRuleContext tree = parser.expr();
    MyParseRules extractor = new MyParseRules();

    return extractor.visit(tree);
}

(For more information on what MyParseRules does, see here.)

This will give you the same error messages as would be printed to the console by default, only in the form of proper exceptions.
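
If the calling code needs the line and column as numbers, one option is to split them back out of the message prefix built by ThrowingErrorListener. The sketch below assumes the "line L:C message" layout used above; ErrorPosition is a hypothetical helper, not an ANTLR class, and storing the two numbers as fields on a custom exception would arguably be cleaner than parsing text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: recovers the position from a
// message of the form "line <line>:<column> <detail>".
class ErrorPosition {
    private static final Pattern PREFIX =
            Pattern.compile("line (\\d+):(\\d+) (.*)", Pattern.DOTALL);

    final int line;
    final int column;
    final String detail;

    private ErrorPosition(int line, int column, String detail) {
        this.line = line;
        this.column = column;
        this.detail = detail;
    }

    // Returns null when the message does not follow the expected layout.
    static ErrorPosition parse(String message) {
        Matcher m = PREFIX.matcher(message);
        if (!m.matches()) {
            return null;
        }
        return new ErrorPosition(Integer.parseInt(m.group(1)),
                                 Integer.parseInt(m.group(2)),
                                 m.group(3));
    }
}
```

In the parse method this would mean catching the ParseCancellationException and passing its getMessage() result to ErrorPosition.parse.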

For anyone interested, here's the ANTLR4 C# equivalent of Sam Harwell's answer:

using System;
using System.IO;
using Antlr4.Runtime;

public class DescriptiveErrorListener : BaseErrorListener, IAntlrErrorListener<int>
{
    public static DescriptiveErrorListener Instance { get; } = new DescriptiveErrorListener();

    // Lexer errors: the offending symbol is a character code.
    public void SyntaxError(TextWriter output, IRecognizer recognizer, int offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
    {
        if (!REPORT_SYNTAX_ERRORS) return;
        string sourceName = recognizer.InputStream.SourceName;
        // never ""; might be "<unknown>" == IntStreamConstants.UnknownSourceName
        sourceName = $"{sourceName}:{line}:{charPositionInLine}";
        Console.Error.WriteLine($"{sourceName}: line {line}:{charPositionInLine} {msg}");
    }

    // Parser errors: the offending symbol is a token; forward to the overload above.
    public override void SyntaxError(TextWriter output, IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
    {
        this.SyntaxError(output, recognizer, 0, line, charPositionInLine, msg, e);
    }

    static readonly bool REPORT_SYNTAX_ERRORS = true;
}

lexer.RemoveErrorListeners();
lexer.AddErrorListener(DescriptiveErrorListener.Instance);
parser.RemoveErrorListeners();
parser.AddErrorListener(DescriptiveErrorListener.Instance);

For people who use Python, here is the solution in Python 3 based on Mouagip's answer.

First, define a custom error listener:

from antlr4.error.ErrorListener import ErrorListener
from antlr4.error.Errors import ParseCancellationException


class ThrowingErrorListener(ErrorListener):
    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        ex = ParseCancellationException(f'line {line}:{column} {msg}')
        # Keep the position on the exception so callers can read it directly.
        ex.line = line
        ex.column = column
        raise ex

Then set this to lexer and parser:

# script is an antlr4 input stream, e.g. InputStream(text)
lexer = MyScriptLexer(script)
lexer.removeErrorListeners()
lexer.addErrorListener(ThrowingErrorListener())

token_stream = CommonTokenStream(lexer)

parser = MyScriptParser(token_stream)
parser.removeErrorListeners()
parser.addErrorListener(ThrowingErrorListener())

tree = parser.script()