java eclipse eclipse-jdt javacc javaparser

Understanding JavaParser compared to JavaCC and Eclipse JDT


I'm beginning an automated software analysis project and am currently in the research phase. I'm quite new to parsing and am struggling to find resources that compare the main Java parsing options. I understand JavaParser was created using JavaCC; what functionality does it offer that JavaCC does not? Are there any major differences I should be aware of when deciding which parser to use? Similarly, does Eclipse JDT have features that these two lack and that may be of use to me? Thank you in advance for any answers.


Solution

  • This is by no means an exhaustive answer, just a bit of clarification on the specific parts of your question and my five cents on the more general one. I assume that you want to analyze Java code.

    I also assume that this is a sort of exercise in using code-as-data and grammars/parsers. Otherwise, the field of code analysis is huge, with very specific niches such as finding bugs or checking code for thread safety.

    In general, a huge number of tools are available for the purpose, but if we limit them to those written in Java, the biggest fish in the open-source space seem to be covered here. For a more complete list see this blog from some of the authors of JavaParser, and this for a general introduction to the topic. It may also be worth having a look at their material on the somewhat overlapping topic of language development in general.

    In hindsight, these questions were lurking in the background of this response:

    Let's go from the specific to the general:

    com.github.javaparser parses a static piece of Java code (note: only Java, only static) and gives you an AST. The package also has a SymbolResolver, which tries to determine the Java type of symbols. It's called JavaParser, but it isn't just a parser: its AST can be queried with Java streams, and it comes with AST manipulation and code generation capabilities. One of its main backers is an Italian company, by the way.

    Eclipse JDT is huge by comparison, with org.eclipse.jdt.core.dom.ASTParser giving you an AST. But as opposed to JavaParser, everything is geared towards handling Java (only) in an interactive development situation. Since Eclipse can perform refactorings, it must be able to analyze and manipulate the AST; here's an example of that (as part of this post) and here are comprehensive examples for the refactoring API. If you're building some Eclipse-integrated functionality to support writing code, JDT will be your first option anyway. Eclipse JDT supports incremental compilation in some form, which you need if you want compile-on-the-fly functionality that gives feedback as the code gets typed.

    I also worked a bit with the Spoon library (developed by a university in France), which has the same focus as JavaParser and also does symbol resolution, but has different querying mechanisms. It builds on org.eclipse.jdt.core. Each of those tools will give you a different AST for the same Java code, reflecting its intended use case. Spoon describes it like this:

    A programming language can have different meta models. An abstract syntax tree (AST) or model, is an instance of a meta model. Each meta model – and consequently each AST – is more or less appropriate depending on the task at hand. For instance, the Java meta model of Sun’s compiler (javac) has been designed and optimized for compilation to bytecode, while, the main purpose of the Java meta model of the Eclipse IDE (JDT) is to support different tasks of software development in an integrated manner (code completion, quick fix of compilation errors, debug, etc.).

    The starkest difference is between the more domain-specific tools and the parsers produced by parser generators. Although they differ somewhat from each other, the JavaParser and Spoon ASTs mirror the code on a conceptual level: you get methods, parameter lists, parameters and so on. The generated parsers, by contrast, give you every detail in the grammar, down to semicolons, commas and braces, as elements in the AST. I think Eclipse has an AST View where you can inspect the JDT parser's output, but I'm not aware of a comprehensive tool that can show you the differences between different parsers for Java the way AstExplorer does in the JavaScript world.
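
    To make "conceptual level" a bit more concrete without pulling in any of the libraries above, here is a minimal sketch that uses the JDK's own compiler Tree API (com.sun.source, shipped with the JDK in the jdk.compiler module). This is a substitute chosen only because it needs no dependencies; it is not JavaParser, Spoon or JDT, and their node types and method names differ. The class names MethodLister and StringSource are made up for the illustration.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class MethodLister {

    /** Hypothetical helper: an in-memory source file, so nothing has to exist on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses only (no symbol resolution) and returns "name(parameterCount)" for each method. */
    static List<String> listMethods(String className, String code) {
        // Assumption: running on a full JDK; on a runtime without jdk.compiler this returns null.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));
        List<String> found = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                // The visitor is called with a MethodTree node, not with individual tokens.
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree method, Void unused) {
                        found.add(method.getName() + "(" + method.getParameters().size() + ")");
                        return super.visitMethod(method, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String code = "class Demo { int add(int a, int b) { return a + b; } void run() {} }";
        System.out.println(listMethods("Demo", code));
    }
}
```

    The visitor only ever sees methods and their parameter lists; braces, commas and semicolons never show up as nodes, which is the kind of model the domain-specific tools give you as well.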

    Which framework suits your needs will depend very much on your use case. For example, if you need symbol resolution, you're limited to the options that provide it anyway. I tried to get my feet wet with a Java transpiler and found the JavaParser metamodel more suitable than Spoon's model, and I liked its small number of dependencies.

    A general (though non-incremental) way to get hold of an AST is a parser generator like JavaCC (read: a compiler compiler, aka compiler generator, written in Java that can create parsers for anything you have a grammar for) or ANTLR. If you want to parse SQL, you feed them an SQL grammar; if you want to parse Java code, you feed them this one (ANTLR format) or this one (JavaCC format). The result will be a parser that can give you an AST for a given piece of code, and perhaps a visitor class.

    This approach gives you full control over the processing and lets you define or tweak a grammar to suit your needs, e.g. to introduce additional non-terminal nodes, to trim it down to class/method level only, or to pick out only the comments without confusing them with string constants, if that's all you care about. You could also get at the structure of embedded non-Java code fragments, e.g. SQL query strings.
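
    As a rough idea of what "comments only, without confusing them with string constants" boils down to, here is a small hand-written scanner. It is only a sketch of the token-level logic that a trimmed-down grammar would encode, not output of JavaCC or ANTLR, and the class name CommentExtractor is made up. It assumes classic string and char literals and ignores text blocks and unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentExtractor {

    /** Returns all line and block comments, skipping comment-like text inside string/char literals. */
    static List<String> extractComments(String src) {
        List<String> comments = new ArrayList<>();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Skip a string or char literal, stepping over backslash escapes.
                char quote = c;
                i++;
                while (i < n && src.charAt(i) != quote) {
                    i += (src.charAt(i) == '\\') ? 2 : 1;
                }
                i++; // move past the closing quote
            } else if (src.startsWith("//", i)) {
                // Line comment: runs to the end of the line (or end of input).
                int end = src.indexOf('\n', i);
                if (end < 0) {
                    end = n;
                }
                comments.add(src.substring(i, end));
                i = end;
            } else if (src.startsWith("/*", i)) {
                // Block comment: runs to the closing "*/" (or end of input if unterminated).
                int end = src.indexOf("*/", i + 2);
                end = (end < 0) ? n : end + 2;
                comments.add(src.substring(i, end));
                i = end;
            } else {
                i++;
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        String src = "String s = \"// not a comment\"; // real\n/* block */ char c = '\"';";
        System.out.println(extractComments(src));
    }
}
```

    With a parser generator you would express the same three cases (literal, line comment, block comment) as token definitions and let the tool generate the scanning loop.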

    By the way, ANTLR can handle direct left recursion in the grammar, while JavaCC can't. This matters e.g. for arithmetic expressions with binary operators, as in exp := exp + exp.
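
    For a top-down parser such as the ones JavaCC generates, a left-recursive rule has to be rewritten into an iterative one, e.g. exp := term (('+' | '-') term)*. The hand-written recursive-descent sketch below shows that rewrite; it is an illustration of the idea rather than generated code, and the class name and the digits-only term rule are assumptions made for the example.

```java
public class LeftRecursionDemo {
    private final String input;
    private int pos = 0;

    LeftRecursionDemo(String input) {
        this.input = input.replace(" ", "");
    }

    // exp := term (('+' | '-') term)*
    // A method for "exp := exp '+' term" would call itself before consuming any input
    // and never terminate, so the recursion is replaced by a loop.
    String parseExp() {
        String left = parseTerm();
        while (pos < input.length() && (peek() == '+' || peek() == '-')) {
            char op = input.charAt(pos++);
            String right = parseTerm();
            // Folding into 'left' keeps the operators left-associative.
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // term := DIGIT+
    String parseTerm() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("number expected at position " + pos);
        }
        return input.substring(start, pos);
    }

    private char peek() {
        return input.charAt(pos);
    }

    /** Parses the whole input and returns a fully parenthesized rendering of the tree. */
    static String parse(String text) {
        LeftRecursionDemo parser = new LeftRecursionDemo(text);
        String tree = parser.parseExp();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("unexpected input at position " + parser.pos);
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 - 2 + 3")); // ((1 - 2) + 3)
    }
}
```

    In ANTLR you can keep the left-recursive form of the rule; with JavaCC you write the looped form yourself in the grammar file.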

    If your goal is to support developers as they write code, you'll have to deal with broken or incomplete code. Eclipse is built for that purpose, and while I haven't used its JDT, I'd expect it to handle such cases gracefully with reasonable feedback. ANTLR will also recover from syntax errors where possible and lets you define some error handling. I don't remember what Spoon and JavaParser did in case of errors; I think they expect syntactically correct code up front.